Transcription is the process of duplicating a segment of DNA into RNA for the purpose of gene expression. Some segments of DNA are transcribed into RNA molecules that can encode proteins, called messenger RNA (mRNA). Other segments of DNA are transcribed into RNA molecules called non-coding RNAs (ncRNAs).
Both DNA and RNA are nucleic acids, composed of nucleotide sequences. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary RNA strand called a primary transcript.
In virology, the term transcription is also used for mRNA synthesis from a viral RNA molecule. The genome of many RNA viruses is composed of negative-sense RNA, which acts as a template for positive-sense viral messenger RNA, a necessary step in the synthesis of the viral proteins needed for viral replication. This process is catalyzed by a viral RNA-dependent RNA polymerase.
Background
A DNA transcription unit encoding a protein may contain both a coding sequence, which will be translated into the protein, and regulatory sequences, which direct and regulate the synthesis of that protein. The regulatory sequence before (upstream from) the coding sequence is called the five prime untranslated region (5' UTR); the sequence after (downstream from) the coding sequence is called the three prime untranslated region (3' UTR).
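The layout described above can be sketched as a small data structure. This is an illustrative sketch only: the class name `Transcript`, its fields, and the example sequence are hypothetical and are not taken from any library or from a real gene.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical model of a protein-coding transcript, written 5' -> 3'."""
    sequence: str   # full mRNA sequence
    cds_start: int  # 0-based index of the first coding nucleotide
    cds_end: int    # index one past the last coding nucleotide

    @property
    def five_prime_utr(self) -> str:
        # Everything upstream of the coding sequence
        return self.sequence[:self.cds_start]

    @property
    def coding_sequence(self) -> str:
        return self.sequence[self.cds_start:self.cds_end]

    @property
    def three_prime_utr(self) -> str:
        # Everything downstream of the coding sequence
        return self.sequence[self.cds_end:]


# Made-up example: 5-nt 5' UTR, 9-nt coding sequence, 5-nt 3' UTR
example = Transcript("GGACUAUGGCCUAAUUUGC", cds_start=5, cds_end=14)
print(example.five_prime_utr, example.coding_sequence, example.three_prime_utr)
```

The three slices always partition the full sequence, which mirrors the upstream, coding, downstream ordering in the text.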
As opposed to DNA replication, transcription results in an RNA complement that includes the nucleotide uracil (U) in all instances where thymine (T) would have occurred in a DNA complement.
Only one of the two DNA strands serves as a template for transcription. The antisense strand of DNA is read by RNA polymerase from the 3' end to the 5' end during transcription (3' → 5'). The complementary RNA is created in the opposite direction, in the 5' → 3' direction, matching the sequence of the sense strand except switching uracil for thymine. This directionality is because RNA polymerase can only add nucleotides to the 3' end of the growing mRNA chain. This use of only the 3' → 5' DNA strand eliminates the need for the Okazaki fragments that are seen in DNA replication.
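The relationship between the template (antisense) strand, the sense strand, and the RNA product can be illustrated with a minimal sketch. The function names and the example sequence are hypothetical, and the code models only base pairing, not the enzymatic process.

```python
# Base-pairing rules: DNA template base -> RNA base, and DNA base -> DNA base
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5' -> 3') for a template strand written 3' -> 5'.

    The polymerase reads the template 3' -> 5' and extends the RNA at its
    3' end, so reading the template left to right yields the RNA 5' -> 3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def sense_strand(template_3_to_5: str) -> str:
    """Return the sense (coding) DNA strand, 5' -> 3', for the same template."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())


template = "TACGGATTC"          # made-up template strand, written 3' -> 5'
rna = transcribe(template)       # "AUGCCUAAG"
sense = sense_strand(template)   # "ATGCCTAAG"

# The RNA matches the sense strand with uracil in place of thymine
print(rna, sense, rna == sense.replace("T", "U"))
```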
Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA. As a result, transcription has a lower copying fidelity than DNA replication.
Major steps
Transcription is divided into initiation, promoter escape, elongation, and termination.
Setting up for transcription
Enhancers, transcription factors, Mediator complex, and DNA loops in mammalian transcription
Figure: Regulation of transcription in mammals. This illustration indicates many of the elements that are present when transcription of a gene is upregulated.
Setting up for transcription in mammals is regulated by many cis-regulatory elements, including core promoter and promoter-proximal elements that are located near the transcription start sites of genes. Core promoters combined with general transcription factors are sufficient to direct transcription initiation, but generally have low basal activity. Other important cis-regulatory modules are localized in DNA regions that are distant from the transcription start sites. These include enhancers, silencers, insulators and tethering elements. Among this constellation of elements, enhancers and their associated transcription factors have a leading role in the initiation of gene transcription. An enhancer localized in a DNA region distant from the promoter of a gene can have a very large effect on gene transcription, with some genes undergoing up to 100-fold increased transcription due to an activated enhancer.
Enhancers are regions of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene transcription programs, most often by looping through long distances to come into physical proximity with the promoters of their target genes. While there are hundreds of thousands of enhancer DNA regions, for a particular type of tissue only specific enhancers are brought into proximity with the promoters that they regulate. In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to their target promoters. Several cell-function-specific transcription factors (there are about 1,600 transcription factors in a human cell) generally bind to specific motifs on an enhancer, and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene. Mediator (a complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (pol II) enzyme bound to the promoter.
Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two enhancer RNAs (eRNAs) as illustrated in the Figure. An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it and that activated transcription factor may then activate the enhancer to which it is bound (see small red star representing phosphorylation of transcription factor bound to enhancer in the illustration). An activated enhancer begins transcription of its RNA before activating transcription of messenger RNA from its target gene.
CpG island methylation and demethylation
Figure: Where the methyl group is added when 5-methylcytosine is formed.
Transcription regulation at about 60% of promoters is also controlled by methylation of cytosines within CpG dinucleotides (CpG sites, where a 5' cytosine is followed by a 3' guanine). 5-methylcytosine (5-mC) is a methylated form of the DNA base cytosine (see Figure). 5-mC is an epigenetic marker found predominantly within CpG sites. About 28 million CpG dinucleotides occur in the human genome. In most tissues of mammals, on average, 70% to 80% of CpG cytosines are methylated (forming 5-methylCpG or 5-mCpG). However, unmethylated cytosines within 5' cytosine-guanine 3' sequences often occur in groups, called CpG islands, at active promoters. About 60% of promoter sequences have a CpG island, while only about 6% of enhancer sequences have one. CpG islands constitute regulatory sequences, since methylation of the CpG islands in the promoter of a gene can reduce or silence transcription of that gene.
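As a rough illustration of what a CpG site and a methylation level mean in sequence terms, the following sketch locates CpG dinucleotides in a short string and computes the fraction that are methylated. The function names, the example sequence, and the set of methylated positions are all made up for illustration; real methylation data would come from an assay such as bisulfite sequencing.

```python
def cpg_positions(seq: str) -> list:
    """Return 0-based indices of the cytosine in each 5'-CG-3' dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def methylated_fraction(seq: str, methylated: set) -> float:
    """Fraction of CpG cytosines whose index appears in `methylated`."""
    sites = cpg_positions(seq)
    if not sites:
        return 0.0  # no CpG sites, so nothing can be methylated
    return sum(1 for i in sites if i in methylated) / len(sites)


seq = "ACGTCGGCGATCG"          # made-up sequence, written 5' -> 3'
sites = cpg_positions(seq)      # [1, 4, 7, 11]
level = methylated_fraction(seq, methylated={1, 4, 11})  # 0.75

print(sites, level)
```

In this toy example three of the four CpG cytosines are methylated, which falls within the 70% to 80% average range quoted above for most mammalian tissues.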
DNA methylation regulates gene transcription through interaction with methyl binding domain (MBD) proteins, such as MeCP2, MBD1 and MBD2. These MBD proteins bind most strongly to highly methylated CpG islands. They have both a methyl-CpG-binding domain and a transcription repression domain. About 94% of transcription factor binding sites (TFBSs) that are associated with signal-responsive genes occur in enhancers, while only about 6% of such TFBSs occur in promoters. There are about 12,000 binding sites for the transcription factor EGR1 in the mammalian genome; about half of them are located in promoters and half in enhancers. Production of EGR1 transcription factor proteins, in various types of cells, can be stimulated by growth factors, neurotransmitters, hormones, stress and injury.
The splice isoform DNMT3A2 behaves like the product of a classical immediate-early gene; for instance, it is robustly and transiently produced after neuronal activation. Where the DNA methyltransferase isoform DNMT3A2 binds and adds methyl groups to cytosines appears to be determined by histone post-translational modifications.
On the other hand, neural activation causes degradation of DNMT3A1 accompanied by reduced methylation of at least one evaluated targeted promoter.
