A regulatory sequence is a segment of a nucleic acid molecule which is capable of increasing or decreasing the expression of specific genes within an organism. Regulation of gene expression is an essential feature of all living organisms and viruses.
Description
thumb|380x380px|The structure of a eukaryotic protein-coding gene. Regulatory sequence controls when and where expression occurs for the protein coding region (red). Promoter and enhancer regions (yellow) regulate the transcription of the gene into a pre-mRNA which is modified to remove introns (light grey) and add a 5' cap and poly-A tail (dark grey). The mRNA 5' and 3' untranslated regions (blue) regulate translation into the final protein product.
In DNA, regulation of gene expression normally happens at the level of RNA biosynthesis (transcription). It is accomplished through the sequence-specific binding of proteins (transcription factors) that activate or inhibit transcription. Transcription factors may act as activators, repressors, or both. Repressors often act by preventing RNA polymerase from forming a productive complex with the transcriptional initiation region (promoter), while activators facilitate formation of a productive complex. Furthermore, DNA motifs have been shown to be predictive of epigenomic modifications, suggesting that transcription factors play a role in regulating the epigenome.
thumb|384x384px|The structure of a prokaryotic operon of protein-coding genes. Regulatory sequence controls when expression occurs for the multiple protein coding regions (red). Promoter, operator and enhancer regions (yellow) regulate the transcription of the gene into an mRNA. The mRNA untranslated regions (blue) regulate translation into the final protein products.
In RNA, regulation may occur at the level of protein biosynthesis (translation), RNA cleavage, RNA splicing, or transcriptional termination. Regulatory sequences are frequently associated with messenger RNA (mRNA) molecules, where they are used to control mRNA biogenesis or translation. A variety of biological molecules may bind to the RNA to accomplish this regulation, including proteins (e.g., translational repressors and splicing factors), other RNA molecules (e.g., miRNA) and small molecules, in the case of riboswitches.
Activation and implementation
A regulatory DNA sequence does not regulate unless it is activated. Different regulatory sequences are activated and then implement their regulation by different mechanisms.
Enhancer activation and implementation
Expression of genes in mammals can be upregulated when signals are transmitted to the promoters associated with the genes. Cis-regulatory DNA sequences that are located in DNA regions distant from the promoters of genes can have very large effects on gene expression, with some genes undergoing up to 100-fold increased expression due to such a cis-regulatory sequence. These cis-regulatory sequences include enhancers, silencers, insulators and tethering elements. Among this constellation of sequences, enhancers and their associated transcription factor proteins have a leading role in the regulation of gene expression.
Enhancers are sequences of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene expression programs, most often by looping through long distances to come into physical proximity with the promoters of their target genes. In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to promoters. Transcription factors generally bind to specific motifs on an enhancer, and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene. Mediator, a coactivator complex usually consisting of about 26 proteins in an interacting structure, communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (RNAP II) enzyme bound to the promoter.
Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two eRNAs as illustrated in the figure. An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it, and that activated transcription factor may then activate the enhancer to which it is bound (see the small red star representing phosphorylation of a transcription factor bound to an enhancer in the illustration). An activated enhancer begins transcription of its RNA before activating a promoter to initiate transcription of messenger RNA from its target gene.
Transcription factor binding sites within enhancers (see figure above) are usually about 10 base pairs long, though they can vary from just a few to about 20 base pairs. Enhancers usually have about 10 transcription factor binding sites within an average enhancer site of about 204 base pairs. An examination of enhancer-gene regulatory interactions in 352 cell types and tissues found more than 13 million active enhancers.
Super-enhancer
thumb|435x435px|A super-enhancer is a cluster of typical enhancers that drives a high level of transcription of a target gene
While enhancers are needed for transcription of genes in a cell above low levels, a cluster of enhancers, known as a super-enhancer, can cause transcription of a target gene at even higher levels. Super-enhancers usually drive genes needed for cell identity to express at high levels. In cancers, a super-enhancer may also drive a particular oncogene to express at a high level.
A super-enhancer is a cluster of typical enhancers (each about 204 base pairs in length) that, all together, regulate the expression of a target gene. Super-enhancer-driven genes are expressed at significantly higher levels than genes under the control of typical enhancers. The architectural protein YY1 (indicated by paired red zigzags) helps hold together the loops that bring the typical enhancers to their target gene in the super-enhancer. Many proteins are therefore in close association at a super-enhancer. These proteins generally have a structured domain as well as a tail with an intrinsically disordered region (IDR). Many of the IDRs of these proteins interact with each other, thereby forming a water-excluding gel or phase-separated condensate around the super-enhancer.
Examples include the mouse α-globin super-enhancer and the Wap super-enhancer. The mouse α-globin super-enhancer has five typical enhancers within it. Only when acting together do they increase transcription of the α-globin gene by 450-fold.
Super-enhancers may occupy regions of the genome about 10,000 to 60,000 nucleotides long, while typical enhancers are each about 204 base pairs long.
While super-enhancers are only active at about 2.5% to 10.9% of actively transcribed sites in a cell, they recruit transcription machinery more actively than typical single enhancers do. The super-enhancers in a cell utilize about 12% to 36% of the RNA polymerases, mediator proteins, BRD4 proteins, and other transcription machinery of the cell.
In most tissues of mammals, on average, 70% to 80% of CpG cytosines are methylated (forming 5-methyl-CpG, or 5-mCpG). Methylated cytosines within CpG sequences often occur in groups, called CpG islands. About 59% of promoter sequences have a CpG island, while only about 6% of enhancer sequences have one. CpG islands constitute regulatory sequences, since methylation of the CpG islands in the promoter of a gene can reduce or silence expression of that gene.
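The grouping of CpG sites into islands can be made concrete with a small calculation. The sketch below is illustrative only: it computes the GC fraction and the observed/expected CpG ratio of a sequence and compares them with thresholds that are commonly used in sequence-analysis tools (length of at least 200 base pairs, GC content above 50%, observed/expected ratio above 0.6). Those thresholds and the function names `cpg_stats` and `looks_like_cpg_island` are assumptions introduced here, not taken from this article, and the calculation says nothing about whether the cytosines are actually methylated.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction and CpG observed/expected ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() sees every site
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently along the sequence
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed (hypothetical defaults) CpG island thresholds to one sequence."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] > min_gc
            and st["cpg_obs_exp"] > min_obs_exp)


if __name__ == "__main__":
    candidate = "CG" * 100   # artificial, CpG-dense 200 bp sequence
    background = "AT" * 100  # artificial sequence with no CpG sites
    print(cpg_stats(candidate), looks_like_cpg_island(candidate))
    print(cpg_stats(background), looks_like_cpg_island(background))
```

In practice such a test would be applied in a sliding window along a promoter or enhancer sequence rather than to one fixed string; the window size would be a further assumption.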
DNA methylation regulates gene expression through interaction with methyl binding domain (MBD) proteins, such as MeCP2, MBD1 and MBD2. These MBD proteins bind most strongly to highly methylated CpG islands, and they have both a methyl-CpG-binding domain and a transcriptional repression domain.
About 94% of transcription factor binding sites that are associated with signal-responsive genes occur in enhancers, while only about 6% of such sites occur in promoters. There are about 12,000 binding sites for EGR1 in the mammalian genome; about half of them are located in promoters and half in enhancers. Expression of EGR1 in various types of cells can be stimulated by growth factors, neurotransmitters, hormones, stress and injury.
The induction of particular double-strand breaks is specific with respect to the inducing signal. When neurons are activated in vitro, just 22 TOP2B-induced double-strand breaks occur in their genomes. However, when contextual fear conditioning is carried out in a mouse, this conditioning causes hundreds of gene-associated double-strand breaks in the medial prefrontal cortex and hippocampus, which are important for learning and memory.
thumb|500px|Regulatory sequence in a promoter at a transcription start site with a paused RNA polymerase and a TOP2B-induced double-strand break
Such TOP2B-induced double-strand breaks are accompanied by at least four enzymes of the non-homologous end joining (NHEJ) DNA repair pathway (DNA-PKcs, KU70, KU80 and DNA ligase IV) (see figure). These enzymes repair the double-strand breaks within about 15 minutes to 2 hours. The double-strand breaks in the promoter are thus associated with TOP2B and at least these four repair enzymes. These proteins are present simultaneously on a single promoter nucleosome (there are about 147 nucleotides in the DNA sequence wrapped around a single nucleosome) located near the transcription start site of their target gene.
TOP1 causes single-strand breaks in particular enhancer DNA regulatory sequences when signaled by a specific enhancer-binding transcription factor.
Conserved non-coding sequences often contain regulatory regions, and so they are often the subject of analyses that search for regulatory sequences.
Examples of regulatory sequences include:
- CAAT box
- CCAAT box
- Operator (biology)
- Pribnow box
- TATA box
- SECIS element, mRNA
- Polyadenylation signal, mRNA
- A-box
- Z-box
- C-box
- E-box
- G-box
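Several of the elements listed above are recognised by short consensus sequences, so a simple pattern search can illustrate how candidate sites are located in a DNA sequence. The following is a minimal sketch, not a method described in this article: the consensus strings (TATAAA for the TATA box, TATAAT for the Pribnow box, CCAAT for the CCAAT box, CANNTG for the E-box) are simplified textbook forms, the names `MOTIFS`, `find_motif` and `scan_all` are hypothetical, and only the forward strand is searched. A match indicates a candidate site only; it does not show that the site is bound or functional.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]",
    "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]",
}

# Simplified consensus sequences (assumed here for illustration)
MOTIFS = {
    "TATA box": "TATAAA",
    "Pribnow box": "TATAAT",
    "CCAAT box": "CCAAT",
    "E-box": "CANNTG",
}


def find_motif(seq: str, consensus: str) -> list:
    """Return 0-based start positions of every forward-strand match of a consensus."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets overlapping matches be reported as well
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def scan_all(seq: str) -> dict:
    """Map each motif name in MOTIFS to the list of its match positions in seq."""
    return {name: find_motif(seq, consensus) for name, consensus in MOTIFS.items()}


if __name__ == "__main__":
    example = "GGCCAATCTTATAAAGGCACGTGAA"  # made-up sequence, not a real promoter
    for name, positions in scan_all(example).items():
        print(name, positions)
```

Real binding-site prediction usually relies on position weight matrices rather than exact consensus matching, because functional sites often deviate from the consensus at one or more positions.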
Insulin gene
Regulatory sequences for the insulin gene are:
- A5
- Z
- negative regulatory element (NRE)
- C2
- E2
- A3
- cAMP response element
- A2
- CAAT enhancer binding (CEB)
- C1
- E1
- G1
See also
- Regulator gene
- Regulation of gene expression
- Cis-acting element
- Gene regulatory network
- Open Regulatory Annotation Database
- Operon
- DNA binding site
- Promoter
- Trans-acting factor
- ORegAnno
External links
- ORegAnno - Open Regulatory Annotation Database
- ReMap - database of transcriptional regulators
