[[File:CCAAT DNA4.jpg|thumb|upright=2|Left: model of the NF-YC/NF-YB complex with
the CCAAT element from the pro-α2(I) collagen promoter. The DNA backbone is shown as purple ribbons with the bases displayed. The two
possible locations of the CCAAT box, according to the modeling, are colored cyan. Right: model of the NF-Y/CCAAT complex. NF-YC, NF-YB
and DNA are colored as in the left model, whereas NF-YA is colored blue. The two alternative positions for the linker connecting the NF-YA1 and NF-YA2
sub-domains are shown as blue dotted lines. Secondary structure elements of the histone pair that are implicated in NF-YA1 and NF-YA2
recognition (see text) are labeled and colored in red and gray, respectively. For clarity, only the bases of the CCAAT pentanucleotide are shown
and labeled.]]
In molecular biology, a CCAAT box (also sometimes abbreviated as a CAAT box or CAT box) is a distinct pattern of nucleotides with the consensus sequence GGCCAATCT that occurs 60–100 bases upstream of the transcription initiation site. The CAAT box signals the binding site for the RNA transcription factor and is typically accompanied by a conserved consensus sequence. It is an invariant DNA sequence at about position −70 (70 base pairs upstream of the origin of transcription) in many eukaryotic promoters. Genes that have this element seem to require it in order to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box, along with the GC box, is known for binding general transcription factors. Both of these consensus sequences belong to the regulatory promoter. Full gene expression occurs when transcription activator proteins bind to each module within the regulatory promoter. Specific protein binding is required for CCAAT box activation. These proteins are known as CCAAT box binding proteins or CCAAT box binding factors.
A CCAAT box is a feature frequently found upstream of eukaryotic coding regions, but it is not found in prokaryotes.
Consensus sequence
In the direction of transcription of the template strand, the consensus sequence, or the calculated order of the most frequent residues, for the CAAT box is 3'-TG ATTGG (T/C)(T/C)(A/G)-5'. Parentheses denote positions where either base may be present; their relative frequencies are not specified. For example, "(T/C)" means that either thymine or cytosine is preferentially selected. Within metazoa (the animal kingdom), the core binding factor (CBF)-DNA complex retains a high degree of conservation within the CCAAT binding motif, as well as in the sequences flanking this pentameric motif. The CCAAT motif in plants (spinach was used in one experiment) differs slightly from that of metazoa in two ways. First, it is actually a CAAT binding motif: the promoter lacks one of the two C residues of the pentameric motif, and the artificial addition of the second C has no significant effect on binding activity. Some sequences lack the CAAT box completely. Second, the surrounding nucleotides in plants do not match the consensus sequence above, which was determined by Bi et al.
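The parenthesis notation described above can be turned into a simple search pattern. The following is a minimal sketch, assuming the consensus is read left to right exactly as written and treated as a plain string pattern; the strand-orientation labels are not modeled. The function names (`reverse_complement`, `consensus_to_regex`, `find_consensus`) are hypothetical helpers introduced only for illustration.

```python
import re

# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def consensus_to_regex(consensus: str) -> str:
    """Convert a pattern such as 'TG ATTGG (T/C)(T/C)(A/G)' into a regex.

    Each '(X/Y)' becomes the character class '[XY]'; spaces are ignored.
    """
    pattern = consensus.replace(" ", "").upper()
    return re.sub(r"\(([ACGT])/([ACGT])\)", r"[\1\2]", pattern)


# Pattern copied from the text, read left to right as written (assumption).
ATTGG_CONSENSUS = consensus_to_regex("TG ATTGG (T/C)(T/C)(A/G)")


def find_consensus(seq: str) -> list:
    """Return 0-based start positions of the flanked ATTGG pattern in seq."""
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer(f"(?={ATTGG_CONSENSUS})", seq.upper())]
```

For instance, `reverse_complement("CCAAT")` gives `"ATTGG"`, which is why the same site appears as CCAAT on one strand and ATTGG on the other. Searching both a sequence and its reverse complement would cover sites in either orientation.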
Core promoter
The CAAT box is considered part of the core promoter, also known as the basal promoter or simply the promoter, which is a region of DNA that initiates transcription of a particular gene. The CAAT box in particular is located about 60–100 bases upstream (towards the 5' end) of the transcription initiation site of a eukaryotic gene, and no less than 27 base pairs away from it; the promoter is where a complex of general transcription factors binds with RNA polymerase II prior to the initiation of transcription. It is essential for transcription that these core binding factors (also referred to as nuclear factor Y or NF-Y) are able to bind to the CCAAT motif. Experiments in many laboratories have shown that mutations to the CCAAT motif that cause a loss of CBF binding also decrease transcriptional activity in these promoters, suggesting that CBF-CCAAT complexes are essential for optimum transcriptional activity.
CBF binding preferences were examined using an oligonucleotide (R1) that contained 27 random nucleotides flanked by a defined 20-nucleotide sequence on each side. While no single nucleotide was selected in every clone on either side of the ATTGG motif (CCAAT on the complementary strand), several positions showed nucleotides selected with high frequency. Most notable in the sequence above is the G residue on the 5' side of ATTGG. The other residues listed were also notable, but at each of those positions selection was split between two residues. The same experiment yielded the same sequence as shown above when a different oligonucleotide (R2) was used, which contained an ATTGG core flanked by 12 random nucleotides on the 5' side and 10 random nucleotides on the 3' side. The two resulting sequences are very similar and were confirmed in multiple experiments. Sequences in which the ATTGG motif was flanked by two adenine residues (AA) on its 5' end and G(A/G) on its 3' end seem to have inhibited formation of the CBF-DNA complex, and such sequences occurred in only 1% of the promoter sequences.
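The positional constraint described in this section (a CCAAT pentamer roughly 60–100 bases upstream of the transcription initiation site) can be checked with a short script. This is a sketch under stated assumptions: the function name `ccaat_positions`, its parameters, and the convention of reporting the position of the first C of the pentamer are all hypothetical choices, and only the CCAAT orientation on the given strand is scanned.

```python
def ccaat_positions(promoter: str, tss_index: int, window=(-100, -60)) -> list:
    """Return positions, relative to the TSS, of CCAAT pentamers in the window.

    promoter  -- coding-strand sequence written 5' to 3'
    tss_index -- 0-based index of the transcription start site within promoter
    window    -- inclusive (far, near) bounds; the base immediately upstream
                 of the TSS is position -1 (assumed convention)
    """
    seq = promoter.upper()
    hits = []
    start = seq.find("CCAAT")
    while start != -1:
        rel = start - tss_index  # negative values lie upstream of the TSS
        if window[0] <= rel <= window[1]:
            hits.append(rel)
        start = seq.find("CCAAT", start + 1)
    return hits


# Toy sequence (invented for illustration): a CCAAT placed 70 bases upstream.
toy_promoter = "G" * 30 + "CCAAT" + "G" * 65 + "ACGT"
hits = ccaat_positions(toy_promoter, tss_index=100)
```

With the toy sequence above, the pentamer starts at index 30 and the start site is at index 100, so the single reported position is −70. Scanning the reverse complement as well would be needed to detect boxes in the ATTGG orientation.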
CCAAT in plants
These core binding factors, or nuclear factors (NF-Y), are composed of three subunits: NF-YA, NF-YB, and NF-YC. Whereas in animals each NF-Y subunit is encoded by a single gene, in plants the subunits have diversified in both structure and function. Plant NF-Y families consist of between eight and 39 members per subunit. This diversification is largely due to gene duplications and tandem duplications, which have contributed to the larger NF-Y family sizes in plants compared with the animal nuclear factors, each encoded by a single gene. Each subunit contains an evolutionarily conserved part (the C-terminus of NF-YA, the central part of NF-YB, and the N-terminus of NF-YC), more than 70% of which remains conserved across species. Neighboring regions, however, are generally not conserved.
In adipocytes, the role of C/EBPs (CCAAT/enhancer-binding proteins) has been shown in a variety of experiments with mice: ectopic expression of C/EBPα and C/EBPβ was able to initiate the differentiation program of the cell, that is, the differentiation of preadipocytes to adipocytes (fat cells), even in the absence of adipogenic hormones. In addition, an overabundance of C/EBPδ causes an accelerated response. Furthermore, cells lacking C/EBP and C/EBP-deficient mice are unable to undergo adipogenesis. This results in the mice dying from hypoglycemia or in reduced lipid accumulation in adipose tissue. The C/EBPs share a general basic-leucine zipper (bZIP) domain at the C-terminus and are able to form dimers with other C/EBPs or with other transcription factors. This dimerization allows the C/EBPs to bind specifically to DNA through a palindromic sequence in the major groove. They are regulated through various means, including hormones, mitogens, cytokines, nutrients, and other factors.
