Gene transcriptions/Boxes/CAATs

A "CCAAT box (also sometimes abbreviated a CAAT box or CAT box) is a distinct pattern of nucleotides" along the template strand of DNA in eukaryotes.

Boxes
A "repeating sequence of nucleotides that forms a transcription or a regulatory signal" is a box.

Consensus sequences
In the direction of transcription on the template strand, the consensus sequence for a CAAT box is 3'-GGCCAATCT-5'.

On the coding strand "(T/C)G ATTGG (T/C)(T/C)(A/G) was the sequence that favored CBF binding [in the mouse pro-α2(1) collagen promoter]." On the template strand, this is 3'-(C/T)(A/G)(A/G)CCAATC(A/G)-5'. "[T]he favorable sequence for CBF binding was TG ATTGG (T/C)(T/C)(A/G)."
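As a letter string, the template-strand form above is the coding-strand consensus reversed with every base complemented. The following is a minimal Python sketch of that conversion. It assumes the parenthesized notation used here for ambiguous positions, such as (T/C). The function names are illustrative and do not come from the source. The sketch transforms only the letters and does not assign 3' or 5' labels.

```python
from __future__ import annotations

import re

# Complement of each unambiguous base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def tokenize(pattern: str) -> list[str]:
    """Split '(T/C)G ATTGG' into ['T/C', 'G', 'A', 'T', 'T', 'G', 'G'].

    Spaces, hyphens and other separators are skipped."""
    return [group or base for group, base in re.findall(r"\(([ACGT/]+)\)|([ACGT])", pattern)]


def complement_token(token: str) -> str:
    """Complement one token; letters inside a group are sorted, so T/C -> A/G."""
    return "/".join(sorted(COMPLEMENT[b] for b in token.split("/")))


def render(tokens) -> str:
    """Write tokens back in this page's notation, e.g. ['A/G', 'C'] -> '(A/G)C'."""
    return "".join(f"({t})" if "/" in t else t for t in tokens)


def reverse_complement(pattern: str) -> str:
    """Reverse the token order and complement each token."""
    return render(complement_token(t) for t in reversed(tokenize(pattern)))


print(reverse_complement("(T/C)G ATTGG (T/C)(T/C)(A/G)"))  # (C/T)(A/G)(A/G)CCAATC(A/G)
print(reverse_complement("TG ATTGG (T/C)(T/C)(A/G)"))      # (C/T)(A/G)(A/G)CCAATCA
```

The second call applies the same conversion to the "favorable" CBF sequence. Sorting the letters inside each group makes the output directly comparable with the notation written above.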

Core promoters
Notation: let the symbol CBF represent the CAAT-box binding factor.

A CAAT box, when present, occurs "upstream by 75-80 bases to the initial transcription site."

"In many eukaryotic class II promoters, CCAAT motifs are often found between 50 and 100 nucleotides upstream of the transcription start site (17-20), and these motifs are recognized by different classes of CCAAT-binding proteins, one of which is CBF."

"In many higher eukaryotic class II promoters, CCAAT motifs (or ATTGG motifs in the opposite strand), are often found between −50 and −110 relative to the start of transcription (1-4). The precise location of these CCAAT motifs and the promoter sequences around the motif of a specific gene are highly conserved during evolution."

"In metazoa, the CBF-DNA complex is characterized by its requirement for a high degree of conservation within the binding motif CCAAT (7, 21, 22), and sequences surrounding the pentameric motif contribute to the binding specificity (Ref. 16 and references therein)."

"Computer analysis of 502 unrelated RNA polymerase II promoter regions showed that approximately 30% of the promoters contained a CCAAT sequence (or ATTGG sequence on the complementary strand) and that in a large number of vertebrate promoters the CCAAT motif was located around nucleotide −80 upstream of the transcription start site (4)."

"[I]n most of these promoters the flanking sequences of ATTGG were TG on the 5′ side and (T/C)(T/C)(A/G) on the 3′ side".
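The position and flank observations quoted above can be checked mechanically. The following is a minimal sketch under several stated assumptions. The input is a coding-strand sequence written 5' to 3' that ends at the nucleotide just upstream of the transcription start site, so its last base is position −1. `find_ccaat_sites` is a hypothetical helper. The flank rule is the TG + ATTGG + (T/C)(T/C)(A/G) pattern from the quotation, and it is applied to CCAAT cores through its reverse complement. The test sequence is a made-up toy, not a real promoter.

```python
import re

# Lookahead so that overlapping cores (e.g. CCAATTGG) are all reported.
CORE = re.compile(r"(?=(CCAAT|ATTGG))")
# Flank rule quoted above: TG + ATTGG + (T/C)(T/C)(A/G) ...
FLANKED_ATTGG = re.compile(r"TGATTGG[TC][TC][AG]")
# ... and the same rule read on the opposite strand (reverse complement).
FLANKED_CCAAT = re.compile(r"[CT][AG][AG]CCAATCA")


def find_ccaat_sites(upstream: str) -> list:
    """Return (position, core, flanked) for every CCAAT or ATTGG core.

    Assumes `upstream` is the coding strand 5'->3' ending just before the
    transcription start site, so its last base is position -1; the reported
    position is that of the core's first (5'-most) base."""
    seq = upstream.upper()
    n = len(seq)
    sites = []
    for m in CORE.finditer(seq):
        i, core = m.start(), m.group(1)
        if core == "ATTGG":
            start, end, rule = i - 2, i + 8, FLANKED_ATTGG
        else:  # CCAAT: three flank bases before it, "CA" after it
            start, end, rule = i - 3, i + 7, FLANKED_CCAAT
        # A window cut short by either end of the sequence never matches.
        flanked = start >= 0 and rule.fullmatch(seq[start:end]) is not None
        sites.append((i - n, core, flanked))
    return sites


toy = "A" * 18 + "TGATTGGCTA" + "G" * 72  # 100 nt; ATTGG starts at -80
print(find_ccaat_sites(toy))  # [(-80, 'ATTGG', True)]
```

Reporting the core on both strands mirrors the "CCAAT sequence (or ATTGG sequence on the complementary strand)" wording of the promoter survey quoted above.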

"[T]he CCAAT-flanking sequences [occur] around the CCAAT motifs in most eukaryotic promoters harboring a CCAAT sequence in these proximal promoters."

"In contrast to many animal CCAAT motifs, the majority of the plant sequences contain only one C or lack a CAAT-box completely."

Gene transcriptions
"Genes that have this element seem to require it for the gene to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box along with the GC box is known for binding general transcription factors. CAAT and GC are primarily located in the region from 100-150bp upstream from the TATA box. Both of these consensus sequences belong to the regulatory promoter. Full gene expression occurs when transcription activator proteins bind to each module within the regulatory promoter. Protein specific binding is required for the CCAAT box activation. These proteins are known as CCAAT box binding proteins/CCAAT box binding factors."

Cadherins
"Transcriptional downregulation of E-cadherin appears to be an important event in the progression of various epithelial tumors. SIP1 (ZEB-2) is a Smad-interacting, multi-zinc finger protein that shows specific DNA binding activity. [Expression] of wild-type but not of mutated SIP1 downregulates mammalian E-cadherin transcription via binding to both conserved E2 boxes of the minimal E-cadherin promoter."

"Analysis of mouse and human E-cadherin promoters revealed a conserved modular structure with positive regulatory elements including two E2 boxes (CACCTG) with a potential repressor role (Behrens et al. 1991, Giroldi et al. 1997)."

"The two E2 boxes in the mouse and human E-cadherin promoter sequences were demonstrated to play a crucial role in the epithelial-specific expression of E-cadherin (Behrens et al. 1991, Giroldi et al. 1997). Mutation of these sequence elements results in upregulation of the E-cadherin promoter in dedifferentiated cancer cells, whereas the wild-type promoter shows low activity in such cells. Recently, it was shown that the zinc finger transcriptional repressor Snail can downregulate E-cadherin by binding to the E boxes in the E-cadherin promoter (Batlle et al. 2000, Cano et al. 2000). Human Snail belongs to a family of zinc finger proteins, which contain four or five zinc finger domains of the C2H2 type at their C-terminal end. These zinc fingers bind to the CANNTG sequence in E box motifs."

"δEF1 and SIP1 have been shown to bind spaced CACCT DNA sequences, including E2 boxes (CACCTG), by their zinc finger clusters (Remacle et al., 1999)."

"To address the specificity of SIP1 action, mutagenesis of the E-cadherin promoter in either its upstream E2 box 1 (−75) or its downstream E2 box 3 (−25), or in both E2 boxes was performed [...]."

Wild-type "SIP1 represses the E-cadherin promoter, likely through binding via both zinc finger clusters to spaced E2 boxes as demonstrated previously (Remacle et al., 1999) and confirmed here by a DNA-mediated pull-down assay of SIP1 protein [...]. Wild-type but not mutated SIP1 from transfected human cells could be efficiently precipitated by biotinylated E-cadherin promoter oligonucleotides, comprising two wild-type E2 box sequences. Mutation of the E2 boxes resulted in the loss of SIP1 binding."

The human E2 boxes are E2 box 1 (GCAGGTGA), E2 box 2 (TGGCCGGC), and E2 box 3 (TCACCTGG).
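The three human E2 box sequences listed above can be compared directly with the CACCTG E2 box core and the CANNTG E box motif quoted earlier. The following is a minimal sketch that assumes plain 5' to 3' strings. The helper names are illustrative. The E2 core is searched on both strands, which means CACCTG or its reverse complement CAGGTG.

```python
import re

# E2 box core quoted above (CACCTG) and the same core on the other strand.
E2_CORE = re.compile(r"CACCTG|CAGGTG")
# General E box motif CANNTG; it is its own reverse complement.
E_BOX = re.compile(r"CA[ACGT]{2}TG")

HUMAN_E2_BOXES = {
    "E2 box 1": "GCAGGTGA",
    "E2 box 2": "TGGCCGGC",
    "E2 box 3": "TCACCTGG",
}


def has_e2_core(seq: str) -> bool:
    """True if CACCTG occurs on either strand of `seq`."""
    return E2_CORE.search(seq.upper()) is not None


def has_e_box(seq: str) -> bool:
    """True if a CANNTG motif occurs in `seq` (covers both strands)."""
    return E_BOX.search(seq.upper()) is not None


for name, seq in HUMAN_E2_BOXES.items():
    print(name, seq, has_e2_core(seq), has_e_box(seq))
# E2 box 1 GCAGGTGA True True
# E2 box 2 TGGCCGGC False False
# E2 box 3 TCACCTGG True True
```

Box 1 carries the core on the opposite strand, as CAGGTG. Box 3 carries it directly. Box 2 carries neither the E2 core nor any CANNTG motif. Because CANNTG reads the same after reverse complementing, a single-strand search is enough for `has_e_box`.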

"Alignment of the E-cadherin promoter sequences of dog, mouse, and man. Conserved regulatory elements are indicated: E2 boxes 1 and 3, CCAAT box, and GC box. The E2 box 2 has been described as part of a palindromic E-pal sequence in the mouse E-cadherin promoter (Behrens et al., 1991), but is conserved neither in canine nor in human sequences."

Human NeuroD (BETA2/BHF1) genes
"There was no consensus CAAT box. [...] In addition, we performed mutation analyses of the E2 box and the E3 box to evaluate whether the E2 and E3 boxes regulate the transcriptional activity of the human NeuroD gene [...]."

Human glucocerebrosidase genes
The "5′ genomic sequences revealed promoter elements containing a TATA box at nucleotides −23 to −27 and a CAAT box between nucleotides [...] and an E2 box [...]."

Cap signal elements
"Studies have reported that the cap signal element with the TATA-box, CAAT-box, and GC-box is the most general element of the POL II promoter and exists in major protein [...]."

Hypotheses

 * 1) A1BG is not transcribed by a CAAT box.

A1BG samplings
A CCAAT box (also sometimes abbreviated a CAAT box or CAT box) is a distinct pattern of nucleotides along the template strand of DNA in eukaryotes.

On the template strand, the CAAT box consensus sequence is 3'-(C/T)(A/G)(A/G)CCAATC(A/G)-5'.

For the Basic programs (starting with SuccessablesCAAT.bas) written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in either the negative direction (-) or the positive direction (+), the programs, the sequences they look for, and the counts found are:
 * 1) CAAT - 3'-(C/T)(A/G)(A/G)CCAATC(A/G)-5', -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
 * 2) CAAT - 3'-(A/G)(C/T)(C/T)GGTTAG(C/T)-5', complement, -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
 * 3) CAAT - 3'-(A/G)-C-T-A-A-C-C-(A/G)-(A/G)-(C/T)-5', inverse, -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
 * 4) CAAT - 3'-(C/T)-G-A-T-T-G-G-(C/T)-(C/T)-(A/G)-5', complement inverse, -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
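The four search patterns listed above are the same consensus in four forms: read directly, complemented, reversed ("inverse"), and both complemented and reversed. The following is a minimal Python sketch of how those forms can be derived and counted in a sequence string. It assumes the parenthesized ambiguity notation used on this page. It is not the SuccessablesCAAT.bas code and does not model the strand and direction options (--, -+, +-, ++) of those programs. The function names and the short test sequence are hypothetical.

```python
from __future__ import annotations

import re

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def tokenize(pattern: str) -> list[str]:
    """Split a pattern into single bases and ambiguity groups ('C/T').

    Anything else (hyphens, spaces, the 3'/5' labels) is skipped."""
    return [group or base for group, base in re.findall(r"\(([ACGT/]+)\)|([ACGT])", pattern)]


def complement(tokens: list[str]) -> list[str]:
    """Complement every token; letters inside a group are sorted."""
    return ["/".join(sorted(COMPLEMENT[b] for b in t.split("/"))) for t in tokens]


def variants(pattern: str) -> dict[str, list[str]]:
    """The four forms searched for in the list above."""
    tokens = tokenize(pattern)
    comp = complement(tokens)
    return {
        "direct": tokens,
        "complement": comp,
        "inverse": tokens[::-1],
        "complement inverse": comp[::-1],
    }


def render(tokens: list[str]) -> str:
    return "".join(f"({t})" if "/" in t else t for t in tokens)


def count_matches(pattern: str, sequence: str) -> dict[str, int]:
    """Count (possibly overlapping) occurrences of each form in `sequence`."""
    seq = sequence.upper()
    counts = {}
    for name, tokens in variants(pattern).items():
        # Ambiguity groups become character classes, e.g. C/T -> [CT];
        # the zero-width lookahead lets overlapping hits be counted.
        body = "".join(f"[{t.replace('/', '')}]" if "/" in t else t for t in tokens)
        counts[name] = len(re.findall(f"(?={body})", seq))
    return counts


CAAT = "3'-(C/T)(A/G)(A/G)CCAATC(A/G)-5'"
for name, tokens in variants(CAAT).items():
    print(f"{name:>18}: {render(tokens)}")
print(count_matches(CAAT, "GGTGATTGGCTAGG"))
# {'direct': 0, 'complement': 0, 'inverse': 0, 'complement inverse': 1}
```

The rendered forms reproduce the letters of patterns 2 to 4 above. Counting against the actual sequences flanking A1BG, which are not reproduced here, is the step the Basic programs perform.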

When each SuccessablesCAAT.bas program is extended from 958 to 4445 nts, starting just beyond ZNF497, the results do not change.

No CAAT boxes occur on either side of A1BG.