Gene transcriptions/Boxes/C/Ds

Many small nucleolar RNAs fall into the family of C/D box snoRNAs.

For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."

"The [C and D] box elements are essential for snoRNA production [transcription] and for snoRNA-directed modification of rRNA nucleotides."

The "motif is necessary and sufficient for nucleolar targeting, both in yeast and mammals. Moreover, in mammalian cells, RNA is targeted to coiled bodies as well. Thus, the box C/D motif is the first intranuclear RNA trafficking signal identified for an RNA family. Remarkably, it also couples snoRNA localization with synthesis and, most likely, function. The distribution of snoRNA precursors in mammalian cells suggests that this coupling is provided by a specific protein(s) which binds the box C/D motif during or rapidly after snoRNA transcription."

In snoRNA U73 (shown in the image on the right), the C box, read from the left side of the stem, consists of the nucleotides ARUGAUGA; read from the right side of the stem, the D box is AGUCY. Written in the 5' to 3' direction, the D box is YCUGA.

Degenerate nucleotides
In the DNA being transcribed, T takes the place of the U found in RNA. Y denotes a pyrimidine (C or T) and R denotes a purine (A or G).
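As a minimal sketch of how these degenerate symbols can be used in practice, the box C and box D patterns given above for snoRNA U73 (ARUGAUGA and YCUGA) can be expanded into regular expressions. The helper names `degenerate_to_regex` and `rna_to_dna` are hypothetical, and only the symbols used in this section (R, Y, plus N for any nucleotide) are covered.

```python
import re

# Degenerate symbols in the RNA alphabet; in DNA, U is written as T.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any nucleotide
}

def degenerate_to_regex(motif):
    """Compile a degenerate RNA motif into a regular expression."""
    return re.compile("".join(IUPAC_RNA[symbol] for symbol in motif))

def rna_to_dna(seq):
    """Rewrite an RNA sequence in DNA letters (U becomes T)."""
    return seq.replace("U", "T")

c_box = degenerate_to_regex("ARUGAUGA")  # box C of snoRNA U73, 5' to 3'
d_box = degenerate_to_regex("YCUGA")     # box D of snoRNA U73, 5' to 3'

print(bool(c_box.fullmatch("AGUGAUGA")))  # True: R matches G
print(bool(c_box.fullmatch("ACUGAUGA")))  # False: C is not a purine
print(rna_to_dna("YCUGA"))                # YCTGA
```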

Consensus sequences
Shown in the image on the right are the C box (3'-AGUAGU-5') and the D box (3'-AGUCUG-5'). Substituting T for U yields C box = 3'-AGTAGT-5' and D box = 3'-AGTCTG-5' in the transcription direction on the template strand.
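The sequences above are written 3' to 5', whereas box sequences are more often quoted 5' to 3'. The short sketch below, with hypothetical helper names, reverses the reading direction and substitutes T for U. Reversal only changes the direction of reading; it does not take the complement.

```python
def reverse_reading_direction(seq):
    """Rewrite a sequence given 3'->5' so that it reads 5'->3' (no complementing)."""
    return seq[::-1]

def rna_to_dna_letters(seq):
    """Substitute T for U."""
    return seq.replace("U", "T")

c_box_5to3 = reverse_reading_direction("AGUAGU")
d_box_5to3 = reverse_reading_direction("AGUCUG")

print(c_box_5to3, d_box_5to3)  # UGAUGA GUCUGA
print(rna_to_dna_letters("AGUAGU"), rna_to_dna_letters("AGUCUG"))  # AGTAGT AGTCTG
```

Read 5' to 3', the two results are the box C (UGAUGA) and box D (GUCUGA) sequences quoted in this section.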

"Members of the box C/D snoRNA family, which are the subject of the present report, possess characteristic sequence elements known as box C (UGAUGA) and box D (GUCUGA)."

Gene transcriptions
"The model proposed [in the image on the right] is based on expression results obtained [...] with various box C/D snoRNAs in different cells and in different genetic contexts. The major features are predicted to be universal for all box C/D snoRNAs. Variations are likely to occur in the early steps, due to differences in genomic arrangements, i.e. introns of protein genes, or mono- or polycistronic snoRNA transcription units. Transcription occurs in the nucleoplasm. Folding of the precursor produces a functional box C/D motif. This motif is then recognized by a box C/D snoRNA family-specific binding protein(s). These events occur during (as depicted), or rapidly after transcription. A transcript then follows one of two possible pathways: (i) a precursor capped with monomethylguanosine (open circle) is hypermethylated to yield trimethylguanosine (closed circle) and trimmed at the 3' end by exonucleases, or (ii) if uncapped, the precursor is trimmed at both ends. In both cases, protein binding via the box C/D motif is then responsible for the delivery of the snoRNA to the nucleolus."

Synthesis to function link
"In addition to integrating the various aspects of snoRNA synthesis (maturation, stability and localization), there is the intriguing possibility that the box C/D motif also links snoRNA synthesis with function. In particular, this possibility applies to the snoRNAs which guide methylation of rRNA. The guide function is mediated by a long sequence complementary to the rRNA segment to be modified, and site selection depends on a box D or D-like element (CUGA, box D') located immediately downstream of the guide sequence. Methylation occurs in the complementary rRNA sequence, precisely five nucleotides from box D/D'. Some guide sequences occur near the 3' end, immediately upstream of the canonical box D, whereas others occur in the interior of the RNA adjoined to box D'. Some snoRNAs contain guide sequences in both arrangements. [...] snoRNAs with internal guide sequences and box D' also contain an additional box C-like sequence (UGAU) at a modest distance downstream of box D', suggesting that these RNAs have a second box C/D motif, in addition to the canonical one (Kiss-Laszlo et al., 1998; D.A. Samarsky, unpublished). Results from recent mutational studies have demonstrated that the novel box C-like element (called box C') is essential for the methylation reaction, but not for snoRNA accumulation (Kiss-Laszlo et al., 1998). [...] one or more proteins involved in the methylation function also bind directly or indirectly to the simple box C/D motif, and that snoRNA production and function may be connected at this level."
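The spacing rule in the passage above can be illustrated with a rough sketch. It makes several assumptions that are not in the source: counting starts at the guide nucleotide immediately upstream of box D/D' (position 1), only Watson-Crick pairs are considered, any CUGA occurrence is treated as a candidate box, and the toy sequence is invented for illustration and is not a real snoRNA.

```python
# Watson-Crick partners only; G-U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def predicted_target_nucleotides(snorna, box="CUGA", offset=5):
    """For each occurrence of `box`, return (box_start, guide_index, rRNA_nucleotide).

    guide_index is the position `offset` nucleotides upstream of the box,
    and rRNA_nucleotide is the base expected to pair with it.
    """
    hits = []
    start = snorna.find(box)
    while start != -1:
        guide_index = start - offset
        if guide_index >= 0:
            hits.append((start, guide_index, COMPLEMENT[snorna[guide_index]]))
        start = snorna.find(box, start + 1)
    return hits

toy_snorna = "GGAUCGAACGUACCUGA"  # hypothetical sequence ending in CUGA
print(predicted_target_nucleotides(toy_snorna))  # [(13, 8, 'G')]
```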

C-boxes
Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998). The HY5 protein interacts with both the G- (CACGTG) and Z- (ATACGTGT) boxes of the light-regulated promoter of RbcS1A (ribulose bisphosphate carboxylase small subunit) and the CHS (chalcone synthase) genes (Ang et al., 1998; Chattopadhyay et al., 1998; Yadav et al., 2002). To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs [TGACGTGT]. C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins [...]. No binding or limited binding was observed to as-1 (Lam et al., 1989), nos-1 (Lam et al., 1990), or the AP-1 site (TGACTCA; Kim et al., 1993). Binding to the palindromic G-box (PA G-box, GCCACGTGGC) was moderate. However, binding activity to the G-box of the light-responsive unit 1 (U1) region of the parsley (Petroselinum crispum) CHS promoter (CHS-U1: TCCACGTGGC; Schulze-Lefert et al., 1989) or the G-box of GmAux28 (TCCACGTGTC) was much weaker than to the PA G-box [...]."

The "binding affinities of both bZIP proteins were similar to CREA/T (ATGACGTCAT), a CRE sequence with flanking adenine and thymine (A/T) at positions -4 and +4. [The] bZIP domains of both STF1 and HY5 have similar binding properties for recognizing ACGT-containing elements (ACEs). [Although] the G-box is a known target site for the HY5 protein, the C-box sequences are the preferred binding sites for both STF1 and HY5."

"When analyzed by type of ACE, these sequences can be grouped into four subclasses [...]: C-box, where the C residue comes at the +2 position; a hybrid C/G-box (C/G-box), with G at the +2 position; C/A-box [TGACGTAT], with A at the +2 position; and C/T-box, with T at the +2 position. The C-box subclass contains the largest number of selected binding sites for STF1 (38% at 50 mM KCl and 48% at 150 mM), followed by the C/G- (25.3%) and the C/A-boxes (26%). Only a small number of C/T-boxes [TGACGTTA] (4/100) and non-TGACGT sequences (4/100) were selected."
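The four subclasses can be told apart by the nucleotide that immediately follows the TGACGT half-site. The sketch below is one possible way to express that rule. The function name and the labels "non-TGACGT" and "unclassified" are illustrative choices, and the test sequences are the example motifs quoted in this section.

```python
SUBCLASS_BY_NEXT_NUCLEOTIDE = {
    "C": "C-box",
    "G": "C/G-box",
    "A": "C/A-box",
    "T": "C/T-box",
}

def classify_tgacgt_site(site):
    """Classify a DNA binding site by the nucleotide following TGACGT."""
    start = site.find("TGACGT")
    if start == -1:
        return "non-TGACGT"
    following = site[start + 6:start + 7]  # empty if TGACGT ends the site
    return SUBCLASS_BY_NEXT_NUCLEOTIDE.get(following, "unclassified")

for site in ["TGACGTCA", "TGACGTGGC", "TGACGTAT", "TGACGTTA", "TGACTCA"]:
    print(site, classify_tgacgt_site(site))
```

With these inputs, the CRE motif TGACGTCA is labeled a C-box, the Hex sequence TGACGTGGC a C/G-box, and the AP-1 site TGACTCA falls outside the TGACGT family.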

C-boxes are TCTTACGTCATC, AATGACGTCGAA, TCTCACGTGTGG, TTTGACGTGTGA, GATGACGTCATC, and AGAGACGTCAAC for an apparent consensus sequence of (A/G/T)(A/C/G/T)(A/T)(C/G/T)ACGT(C/G)(A/G/T)(A/G/T)(A/C/G).
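An apparent consensus of this kind can be reproduced by listing, for each aligned position, every nucleotide observed there. The following is a minimal sketch of that procedure, assuming the sequences are already aligned and of equal length; the function name is hypothetical, and the approach ignores how often each nucleotide occurs.

```python
def apparent_consensus(sequences):
    """List the nucleotides observed at each position of equal-length sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    parts = []
    for column in zip(*sequences):
        observed = sorted(set(column))
        if len(observed) == 1:
            parts.append(observed[0])  # invariant position
        else:
            parts.append("(" + "/".join(observed) + ")")
    return "".join(parts)

c_boxes = [
    "TCTTACGTCATC", "AATGACGTCGAA", "TCTCACGTGTGG",
    "TTTGACGTGTGA", "GATGACGTCATC", "AGAGACGTCAAC",
]
consensus = apparent_consensus(c_boxes)
print(consensus)
# (A/G/T)(A/C/G/T)(A/T)(C/G/T)ACGT(C/G)(A/G/T)(A/G/T)(A/C/G)
```

Because every observed nucleotide is kept regardless of frequency, a single unusual sequence widens the consensus at that position; a frequency threshold would give a stricter result.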

C/A-boxes are GGTTACGTCAAT, TTTGACGTATTT, TTTGACGTAAAC, AAAGACGTAAAC, TACTACGTCAGA, CGTGACGTAACC, GACTACGTCGAC, TAATACGTCATG, CTTGACGTATAC, CATTACGTCATT, GTTGACGTAAAG, ACTGACGTAAAG, TTCGACGTAGAT, GAAGACGTAGAA, and GATTACGTCAGC for an apparent consensus sequence of (A/C/G/T)(A/C/G/T)(A/C/T)(C/G/T)ACGT(A/C/G)(A/G/T)(A/C/G/T)(A/C/G/T).

C/G-boxes are TGCCACGTCAGA, ATAGACGTGTCC, CAACACGTCTGC, CGTGACGTGGGA, GGACACGTCTAG, AGCGACGTGGAT, TGAGACGTGTTT, GTCCACGTCTTT, TGACACGTCATC, TTTCACGTCTAT, TATGACGTGATC, CAAGACGTGTGG, AAACACGTCACA, and ACTGACGTGGAA for an apparent consensus sequence of (A/C/G/T)(A/C/G/T)(A/C/T)(C/G)ACGT(C/G)(A/G/T)(A/C/G/T)(A/C/G/T).

G-boxes
G-boxes are AGACACGTGTGG, AATCACGTGGCG, TCACACGTGTCA, TGCCACGTGTCC, GTACACGTGTTA, TAACACGTGAAA, TAACACGTGTTC, GCCCACGTGTCA, GGCCACGTGTTG, ATCCACGTGTCT, CTCCACGTGTCG, TGACACGTGTTT, TGACACGTGTAT, TGCCACGTGGGT, TCACACGTGTGA, TTTCACGTGATT, TCCCACGTGGCA, ACACACGTGTTC, CACCACGTGAAC, ACACACGTGTGA, TCCCACGTGAAT, ATCCACGTGGCG, AATCACGTGTAT, ATCCACGTGACT, CAACACGTGTCA, and ACCCACGTGTAA for an apparent consensus sequence of (A/C/G/T)(A/C/G/T)(A/C/T)CACGTG(A/G/T)(A/C/G/T)(A/C/G/T).

G/A-boxes are CTACACGTAAAC, GGTCACGTATTT, AAACACGTATAT, CGACACGTAGTT, GATTACGTGTGC, TTACACGTATAA, ATATACGTGTGT, TGTCACGTAGGC, TGTTACGTGTAG, TACCACGTAACT, CGATACGTGGTC, TGATACGTGAAG, ATACACGTATGT, GTCCACGTAGAC, AGACACGTAAAA, CCATACGTGGGC, TTCCACGTATCA, AGCTACGTGATC, AATCACGTAGAG, AGCTACGTGACG, TAACACGTATTG, ACATACGTGTGT, TGACACGTAGAT, TATCACGTAATA, GTCCACGTAGGT, AGACACGTATCG, and AGCCACGTAATA for an apparent consensus sequence of (A/C/G/T)(A/C/G/T)(A/C/T)(C/T)ACGT(A/G)(A/G/T)(A/C/G/T)(A/C/G/T).

cAMP response elements
The "binding affinities of both bZIP proteins were similar to CREA/T (ATGACGTCAT), a CRE sequence with flanking adenine and thymine (A/T) at positions -4 and +4."

A-boxes
A-boxes are TTTTACGTAAGA, GCATACGTAGAG, GATTACGTATGA, ATATACGTAGAT, AATTACGTATAC, ATATACGTAATT, GCATACGTAATG, ACATACGTATTT, TTATACGTAATC, TTCTACGTAAAA, TAATACGTATGC, ACCTACGTATAT, TGTTACGTAAAA, TTCTACGTAGAT, ATTTACGTATTA, ATTTACGTATAA, AAATACGTAATG, AAATACGTAATC, and TATTACGTATAG for an apparent consensus sequence of (A/G/T)(A/C/G/T)(A/C/T)TACGTA(A/G/T)(A/G/T)(A/C/G/T).

Z-boxes
"The HY5 protein interacts with both the G- (CACGTG) and Z- (ATACGTGT) boxes of the light-regulated promoter of RbcS1A (ribulose bisphosphate carboxylase small subunit) and the CHS (chalcone synthase) genes (Ang et al., 1998; Chattopadhyay et al., 1998; Yadav et al., 2002)."