Gene transcriptions/Core promoters

A core promoter is that portion of the proximal promoter that contains the transcription start sites.

Biochemical definition: the minimal stretch of DNA sequence that is sufficient to direct accurate initiation of transcription. An acceptable range of the length of a core promoter is typically 60 to 120 base pairs.

Genomics definition: short sequences surrounding the transcription start sites (TSSs).

The core promoter contains a binding site for an RNA polymerase holoenzyme (RNA polymerase I, RNA polymerase II, or RNA polymerase III).

A vast network of regulatory factors that contribute to the initiation of transcription by RNA polymerase ultimately target any specific gene’s core promoter.

The core promoter includes the transcription start site(s) (TSS).

That portion of the core promoter that is upstream of the TSS is also part of the proximal promoter.

The core promoter lies at approximately -34 bp, that is, about 34 bp upstream of the TSS. "Several factors have been identified that bind to core promoters (reviewed in Smale, 1997)."

Genetics
Genetics involves the expression, transmission, and variation of inherited characteristics.

Gene transcriptions
DNA is a double helix of interlinked nucleotides surrounded by an epigenome. On the basis of biochemical signals, an enzyme, specifically a ribonucleic acid (RNA) polymerase, is chemically bonded to one of the strands (the template strand) of this double helix. The polymerase, once phosphorylated, begins to catalyze the formation of RNA using the template strand. Although the catalysis may have more than one beginning nucleotide (a start site) and more than one ending nucleotide (a stop site) along the DNA, each nucleotide sequence catalyzed that ultimately produces approximately the same RNA is part of a gene. The catalysis of each RNA representation from the template DNA is a transcription, specifically a gene transcription. The overall process is also referred to as gene transcription.

Promoters
Def. a "section of DNA that controls the initiation of RNA transcription as a product of a gene" is called a promoter.

Proximal promoters
Def. a section of promoter DNA that includes and neighbors the transcription start sites is called a proximal promoter.

Cores
Def. a central or most important part of something is called a core.

Theoretical core promoters
Def. "the factors, including RNA polymerase II itself, that are minimally essential for transcription in vitro from an isolated core promoter" is called the basal machinery, or basal transcription machinery.

Def. one or more sequence motifs that contain the transcription start sites (TSSs), are juxtaposed to the motif containing the TSSs, or lie in the proximal promoter, and that are found only in this core of motifs, are called a core promoter.

Metal responsive elements
A metal responsive element (MRE), or TGC box, may occur in the core promoter of some human genes.

"The metallothionein (MT) genes provide a good example of eucaryotic promoter architecture. MT genes specify the synthesis of low-molecular-weight metal-binding proteins. They are transcriptionally regulated by the metal ions cadmium and zinc (11), glucocorticoid hormones (18), interferon (14), interleukin-1 (22), and tumor promoters (2). The metal ion regulation of MTs is conferred by a short sequence element called the metal-responsive element (MRE [21]) or TGC box (31, 34), which functions as a metal ion-dependent enhancer."

GC boxes
Def. a "sequence of contiguous guanine, guanine, guanine, cytosine, and guanine, in that order, along a DNA strand" is called a GC box.

"[A] GC box is a distinct pattern of nucleotides found in the promoter region of some eukaryotic genes upstream of the TATA box and approximately 110 bases upstream from the transcription initiation site. It has a consensus sequence GGGCGG which is position dependent and orientation independent. The GC elements are bound by transcription factors and have similar functions to enhancers. "

"A large subclass of polymerase II promoters lacks both TATAA and CCAAT sequence motifs but contains multiple GC boxes. This promoter class includes several housekeeping genes (e.g., the genes encoding dihydrofolate reductase [DHFR] ..., hydroxymethylglutaryl coenzyme A reductase [39], hypoxanthine guanine phosphoribosyltransferase [33], and adenosine deaminase [46]) [and] nonhousekeeping genes (e.g., the transforming growth factor alpha [9, 23], rat malic enzyme [36], human c-Ha-ras [21], epidermal growth factor receptor [22], and nerve growth factor receptor [42] genes)."

"[A] GC box-binding factor is required for transcription and ... a truncated promoter containing one GC box is transcriptionally inactive (44). ... the DNA-protein interactions occurring at the GC boxes in the DHFR promoter are functionally distinct and that factors binding to the GC boxes must interact in a position-dependent manner."

"In promoters containing multiple GC boxes but lacking the TATAA box, transcription start sites may be single and specific, as observed in the nerve growth factor receptor gene (42) and the cellular retinol-binding protein gene (37), or there may be multiple heterogeneous start sites, such as those found in the c-myb (4), insulin receptor (45), and Ha-ras (21) genes. ... GC boxes are responsible for directing transcription from the major and the minor start sites. ... All TATAA-less promoters have at least two GC boxes".

"A GC box sequence, one of the most common regulatory DNA elements of eukaryotic genes, is recognized by the Spl transcription factor; its consensus sequence is represented as 5'-G/T G/A GGCG G/T G/A G/A C/T-3' [or 5′-KRGGCGKRRY-3′] (Briggs et al., 1986)."

HY boxes
A core responsive element is the hypertrophy region HY box, located between -89 and -60 nucleotides (nts) relative to the transcription start site.

CAAT boxes
"[A] CCAAT box (also sometimes abbreviated a CAAT box or CAT box) is a distinct pattern of nucleotides with GGCCAATCT consensus sequence that occur upstream by 75-80 bases to the initial transcription site. The CAAT box signals the binding site for the RNA transcription factor, and is typically accompanied by a conserved consensus sequence. It is an invariant DNA sequence at about minus 70 base pairs from the origin of transcription in many eukaryotic promoters. Genes that have this element seem to require it for the gene to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box along with the GC box is known for binding general transcription factors. CAAT and GC are primarily located in the region from 100-150bp upstream from the TATA box. Both of these consensus sequences belong to the regulatory promoter. Full gene expression occurs when transcription activator proteins bind to each module within the regulatory promoter. Protein specific binding is required for the CCAAT box activation. These proteins are known as CCAAT box binding proteins/CCAAT box binding factors. A CCAAT box is a feature frequently found before eukaryote coding regions".

B recognition elements
"The B recognition element (BRE) is a DNA sequence found in the promoter region of most genes in eukaryotes and Archaea. The BRE is a cis-regulatory element that is found immediately upstream of the TATA box, and consists of 7 nucleotides."

"The Transcription Factor IIB (TFIIB) recognizes this sequence in the DNA, and binds to it. The fourth and fifth alpha helices of TFIIB intercalate with the major groove of the DNA at the BRE. TFIIB is one part of the preinitiation complex that helps RNA Polymerase II bind to the DNA."

The consensus sequence is 5’-G/C G/C G/A C G C C-3’.

The general consensus sequence using degenerate nucleotides is 5’-SSRCGCC-3’, where S = G or C and R = A or G.
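To make the degenerate notation concrete, the short sketch below lists every specific sequence that SSRCGCC stands for. The helper name `expand_consensus` is hypothetical and only the codes needed for the BRE are included.

```python
from itertools import product

# Bases allowed for each symbol in the BRE consensus (S = G or C, R = A or G).
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "GC", "R": "AG"}

def expand_consensus(consensus):
    """List every concrete sequence a degenerate consensus stands for."""
    choices = [CODES[symbol] for symbol in consensus]
    return ["".join(combo) for combo in product(*choices)]

bre_variants = expand_consensus("SSRCGCC")
print(len(bre_variants))  # 2 * 2 * 2 = 8 concrete sequences
print(bre_variants)
```

Three positions have two alternatives each, so the consensus covers eight 7-nucleotide sequences.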

"The position in nucleotides (nt) relative to the transcription start site (TSS, +1)" is -35 for the BRE. Of human promoters, some "22-25% [are] BRE containing promoters ... the functional consensus sequences for BRE ... motif [is] still poorly defined."

EIF4E basal elements
The EIF4E basal element (4EBE), also written eIF4E basal element, is a basal promoter element for the eukaryotic translation initiation factor 4E. "Interactions between 4EBE and upstream activator sites are position, distance, and sequence dependent."

TATA boxes
Def. a "DNA sequence (cis-regulatory element) found in the promoter region of genes in archaea and eukaryotes" is called a TATA box.

The TATA box can be an AT-rich sequence "located at a fixed distance upstream of the transcription start site".

TBP-like factors
Notation: let the symbol TLF designate a TATA binding protein-like factor.

The human gene TBPL1 (TBP-like 1, also TLF and TRF2), GeneID: 9519, encodes a protein that "does not bind to the TATA box and initiates transcription from TATA-less promoters."

Downstream TFIIB recognition
The downstream TFIIB recognition element (dBRE) has a consensus sequence in the transcription direction on the template strand of 3'-RTDKKKK-5', using degenerate nucleotides, or 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5'.

dBRE is cis-TATA box, between the TATA box and the Inr or transcription start site (TSS) and trans-TSS.

Initiator elements
For RNA polymerase II holoenzyme to transcribe a gene, the gene's promoter must be located. After the promoter is located, the transcription start site (TSS) is pinpointed by using nucleotide sequences that include the TSS or perhaps allow distance measurement to the TSS. Within the promoter, most human genes lack a TATA box and have an initiator element (Inr) or downstream promoter element instead.

"RNA pol II itself recognizes features of the Inr which might assist the correct positioning of the polymerase on the promoter (Carcamo et al., 1991; Weis and Reinberg, 1997)."

Transcription start sites
The transcription start site (TSS) is the location on the DNA template strand where transcription begins at the 3'-end of a gene. This location corresponds to the 5'-end of the mRNA, whose directionality is used by convention to designate DNA locations. For example, the 5'-TATA-box-3' designation refers to the directionality of the mRNA and corresponds to the 3'-TATA-box-5' designation for nucleotides on the template strand. The template strand is the DNA strand being transcribed by RNA polymerase.
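Positions such as -35 or +28 in this resource are counted from the TSS at +1. The sketch below converts between such positions and 0-based indices into a sequence string. It assumes the common convention that the nucleotide immediately upstream of the TSS is -1 and that there is no position 0; some databases number differently, so this is an assumption to check against the data source. The function names are hypothetical.

```python
def index_to_position(index, tss_index):
    """Convert a 0-based sequence index to a TSS-relative position.

    Assumes the TSS is +1 and the base just upstream is -1 (no position 0).
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset

def position_to_index(position, tss_index):
    """Convert a TSS-relative position back to a 0-based sequence index."""
    if position == 0:
        raise ValueError("position 0 is not used in this convention")
    if position > 0:
        return tss_index + position - 1
    return tss_index + position

# Hypothetical layout: the TSS sits at index 100 of a longer sequence.
tss_index = 100
print(position_to_index(-35, tss_index))  # index of the position given for the BRE
print(index_to_position(127, tss_index))  # TSS-relative position of index 127
```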

Downstream core elements
"[N]onredundant human promoter sequences 600 bp long (−499 to +100 bp around the TSS) [are available] from [the] Eukaryotic Promoter Database (EPD) release 75 (4, 68) (http://www.epd.isb-sib.ch/), and ... promoters sequences 1,200 bp long (−1,000 to +200 bp) [are available] from the Database of Transcriptional Start Sites (DBTSS) (59, 74, 75) (http://dbtss.hgc.jp/index.html)".

The downstream core element (DCE) is a transcription core promoter sequence that is within the transcribed portion of a gene.

The consensus sequence for the DCE is CTTC...CTGT...AGC. These three consensus elements are referred to as subelements: "SI is CTTC, SII is CTGT, and SIII is AGC."

The number of nucleotides between each subelement can apparently vary down to none.

A core promoter that contains all three subelements may be much less common than one containing only one or two. "SI resides approximately from +6 to +11, SII from +16 to +21, and SIII from +30 to +34."

SI as 3'-CTTC-5' can occur as 3 of 4 (CTT, TTC) or 4 of 4 (CTTC). SII as 3'-CTGT-5' can also occur as 3 of 4 (CTG, TGT) or 4 of 4 (CTGT). SIII as AGC is not known to vary.
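The subelement windows and the 3 of 4 versus 4 of 4 matches can be checked with a short scan. This is a minimal sketch under stated assumptions: the input begins at the TSS (+1), it is written in the orientation in which the subelements are quoted (CTTC, CTGT, AGC), the windows are the approximate ranges quoted above, and the names `DCE_SUBELEMENTS`, `classify`, and `find_dce_subelements` are hypothetical.

```python
# Approximate windows (1-based, relative to the TSS at +1) quoted in the text.
DCE_SUBELEMENTS = {
    "SI": ("CTTC", 6, 11),
    "SII": ("CTGT", 16, 21),
    "SIII": ("AGC", 30, 34),
}

def classify(motif, window):
    """Return 'full', 'partial' (3 of 4 contiguous bases), or 'absent'."""
    if motif in window:
        return "full"
    # Only the four-base subelements (SI, SII) are described as varying.
    if len(motif) == 4 and (motif[:3] in window or motif[1:] in window):
        return "partial"
    return "absent"

def find_dce_subelements(downstream):
    """Classify each DCE subelement within its window.

    `downstream` is assumed to start at the TSS (+1).
    """
    result = {}
    for name, (motif, start, end) in DCE_SUBELEMENTS.items():
        window = downstream[start - 1:end].upper()
        result[name] = classify(motif, window)
    return result

# Hypothetical 40-nt sequence: CTTC at +7..+10, CTG at +17..+19, AGC at +31..+33.
example = "A" * 6 + "CTTC" + "A" * 6 + "CTG" + "A" * 11 + "AGC" + "A" * 7
print(find_dce_subelements(example))
```

For the hypothetical sequence, SI and SIII are found in full and SII as a 3 of 4 match.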

DCE SIII can function independently of SI and SII.

Transcription factor II D (TFIID), a transcription factor that is part of the RNA polymerase II holoenzyme, interacts with promoters containing only SIII of the DCE, suggesting a critical spacing parameter between SIII and the TATA box, the initiator element, or some combination of the two. TFIID probably serves as a core promoter recognition complex.

TAF1 interacts with the DCE in a sequence-dependent manner.

The differences between core promoters with downstream elements may be explained by:

 * 1) "TATA- and DPE-dependent promoters are specific for particular enhancers",
 * 2) "preferences of activators for specific core promoter architectures", and
 * 3) "the presence of a DCE or [downstream core promoter element (DPE)] might be indicative of an architecture designed for specific regulatory networks, such as the regulation of housekeeping promoters versus tissue-specific promoters (or other highly regulated promoters) or the regulation of subsets of viral promoters."

Motif ten elements
The motif ten element (MTE) is a downstream core promoter element that "promotes transcription by RNA polymerase II when it is located precisely at positions +18 to +27 relative to A+1 in the initiator (Inr) element."

The motif 10 consensus sequence is CSARCSSAACGS [5'-C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G-3']. By convention, the consensus sequence 5'-C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G-3' is stated as it would be transcribed into mRNA. In the direction of transcription on the template strand this consensus sequence becomes 3'-C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G-5'.
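Since functional consensus sequences are rarely matched perfectly, it can help to count how many positions of a candidate agree with the consensus. The sketch below does this for the 12-symbol motif ten consensus; the scoring scheme, the function name `consensus_score`, and the candidate sequences are illustrative assumptions, not a published scoring method.

```python
# Bases allowed for each symbol in the MTE consensus (S = C or G, R = A or G).
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "R": "AG"}

MTE_CONSENSUS = "CSARCSSAACGS"

def consensus_score(candidate, consensus=MTE_CONSENSUS):
    """Count positions where `candidate` is compatible with `consensus`."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must be the same length")
    return sum(base in ALLOWED[symbol]
               for base, symbol in zip(candidate.upper(), consensus))

# Hypothetical candidates: the first fits every position, the second misses two.
print(consensus_score("CGAACGGAACGG"))
print(consensus_score("CGATCGGAACGT"))
```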

Downstream promoter elements
"The downstream promoter element (DPE) is a core promoter element ... present in other species including humans and excluding Saccharomyces cerevisiae. Like all core promoters, the DPE plays an important role in the initiation of gene transcription by RNA polymerase II."

The core sequence of the DPE is located precisely at +28 to +32 nts relative to the A+1 nt in the Inr.

Hypotheses

 * 1) Each portion of a DNA that becomes active has a core promoter.
 * 2) The core promoter is the "minimal portion of the promoter required to properly initiate transcription".