Gene transcriptions/Elements/Initiators

In the biosynthesis of any human protein, the gene containing the nucleotide sequence that is translated into that protein must be transcribed. For the RNA polymerase II holoenzyme to transcribe the gene, the gene's promoter must be located. After the promoter is located, the transcription start site (TSS) is pinpointed using nucleotide sequences that include the TSS. Within the promoter, most human genes lack a TATA box and instead have an initiator element (Inr) or a downstream promoter element.

Based on the available descriptions, various Inr sequences are searched for in a promoter to test whether they locate the known TSS.
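A minimal sketch of this test in Python follows. It assumes the promoter is written 5' to 3' on the coding strand, that the known TSS is given as a 0-based index into that string, and that the human consensus YYANWYY (given under Consensus sequences) is the Inr being searched for. The function names and the example sequence are hypothetical and only illustrate the procedure.

```python
import re

# YYANWYY as a regular expression in the degenerate code:
# Y = C or T, N = any base, W = A or T. The A is the +1 (TSS) position.
# The zero-width lookahead lets overlapping matches all be reported.
INR_REGEX = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT]))")


def inr_plus1_positions(seq: str) -> list[int]:
    """Return 0-based indices of the +1 A for every YYANWYY match."""
    seq = seq.upper()
    # The +1 A is the third base of the 7-nt match, i.e. offset 2.
    return [m.start() + 2 for m in INR_REGEX.finditer(seq)]


def inr_locates_tss(seq: str, known_tss: int) -> bool:
    """True if some Inr match places its +1 A exactly on the known TSS."""
    return known_tss in inr_plus1_positions(seq)


if __name__ == "__main__":
    promoter = "GGGTCAGTCCGG"  # hypothetical example sequence
    print(inr_plus1_positions(promoter))  # [5]
    print(inr_locates_tss(promoter, 5))   # True
    print(inr_locates_tss(promoter, 4))   # False
```

A match only counts as locating the TSS when its +1 A falls exactly on the known start site; a match a few bases away would be reported by `inr_plus1_positions` but rejected by `inr_locates_tss`.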

Notations
Notation: let the symbol Inr denote an initiator element.

Notation: let the symbol +1 designate the nucleotide that is the transcription start site (TSS).

Genetics
The Inr was first described and sequenced in 1989.

In core promoters, the Inr element has been found to be more prevalent than the TATA box. In a study of more than 1,800 distinct human promoter sequences, 49% contained an Inr element, whereas 21.8% contained a TATA box.

Gene transcriptions
Two TFIID subunits, TAF1 and TAF2, recognize the Inr sequence and help assemble the complex on the promoter.

The interaction between TFIID and the Inr is believed to be the most critical one for initiating transcription, because the Inr sequence overlaps the start site.

The Inr element is also believed to interact with the activator specificity protein 1 (Sp1) transcription factor, which can then regulate the activation and initiation of transcription.

Promoters with a functional Inr are more likely to lack a TATA box or to have a degenerate TATA sequence, because a gene with an active Inr depends less on a functional TATA box or on additional promoter elements. Although the Inr element varies between promoters, the sequence is highly conserved between humans and yeast. An analysis of 7670 transcription start sites showed that roughly 40% had an exact match to the BBCA+1BW Inr sequence, while 16% contained only one mismatch.

TFIID and its subunits are very sensitive to the Inr sequence, and nucleotide changes have been shown to change binding affinity drastically; the +1 and +3 positions have been identified as the most critical for transcription efficiency and Inr function. Replacing the adenine (A) at +1 with G or T changes transcription activity by 10%, and replacing the thymine (T) at +3 changes transcription activity by 22%.
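The exact-match versus one-mismatch classification used in the 7670-TSS analysis can be sketched in Python as follows. This assumes the six-nucleotide window from -3 to +3 is read from the coding strand, with the TSS A as the fourth base (there is no position 0). The helper names are hypothetical, and weighting every position equally is a simplification, since the text reports that some positions matter far more than others for Inr function.

```python
# Degenerate code for the BBCA+1BW consensus: B = C, G or T (not A); W = A or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "B": "CGT", "W": "AT"}
BBCABW = "BBCABW"  # positions -3, -2, -1, +1, +2, +3 relative to the TSS


def window_around_tss(seq: str, tss: int) -> str:
    """Return the six bases from -3 to +3, where seq[tss] is the +1 base."""
    if tss < 3 or tss + 3 > len(seq):
        raise ValueError("TSS too close to the sequence end for a -3..+3 window")
    return seq[tss - 3 : tss + 3].upper()


def count_mismatches(window: str, consensus: str = BBCABW) -> int:
    """Count positions where the observed base is not allowed by the consensus."""
    if len(window) != len(consensus):
        raise ValueError("window and consensus lengths differ")
    return sum(base not in IUPAC[code] for base, code in zip(window.upper(), consensus))


def classify(window: str) -> str:
    """Bin a window the way the TSS analysis reports it."""
    n = count_mismatches(window)
    if n == 0:
        return "exact match"
    if n == 1:
        return "one mismatch"
    return "two or more mismatches"


if __name__ == "__main__":
    w = window_around_tss("GGTCCAGTGG", 5)  # hypothetical sequence; TSS at index 5
    print(w, classify(w))                    # TCCAGT exact match
    print(classify("TCCGGT"))                # one mismatch (G instead of A at +1)
```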

Theoretical initiator elements
Here's a theoretical definition:

Def. a series of nucleotides including a transcription start site on one DNA strand whose presence in a gene promoter eventually leads to a chain reaction or polymerization such as transcription is called an initiator element.

RNA polymerase IIs
"RNA pol II itself recognizes features of the Inr which might assist the correct positioning of the polymerase on the promoter (Carcamo et al., 1991; Weis and Reinberg, 1997)."

RNA polymerase II, which has a weak intrinsic preference for Inr-like sequences, may form a stable complex on TATA-less promoters that contain Inr elements.

RNA polymerase II holoenzyme complexes
Gene ID: 672 is BRCA1 BRCA1, DNA repair associated. "This gene encodes a nuclear phosphoprotein that plays a role in maintaining genomic stability, and it also acts as a tumor suppressor. The encoded protein combines with other tumor suppressors, DNA damage sensors, and signal transducers to form a large multi-subunit protein complex known as the BRCA1-associated genome surveillance complex (BASC). This gene product associates with RNA polymerase II, and through the C-terminal domain, also interacts with histone deacetylase complexes. This protein thus plays a role in transcription, DNA repair of double-stranded breaks, and recombination. Mutations in this gene are responsible for approximately 40% of inherited breast cancers and more than 80% of inherited breast and ovarian cancers. Alternative splicing plays a role in modulating the subcellular localization and physiological function of this gene. Many alternatively spliced transcript variants, some of which are disease-associated mutations, have been described for this gene, but the full-length natures of only some of these variants has been described. A related pseudogene, which is also located on chromosome 17, has been identified."

Gene ID: 1660 is DHX9 DExH-box helicase 9 (aka LKP; RHA; DDX9; NDH2; NDHII). "This gene encodes a member of the DEAH-containing family of RNA helicases. The encoded protein is an enzyme that catalyzes the ATP-dependent unwinding of double-stranded RNA and DNA-RNA complexes. This protein localizes to both the nucleus and the cytoplasm and functions as a transcriptional regulator. This protein may also be involved in the expression and nuclear export of retroviral RNAs. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on chromosomes 11 and 13."

BRCA1 has been shown to interact with DHX9: overexpression of a protein fragment of RNA helicase A inhibits endogenous BRCA1 function and causes defects in ploidy and cytokinesis in mammary epithelial cells, and the BRCA1 protein is linked to the RNA polymerase II holoenzyme complex via RNA helicase A.

ATP-dependent RNA helicase A (RHA; also known as DHX9, LKP, and NDHII) is an enzyme that in humans is encoded by the DHX9 gene.

RNA polymerase II subunit A C-terminal domain phosphatase is an enzyme that in humans is encoded by the CTDP1 gene.

Gene ID: 9150 is CTDP1 CTD phosphatase subunit 1. "This gene encodes a protein which interacts with the carboxy-terminus of the RAP74 subunit of transcription initiation factor TFIIF, and functions as a phosphatase that processively dephosphorylates the C-terminus of POLR2A (a subunit of RNA polymerase II), making it available for initiation of gene expression. Mutations in this gene are associated with congenital cataracts, facial dysmorphism and neuropathy syndrome (CCFDN). Alternatively spliced transcript variants encoding different isoforms have been described for this gene."

"This gene encodes a protein which interacts with the carboxy-terminus of transcription initiation factor TFIIF, a transcription factor which regulates elongation as well as initiation by RNA polymerase II. The protein may also represent a component of an RNA polymerase II holoenzyme complex. Alternative splicing of this gene results in two transcript variants encoding 2 different isoforms."

CTDP1 has been shown to interact with WD repeat-containing protein 77, GTF2F1 and POLR2A.

Gene ID: 168400 is DDX53 DEAD-box helicase 53. "This intronless gene encodes a protein which contains several domains found in members of the DEAD-box helicase protein family. Other members of this protein family participate in ATP-dependent RNA unwinding."

"DEAD/DEAH box helicases are proteins, and are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein with RNA helicase activity. It may participate in melting of DNA:RNA hybrids, such as those that occur during transcription, and may play a role in X-linked gene expression. It contains 2 copies of a double-stranded RNA-binding domain, a DEXH core domain and an RGG box. The RNA-binding domains and RGG box influence and regulate RNA helicase activity."

Consensus sequences
As in other metazoans, for genes lacking a TATA box the Inr acts as a functional analog of the TATA box in directing transcription initiation; its consensus on the coding strand is 5'-YYA+1NWYY-3'. Using the degenerate nucleotide code (Y = C/T, N = any nucleotide, W = A/T), the consensus sequence is 5'-C/T-C/T-A-A/C/G/T-A/T-C/T-C/T-3'. The complementary template strand, which RNA polymerase II reads 3' to 5' in the direction of transcription, is 3'-G/A-G/A-T-A/C/G/T-A/T-G/A-G/A-5'.
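To illustrate the degenerate code and the relationship between the two strands, here is a minimal Python sketch that turns a consensus such as YYANWYY into a regular expression and scans both strands of a DNA sequence. The IUPAC table is the standard one; the function names are hypothetical. Match starts are reported as 0-based positions on the input (coding) strand.

```python
import re

# Standard IUPAC degenerate nucleotide codes and their complements.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def to_regex(consensus: str) -> str:
    """Translate a degenerate consensus into a character-class regex."""
    return "".join("[" + IUPAC[c] + "]" for c in consensus.upper())


def reverse_complement(seq: str) -> str:
    """Reverse complement; also works on degenerate codes."""
    return "".join(COMPLEMENT[c] for c in reversed(seq.upper()))


def find_motif(seq: str, consensus: str) -> list[tuple[str, int]]:
    """Return (strand, start) for overlapping matches on both strands."""
    pattern = re.compile("(?=" + to_regex(consensus) + ")")
    seq = seq.upper()
    k = len(consensus)
    hits = [("+", m.start()) for m in pattern.finditer(seq)]
    # A hit at i in the reverse complement covers seq[len(seq) - i - k : len(seq) - i].
    rc = reverse_complement(seq)
    hits += [("-", len(seq) - m.start() - k) for m in pattern.finditer(rc)]
    return hits


if __name__ == "__main__":
    print(to_regex("YYANWYY"))            # [CT][CT][A][ACGT][AT][CT][CT]
    print(reverse_complement("YYANWYY"))  # RRWNTRR, i.e. 3'-RRTNWRR-5'
    print(find_motif("GGGTCAGTCCGG", "YYANWYY"))  # [('+', 3)]
```

The same function accepts other consensus strings, for example the Entamoeba histolytica consensus AAAAATTCA given below.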

"TATA-less core promoters that lack AT-rich sequences in the -30 region and do not stably bind TBP are likely to assemble PICs via alternative pathways and to be regulated by distinct mechanisms (Smale and Kadonaga, 2003). However, the number of such bona fide TATA-less genes remains unclear in eukaryotic genomes."

In Entamoeba histolytica, the consensus sequence is AAAAATTCA.

The Inr has the consensus sequence YYANWYY. Like the TATA box, the Inr element facilitates the binding of transcription factor II D (TFIID), the complex of TATA-binding protein (TBP) and TBP-associated factors (TAFs).

Enhancers
An Inr for mammalian RNA polymerase II can be defined as a DNA sequence element that overlaps a TSS and is sufficient for:

 * 1) determining the start site location in a promoter that lacks a TATA box and
 * 2) enhancing the strength of a promoter that contains a TATA box.

TATA binding protein associated factors
"Although any isolated TAF may not exhibit sequence-specific interactions at the Inr element in the absence of a TATA-box, a combination of TAFs may bind sequence specifically to the Inr element regardless of the TATA-box and/or DPE (Chalkley and Verrijzer, 1999)."

TAF1 "binds to core promoter sequences encompassing the transcription start site. It also binds to activators and other transcriptional regulators, and these interactions affect the rate of transcription initiation."

Before transcription begins, a complex consisting of TAF1 and TAF2 binds stably to the Inr.

TATA box-likes
The Inr is the only element in metazoan protein-encoding genes known to be a functional analog of the TATA box, in that it is sufficient for directing accurate transcription initiation in genes that lack TATA boxes.

General transcription factor II As
General transcription factor II A is critical for the cooperative binding of TFIID to the Inr.

General transcription factor II Ds
The general transcription factor II D (TFIID) is one of several general transcription factors that make up the RNA polymerase II preinitiation complex. Before transcription starts, the TFIID complex binds to the core promoter of the gene.

TFIID is the first protein to bind to DNA during the formation of the RNA polymerase II (RNA Pol II) preinitiation complex.

General transcription factor II Is
General transcription factor II I, or TFII-I, is a factor capable of binding the Inr element.

Transcription start sites
Usually the Inr contains the TSS.

"[T]he initiator (INR) element located at, or immediately adjacent to, the TSS, ... is recognized by the TBP-associated factors TAF1 and TAF2 of the TFIID complex".

"[T]ranscription does not need to begin at the +1 nucleotide for the Inr to function. RNA polymerase II has been redirected to alternative start sites by reducing ATP concentrations within a nuclear extract, by altering the spacing between the TATA and Inr in a promoter containing both elements, and by dinucleotide initiation strategies".

Hypotheses

 * 1) A1BG is not transcribed by an initiator element.
 * 2) A1BG is not transcribed by a TATA box.