Draft:Original research/Gene transcriptions



DNA is a double helix of interlinked nucleotides surrounded by an epigenome. On the basis of biochemical signals, an enzyme, specifically a ribonucleic acid (RNA) polymerase, is chemically bonded to one of the strands (the template strand) of this double helix. The polymerase, once phosphorylated, begins to catalyze the formation of RNA using the template strand. Although the catalysis may have more than one beginning nucleotide (a start site) and more than one ending nucleotide (a stop site) along the DNA, each nucleotide sequence catalyzed that ultimately produces approximately the same RNA is part of a gene. The catalysis of each RNA representation from the template DNA is a transcription, specifically a gene transcription. The overall process is also referred to as gene transcription.

Heredity
Heredity is the passing on of traits from one generation to the next.

Phenotypes
Def. the "appearance of an organism based on a multifactorial combination of genetic traits and environmental factors, especially used in pedigrees" is called a phenotype.

Genetics
Genetics involves the expression, transmission, and variation of inherited characteristics.

Theoretical gene transcriptions
Def. the "copying of DNA segments into RNA, by RNA polymerase, as the first stage of gene expression" is called gene transcription.

Here's a theoretical definition:

Def. a catalysis process to produce each ribonucleic acid representation of a deoxyribonucleic acid gene, or isoform, is called gene transcription.

Nucleic acids
"Synthetic genetics is a subdiscipline of synthetic biology that aims to develop artificial genetic polymers (also referred to as xeno-nucleic acids or XNAs) that can replicate in vitro and eventually in model cellular organisms."

Def. any "acidic, chainlike biological macromolecule consisting of multiply repeat units of phosphoric acid, sugar and purine and pyrimidine bases" occurring in cell nuclei is called a nucleic acid.

Def. a nucleic acid "in which the sugar component is threose" is called threose nucleic acid, or threonucleic acid (TNA).

Additional DNAs may be
 * 1) deoxyapionucleic acid,
 * 2) deoxyarabinonucleic acid,
 * 3) deoxyxylonucleic acid (dXyNA),
 * 4) deoxylyxonucleic acid,
 * 5) deoxyribulonucleic acid, and
 * 6) deoxyxylulonucleic acid.

Synthesis of deoxyapionucleic acid has been accomplished.

Deoxyxylonucleic acid and xylose nucleic acid have been produced.

"[X]ylonucleic acid (XyloNA) [contains] a potentially prebiotic xylose sugar (a 3′-epimer of ribose) in its backbone."

A "number of sugar-modified nucleic acid variants has been revealed as new genetic polymers, (2) some of them are endowed with catalytic activity (for e.g. FANA and HNA) (3). The structure of these artificial nucleic acids, however, mimics natural nucleic acid helicity (4)."

"Although helices display a distinct pitch and curvature, they feature ca. 11–12 base pairs per turn, and χ/δ covariance plots indicate that the backbones of XNA:RNA or XNA:DNA heteroduplexes adopt an architecture that is either closely related to the A-form, as in the case of [1,5-anhydrohexitol nucleic acid (HNA)] HNA:RNA (96), [locked nucleic acid (LNA)] LNA:RNA (83), [cyclohexene nucleic acid (CeNA)] CeNA:RNA (85) and PNA:RNA (59), or between the A- and B-forms, as seen in the structures of DNA:RNA (97), [arabinonucleic acid (ANA)] ANA:RNA (79), [2′-deoxy-2′-fluoro-arabinonucleic acid (FANA)] FANA:RNA (79) and [peptide nucleic acid (PNA)] PNA:DNA (98)."

Additional XNAs include bridged nucleic acid (BNA), glycol nucleic acid (GNA), FANA, and peptide nucleic acid (PNA).

The accompanying diagram displays various artificial and natural nucleic acid polymers.

"Representative structures illustrate the structural diversity and plasticity of natural and artificial nucleic acid (XNA) backbones. Structures are shown in alphabetic order. (A) Natural genetic polymers: B-form DNA (black), DNA:RNA hybrid and A-form RNA (gray). (B) Representative structures of XNA heteroduplexes with RNA or DNA. The RNA strand is shown in gray, the DNA strand in black and the orientation of the XNA strand is indicated. (C) XNA homoduplexes. Homo-XNA duplexes adopt a variety of structures. (D) Representative XNA-only heteroduplexes. FAF:FAF stands for FANA(F)-ANA(A)-FANA(F) XNA:XNA heteroduplex. Alt and chim indicate the alternated or chimeric order of FANA-segments in the duplex sequences respectively. The depicted duplexes have the following PDB ID codes in the Protein Data Bank (http://www.rcsb.org): B-DNA (3BSE); DNA:RNA (1EFS); A-RNA (3ND4); ANA(purple):RNA (2KP3); CeNA(blue):RNA (3KNC); FANA(violet):RNA (2KP4); HNA(yellow):RNA (2BJ6); LNA(cyan):RNA (1H0Q); PNA(orange):DNA (1PDT); PNA(orange):RNA (176D); CeNA:CeNA (blue, 2H0N); hDNA:hDNA (sky blue, 2H9S); FRNA:FRNA (magenta, 3P4A); GNA:GNA (red, 2XC6); HNA:HNA (yellow, 481D); LNA:LNA (cyan, 2X2Q); PNA:PNA (orange, 2K4G), TNA:TNA (green, coordinates not deposited in the PDB [...]); dXyNA:dXyNA (brown, coordinates not deposited in the PDB [...]); XyNA:XyNA (light green, 2N4J); FAF:FAF (FANA in violet, ANA in purple, 2LSC), FRNA:FANA (alt) (FRNA in magenta, FANA in violet, 2M8A); FRNA:FANA (chim) (FRNA in magenta, FANA in violet, 2M84)."

Deoxyribonucleic acid
Deoxyribonucleic acid (DNA) is a polymer composed of nucleotides, each containing the sugar deoxyribose, linked together by phosphodiester bonds.

Strands
DNA in humans consists of two complementary strands wound into a double helix. Of each pair of homologous chromosomes, one chromosome, or after recombination a portion of one, comes from each parent. The portion of a strand that is transcribed to produce an RNA that is translatable into a protein is usually referred to as the template strand. The corresponding portion of the other strand is then the coding strand because it should contain the nucleotides recorded in, or composing, the transcribed RNA (with thymine in place of uracil).

Epigenomes
Inside each eukaryotic nucleus is genetic material (DNA) surrounded by protective and regulatory proteins. These protective and regulatory proteins, and the dynamic changes to them that occur during the course of a eukaryote's existence, are the epigenome.

Genes
Def. "[a] unit of heredity; a segment of DNA or RNA that is transmitted from one generation to the next, and that carries genetic information such as the sequence of amino acids for a protein" is called a gene.

Def. any "of several different forms of the same protein, arising from either single nucleotide polymorphisms, differential splicing of mRNA, or post-translational modifications (e.g. sulfation, glycosylation, etc.)" is called an isoform.

Def. a "region of a transcribed gene present in the final functional RNA molecule" is called an exon.

Def. a "portion of a split gene that is included in pre-RNA transcripts but is removed during RNA processing and rapidly degraded" is called an intron.
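The exon and intron definitions above can be illustrated with a minimal Python sketch that removes introns from a pre-RNA string. The toy sequence, the intron coordinates, and the function name `splice` are hypothetical and chosen only for illustration; real splicing depends on splice-site signals that are not modeled here.

```python
def splice(pre_rna, introns):
    """Return the RNA left after removing introns.

    introns: list of (start, end) pairs, 0-based with end exclusive,
    assumed not to overlap. All coordinates here are hypothetical.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_rna[position:start])  # keep the exon before this intron
        position = end                         # skip over the intron
    exons.append(pre_rna[position:])           # keep the final exon
    return "".join(exons)


# Toy pre-RNA: exon "AUGGC", intron "GUAAGUCAG", exon "CUUAA"
pre_rna = "AUGGC" + "GUAAGUCAG" + "CUUAA"
mature = splice(pre_rna, [(5, 14)])
print(mature)
```

Only the two exon segments remain in `mature`, matching the definition of an exon as a region present in the final functional RNA.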

Gene transcription factors
A transcription factor is a protein that binds to specific DNA sequences to control the flow (or transcription) of genetic information from DNA to messenger RNA (mRNA).

Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes.

Gene transcription boxes
Def. "[A] repeating sequence of nucleotides that forms a transcription or a regulatory signal" is called a box.

Def. one "of two specific regions in a promoter" is called a box.

Gene transcription elements
Def. one "of the simplest or essential parts or principles of which anything consists, or upon which the constitution or fundamental powers of anything are based" is called an element.

A gene transcription element is a DNA nucleotide sequence that is a part or aspect of a promoter, especially one that is essential or characteristic for a specific gene or related genes.

Promoters
Def. a "section of DNA that controls the initiation of RNA transcription as a product of a gene" is called a gene promoter, or a promoter in the field of genetics.

Dispersed promoters
A dispersed promoter is a region of DNA that facilitates the transcription of a particular gene, where this promoter region contains several transcription start sites over 50-100 nucleotides.

Dispersed promoters are more recent and less widespread throughout nature than focused promoters.

Focused promoters
A focused promoter contains either a single transcription start site or a distinct cluster of start sites over several nucleotides. Focused promoters are sometimes referred to as narrow peak (NP) promoters.
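The distinction between focused and dispersed promoters can be sketched as a simple classification of observed start-site positions. This is only an illustration: the function name `classify_promoter` and both span cutoffs are assumptions, loosely based on "several nucleotides" for focused promoters and "50-100 nucleotides" for dispersed promoters.

```python
def classify_promoter(tss_positions, focused_max_span=10, dispersed_min_span=50):
    """Classify a promoter from its observed transcription start sites.

    The two span cutoffs (in nucleotides) are illustrative assumptions,
    not values taken from the text.
    """
    if not tss_positions:
        raise ValueError("at least one start site is required")
    # Number of nucleotides covered from the first to the last start site.
    span = max(tss_positions) - min(tss_positions) + 1
    if span <= focused_max_span:
        return "focused"
    if span >= dispersed_min_span and len(set(tss_positions)) > 1:
        return "dispersed"
    return "unclassified"


print(classify_promoter([100]))                 # single start site
print(classify_promoter([100, 102, 104]))       # tight cluster
print(classify_promoter([100, 120, 150, 175]))  # start sites spread over 76 nt
```

Spans that fall between the two cutoffs are reported as "unclassified" rather than forced into either category.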

Distal promoters
A distal promoter is a distant (in numbers of nucleotides) portion of the promoter for a particular gene.

This distal sequence is upstream of the gene.

It is a region of DNA that may contain additional regulatory elements, often with a weaker influence than the proximal promoter.

Here's a theoretical definition:

Def. an upstream region, between -2.0 knts and -1.5 knts, for a gene that can exist in a supercoiled conformation with this region in order to be actively transcribed is called a distal promoter.

Proximal promoters
Def. any proximal nucleotide sequence upstream of the gene that tends to contain primary regulatory elements is called a proximal promoter.

Core promoters
The core promoter is the minimal portion of the promoter required to properly initiate gene transcription. It contains a binding site for RNA polymerase (RNA polymerase I, RNA polymerase II, or RNA polymerase III).

The core promoter lies approximately 34 nt upstream of the transcription start site (TSS), at about position -34.

Senses
"A single strand of DNA [has a positive sense (+)] if an RNA version of the same sequence is translated or translatable into protein. Its complementary strand is called antisense (or negative (-) sense). Sometimes the phrase coding strand is encountered; however, protein coding and non-coding RNA's can be transcribed similarly from both strands, in some cases being transcribed in both directions from a common promoter region, or being transcribed from within introns, on both strands".

The two complementary strands of double-stranded DNA (dsDNA) are usually differentiated as the "sense" strand and the "antisense" strand. The DNA sense strand looks like the messenger RNA (mRNA) and can be used to read the expected protein code by human eyes (e.g. ATG codon = Methionine amino acid). However, the DNA sense strand itself is not used to make protein by the cell. It is the DNA antisense strand which serves as the source for the protein code, because, with bases complementary to the DNA sense strand, it is used as a template for the mRNA. Since transcription results in an RNA product complementary to the DNA template strand, the mRNA is complementary to the DNA antisense strand. The mRNA is what is used for translation (protein synthesis).

The only biological information that is important for labeling strands is the location of the 5' phosphate group and the 3' hydroxyl group, because these ends determine the direction of transcription and translation. A sequence 5' CGCTAT 3' is equivalent to a sequence written 3' TATCGC 5' as long as the 5' and 3' ends are noted. If the ends are not labeled, the convention is to assume that the sequence is written in the 5' to 3' direction. A good rule of thumb for identifying the "sense" strand is to look for the start codon ATG (AUG in mRNA). In the table example, the sense mRNA has the AUG codon at the end (remember that translation proceeds in the 5' to 3' direction).
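The relationship between the sense strand, the antisense (template) strand, and the mRNA can be checked with a short Python sketch. The nine-nucleotide sense sequence and the function names are hypothetical, chosen only to illustrate the complementarity described above; all strands are written 5' to 3' unless reversed explicitly.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(template):
    """Return the mRNA (5' to 3') made from a template strand written 5' to 3'."""
    return reverse_complement(template).replace("T", "U")


sense = "ATGGCCTAA"                    # hypothetical sense strand, 5' to 3'
antisense = reverse_complement(sense)  # template strand, 5' to 3'
mrna = transcribe(antisense)

print(antisense)                        # TTAGGCCAT
print(mrna)                             # AUGGCCUAA
print(mrna == sense.replace("T", "U"))  # True: mRNA matches the sense strand
print("CGCTAT"[::-1])                   # TATCGC, the same strand written 3' to 5'
```

The mRNA reads like the sense strand with U in place of T, and the start codon AUG is found by scanning the mRNA in the 5' to 3' direction.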

Preinitiation complexes
For eukaryotic transcription, the RNA polymerase II holoenzyme de-helicizes the DNA and attaches along the template strand.

Once the preinitiation complex has found its appropriate attachment section along the template strand of DNA, RNA polymerase II is attached and begins transcription.

Preinitiation complex assembly
"The assembly of transcription preinitiation complex follows these steps:"
 * 1) TATA binding protein (TBP), a subunit of TFIID (the largest GTF) binds to the promoter (TATA box), creating a sharp bend in the promoter DNA. Then the TBP-TFIIA interactions recruit TFIIA to the promoter.
 * 2) TBP-TFIIB interactions recruit TFIIB to the promoter. RNA polymerase II and TFIIF assemble to form the Polymerase II complex. TFIIB helps the Pol II complex bind correctly.
 * 3) TFIIE and TFIIH then bind to the complex and form the transcription preinitiation complex. TFIIA/B/E/H leave once RNA elongation begins. TFIID will stay until elongation is finished.
 * 4) Subunits within TFIIH that have ATPase and helicase activity create negative superhelical tension in the DNA. This negative superhelical tension causes approximately one turn of DNA to unwind and form the transcription bubble.
 * 5) The template strand of the transcription bubble engages with the RNA polymerase II active site, then RNA synthesis starts.
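The order of recruitment in the steps above can be sketched as a small dependency check in Python. The prerequisite table is one reading of those steps, and the dictionary layout, the label "Pol II-TFIIF", and the function name `is_valid_assembly` are assumptions made for illustration only.

```python
# Prerequisites read off the assembly steps listed above; the layout of
# this table and the names used in it are illustrative assumptions.
PREREQUISITES = {
    "TFIID": [],
    "TFIIA": ["TFIID"],
    "TFIIB": ["TFIID"],
    "Pol II-TFIIF": ["TFIIB"],
    "TFIIE": ["Pol II-TFIIF"],
    "TFIIH": ["Pol II-TFIIF"],
}


def is_valid_assembly(order):
    """Check that every factor joins only after its prerequisites are bound."""
    bound = set()
    for factor in order:
        if any(required not in bound for required in PREREQUISITES[factor]):
            return False
        bound.add(factor)
    # The complex is complete only when every listed factor has joined.
    return bound == set(PREREQUISITES)


listed_order = ["TFIID", "TFIIA", "TFIIB", "Pol II-TFIIF", "TFIIE", "TFIIH"]
print(is_valid_assembly(listed_order))
print(is_valid_assembly(["TFIIB", "TFIID", "TFIIA", "Pol II-TFIIF", "TFIIE", "TFIIH"]))
```

The first call follows the listed order and passes; the second places TFIIB before TFIID and fails the check.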

DNA melting
Often included in this process is the separation of the DNA double helix from the epigenome.

The TATA-binding protein may serve to bend the double helix by 80°.

Generally, DNA melting involves the separation of the two strands so that transcription can begin on the template strand.

"TFIIH [...] is required for DNA melting".

"TFIIE positions TFIIH in a configuration capable of melting the DNA."

The "RAP30 WH domain [may play] an essential role in positioning the flexible promoter DNA downstream of BREd along the Pol II cleft, thus facilitating subsequent steps in the promoter melting process."

"The INR element is sandwiched precisely between these two protein-DNA contacts, an arrangement that may be relevant in promoter melting at the correct position in the DNA. The slightly open clamp conformation seen upon DNA placement onto the cleft following TFIIF addition is likely due to the interaction of the DNA with the clamp head β sheet".

Both "the TFIIB linker helix and the TFIIF arm domain align with the promoter melting start site".

The "tip of the TFIIF arm domain contains seven positively charged residues, whereas four positively charged residues are present on the side of the TFIIB linker helix that faces the DNA [...]. The juxtaposition of these domains within the melting start site is consistent with their direct role in DNA interactions."

The "clamp domain in the open state moves down to engage the open DNA bubble, adopting the conformation observed in the elongation state [...]. Thus, the clamp domain completes an open to closed transition throughout the process of [preinitiation complex assembly] PIC assembly and promoter opening [...]. [An] additional protein density now extends from the bottom of the clamp and connects to the dimerization domain of TFIIF [...]. Rigid body fitting of crystal structures suggests that this density corresponds to the stabilized rudder of Pol II and the arm domain of TFIIF. [These] elements [likely] interact with each other as the clamp closes down over the melted DNA. Interestingly, this proposed interaction would prevent re-annealing of the melted DNA. The TFIIB linker helix is near this position and likely participates in the promoter melting process as well. This [...] is consistent with our hypothesis that the flexible TFIIB linker helix and the TFIIF arm domain act together in promoter opening [...]."

"Once promoter DNA melting is further extended and the Pol II clamp closes down, the TFIIB linker helix and the TFIIF arm domain work together with the Pol II rudder to maintain the upstream edge of the DNA bubble."

RNA polymerase II holoenzyme complexes
RNA polymerase II is recruited to the promoters of protein-coding genes in living cells. Alternatively, transcription factories are present: the euchromatin is brought within the nearest transcription factory, and messenger RNA (mRNA), such as A1BG mRNA, is transcribed there.

For those circumstances in which the holoenzyme is built onto the euchromatin, it is necessary to consider the holoenzyme components, the likely sequence of binding, the arrival of RNA polymerase II, and its subsequent action.

RNA polymerase II (also called RNAP II and Pol II) ... catalyzes the transcription of DNA to synthesize precursors of mRNA and most snRNA and microRNA. In humans RNAP II consists of seventeen protein molecules (gene products encoded by POLR2A-L, where the proteins synthesized from POLR2C, POLR2E, and POLR2F form homodimers).

The RNA polymerase II holoenzyme complex may also have to search for one or more transcription start sites.

Transcription start sites
The transcription start site is the location where transcription starts at the 5'-end of a gene sequence.

A start site is a biochemically signaled nucleotide or set of nucleotides for attachment either to the epigenome or the DNA.

Phosphorylation
Def. "the process of transferring a phosphate group [e.g., PO₄³⁻] from a donor to an acceptor; often catalysed by enzymes" is called phosphorylation.

Hypotheses

 * 1) Gene transcription can occur for each gene, or isoform, on either strand, template (-) or coding (+).
 * 2) Gene transcription can occur for each gene, or isoform, in either direction (+ or -), e.g., (+) → {ATG} or (-) {ATG} ←.
 * 3) Gene transcription can occur for the complement (c) of each gene, or isoform, e.g., {TAC} (c).
 * 4) Gene transcription can occur for the complement of the inverse (ci) of each gene, or isoform, e.g. {CAT}.
 * 5) Gene transcription can occur for the inverse (i) of each gene, or isoform, e.g. {GTA}.
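The sequence forms named in these hypotheses can be generated with a minimal Python sketch, using the ATG example from the list. The function names and the dictionary labels are illustrative assumptions; "inverse" is taken here to mean the same sequence read in the opposite order.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(sequence):
    """Replace each nucleotide by its base-pairing partner, keeping the order."""
    return "".join(COMPLEMENT[base] for base in sequence)


def inverse(sequence):
    """Read the sequence in the opposite order."""
    return sequence[::-1]


def variants(sequence):
    """Return the four forms of a sequence considered in the hypotheses."""
    return {
        "direct": sequence,
        "complement": complement(sequence),
        "inverse": inverse(sequence),
        "complement of inverse": complement(inverse(sequence)),
    }


for label, form in variants("ATG").items():
    print(label, form)
```

For ATG this yields TAC, GTA, and CAT, the same examples given in hypotheses 3 to 5.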