Draft:Original research/Deoxyribonucleic acids

Deoxyribonucleic acid (DNA) is a polymer composed of nucleotides linked together by a sugar-phosphate backbone.

Nucleic acids owe their acidity to phosphoric acid, which is their only acid component.

Attached to each sugar is a nucleobase, also called a nitrogenous base or simply a "base". Although each base is attached to its sugar via a covalent bond, the bases on opposite strands are connected by hydrogen bonds. This makes the DNA molecule easy to separate with heat and also allows it to return spontaneously to a double-stranded molecule when cooled back down.

Theoretical deoxyribonucleic acids
Def. a "nucleic acid found in all living things (and some non-living, see virus); consists of a polymer formed from nucleotides which are shaped into a double helix; it is associated with the transmission of genetic information" is called a deoxyribonucleic acid.

Polyphosphoric acids
Def. "a colourless liquid, H3PO4" is called phosphoric acid, orthophosphoric acid, or monophosphoric acid.

An orthophosphoric acid molecule can dissociate up to three times, giving up an H⁺ each time, which typically combines with a water molecule, H₂O, as shown in these chemical reactions:


 * H₃PO₄(s) + H₂O(l) ↔ H₃O⁺(aq) + H₂PO₄⁻(aq)       Ka1 = 7.25×10⁻³
 * H₂PO₄⁻(aq) + H₂O(l) ↔ H₃O⁺(aq) + HPO₄²⁻(aq)     Ka2 = 6.31×10⁻⁸
 * HPO₄²⁻(aq) + H₂O(l) ↔ H₃O⁺(aq) + PO₄³⁻(aq)      Ka3 = 4.80×10⁻¹³

The anion after the first dissociation, H₂PO₄⁻, is the dihydrogen phosphate anion. The anion after the second dissociation, HPO₄²⁻, is the hydrogen phosphate anion. The anion after the third dissociation, PO₄³⁻, is the phosphate or orthophosphate anion. Each of the dissociation reactions shown above has its own acid dissociation constant, Ka1, Ka2, and Ka3, given at 25 °C. The corresponding values are pKa1 = 2.12, pKa2 = 7.21, and pKa3 = 12.67 at 25 °C. Even though all three hydrogen (H) atoms are equivalent on an orthophosphoric acid molecule, the successive Ka values differ because it is energetically less favorable to lose another H⁺ once one (or more) has already been lost and the molecule or ion is more negatively charged.
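The relation between each dissociation constant and its pKa is pKa = −log10(Ka). The short sketch below applies that relation to the three Ka values quoted above; the function name `pka` is only illustrative. Computed this way, the results come out near 2.14, 7.20, and 12.32, so they do not match the quoted pKa figures exactly, which may simply mean the two sets of numbers come from different sources.

```python
import math

# Dissociation constants as quoted in the text (25 °C).
KA_VALUES = {"Ka1": 7.25e-3, "Ka2": 6.31e-8, "Ka3": 4.80e-13}


def pka(ka: float) -> float:
    """Return pKa = -log10(Ka) for a dissociation constant Ka."""
    return -math.log10(ka)


for name, ka in KA_VALUES.items():
    print(f"p{name} = {pka(ka):.2f}")
```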

For a given total acid concentration [A] = [H₃PO₄] + [H₂PO₄⁻] + [HPO₄²⁻] + [PO₄³⁻] (where [A] is the total number of moles of pure H₃PO₄ used to prepare 1 liter of solution), the composition of an aqueous solution of phosphoric acid can be calculated using the equilibrium equations associated with the three reactions described above, together with the relation [H⁺][OH⁻] = 10⁻¹⁴ and the electrical neutrality equation. Possible concentrations of polyphosphoric molecules and ions are neglected. The system may be reduced to a fifth-degree equation for [H⁺], which can be solved numerically, with the following results.

For large acid concentrations, the solution is mainly composed of H₃PO₄. For [A] = 10⁻² mol/L, the pH is close to pKa1, giving an approximately equimolar mixture of H₃PO₄ and H₂PO₄⁻. For [A] below 10⁻³ mol/L, the solution is mainly composed of H₂PO₄⁻, with [HPO₄²⁻] becoming non-negligible for very dilute solutions. [PO₄³⁻] is always negligible. Note that this analysis does not take ion activity coefficients into account; as such, the pH and molarity of a real phosphoric acid solution may deviate substantially from the above values.
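The calculation described above can be sketched numerically. The version below is a minimal illustration, not the method used to produce the original figures: instead of forming the fifth-degree polynomial explicitly, it bisects on pH until the electrical neutrality equation balances, which uses the same equilibrium equations. The Ka values are those quoted earlier; the function names, the pH search range of 0 to 14, and the sample concentrations are assumptions made for this example. Activity coefficients and polyphosphoric species are ignored, as in the text.

```python
import math

# Constants quoted in the text (25 °C); activity coefficients are ignored.
KA1, KA2, KA3 = 7.25e-3, 6.31e-8, 4.80e-13
KW = 1e-14  # [H+][OH-]


def fractions(h):
    """Fractions of H3PO4, H2PO4-, HPO4(2-), PO4(3-) at hydrogen-ion concentration h."""
    terms = [h**3, KA1 * h**2, KA1 * KA2 * h, KA1 * KA2 * KA3]
    total = sum(terms)
    return [t / total for t in terms]


def charge_imbalance(h, total_acid):
    """Electrical neutrality residual: positive charge minus negative charge."""
    _, a1, a2, a3 = fractions(h)
    return h - KW / h - total_acid * (a1 + 2 * a2 + 3 * a3)


def solve_ph(total_acid):
    """Bisect on pH in [0, 14]; the residual decreases steadily as pH rises."""
    low, high = 0.0, 14.0
    for _ in range(100):
        mid = (low + high) / 2
        if charge_imbalance(10.0 ** -mid, total_acid) > 0:
            low = mid  # guess is too acidic, so the root lies at higher pH
        else:
            high = mid
    return (low + high) / 2


for total_acid in (1.0, 1e-2, 1e-4):
    ph = solve_ph(total_acid)
    shares = ", ".join(f"{a:.3f}" for a in fractions(10.0 ** -ph))
    print(f"[A] = {total_acid:g} mol/L: pH = {ph:.2f}, fractions = {shares}")
```

The printed fractions are listed in the order H3PO4, H2PO4−, HPO4 2−, PO4 3−, so the trends described above (mostly undissociated acid at high concentration, mostly dihydrogen phosphate when dilute) can be checked directly.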

Def. a chain of typically two to twenty, or three to seven, linked monophosphoric acids (orthophosphoric acids) is called an oligophosphoric acid.

Def. "any of a class of inorganic polymers containing linked phosphate groups", or more than five linked phosphoric acids, is called a polyphosphate, or polyphosphoric acid.

Nitrogenous bases
Nitrogenous bases, found in cell nuclei, are nucleobases.

In the normal DNA double helix, the bases form pairs between the two strands: adenine (A) with thymine (T) and cytosine (C) with guanine (G). Purines pair with pyrimidines mainly for dimensional reasons - only this combination fits the constant-width geometry of the double helix.
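The pairing rule above can be written as a small lookup table. The following sketch derives the partner strand of a given sequence; the names `PAIRS`, `complement`, and `reverse_complement` are illustrative, and only the four standard DNA bases are handled.

```python
# Watson-Crick partners: each purine (A, G) pairs with one pyrimidine (T, C).
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))
print(reverse_complement("ATGC"))
```

The reversal in `reverse_complement` reflects that the two strands of a duplex run in opposite directions.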

Nitrogenous bases include purines: adenine, guanine, hypoxanthine, isoguanine, xanthine, and 7-methylguanine; and pyrimidines: cytosine, thymine, and isocytosine.

Adenine
Adenine has the IUPAC name 9H-purin-6-amine. Its pKa values are 4.15 (secondary) and 9.80 (primary).

Adenine derivatives have a variety of roles in biochemistry, including cellular respiration, in the form of both the energy-rich adenosine triphosphate (ATP) and the cofactors nicotinamide adenine dinucleotide (NAD) and flavin adenine dinucleotide (FAD). Adenine also functions in protein synthesis and as a chemical component of DNA and RNA.

The adjacent image on the right shows pure adenine, as an independent molecule.

Adenine attached to ribose is adenosine.

Adenine attached to deoxyribose is deoxyadenosine.

Adenine forms several tautomers, compounds that can be rapidly interconverted and are often considered equivalent; however, in isolated conditions, i.e. in an inert gas matrix and in the gas phase, mainly the 9H-adenine tautomer is found.

Adenine synthesis
Both adenine and guanine are derived from the nucleotide inosine monophosphate (IMP), which in turn is synthesized from a pre-existing ribose phosphate through a pathway using atoms from the amino acids glycine, glutamine, and aspartic acid, and the coenzyme tetrahydrofolate.

Sugars
Def. "a derivative of the pentose sugar ribose in which the 2' hydroxyl (-OH) is reduced to a hydrogen (H)" is called deoxyribose.

In the diagram of DNA at the page top, the pentose sugar deoxyribose is in a cyclic, furanose (five-membered ring) form. The diagram does not show whether the sugar is D-deoxyribose or L-deoxyribose; this is determined by whether the hydroxyls lie primarily below (D) or above (L) the plane of the ring. In Earth-based DNA, deoxyribose is in the dextro (D) configuration.

Deoxyribose may also occur in a pyranose (six-membered ring) form.

There are other pentose sugars including aldopentoses: apiose, arabinose, xylose, and lyxose, and ketopentoses: ribulose and xylulose. These may have a deoxy-form: deoxyapiose, deoxyarabinose, deoxyxylose, deoxylyxose, deoxyribulose, and deoxyxylulose. They may occur as a levo or dextro sugar and as a furanose or pyranose.

To occur in an Earth-like DNA, each of these six deoxypentoses, and perhaps other sugars, would need to be a dextro furanose. Each would then form a DNA of its own, for example, deoxyapionucleic acid.

Sugar phosphates
Any of the various sugars can have one or more phosphates attached as in glucose-6-phosphate diagrammed on the right.

Nucleosides
Def. "an organic molecule in which a nitrogenous heterocyclic base (or nucleobase), which can be either a double-ringed purine or a single-ringed pyrimidine, is covalently attached to a five-carbon pentose sugar (deoxyribose in DNA or ribose in RNA)" is called a nucleoside.

Nucleotides
"All five nucleotides (including the RNA base "uracil") are synthesized through complex metabolic pathways involving several multi-subunit enzymes. The pathways differ for both purines and pyrimidines (uracil falling in the pyrimidine category since it is thymine's counterpart.)"

Nucleic acids
"Synthetic genetics is a subdiscipline of synthetic biology that aims to develop artificial genetic polymers (also referred to as xeno-nucleic acids or XNAs) that can replicate in vitro and eventually in model cellular organisms."

Def. any "acidic, chainlike biological macromolecule consisting of multiply repeat units of phosphoric acid, sugar and purine and pyrimidine bases" occurring in cell nuclei is called a nucleic acid.

Def. a nucleic acid "in which the sugar component is threose" is called threose nucleic acid, or threonucleic acid (TNA).

Def. a "synthetic organic polymer similar to DNA or RNA; the sugar backbone of nucleic acids is replaced by N-(2-aminoethyl)-glycine linked through peptide bonds - considered a possible precursor to RNA" is called peptide nucleic acid.

Additional DNAs may be:
 * 1) deoxyapionucleic acid,
 * 2) deoxyarabinonucleic acid,
 * 3) deoxyxylonucleic acid (dXyNA),
 * 4) deoxylyxonucleic acid,
 * 5) deoxyribulonucleic acid, and
 * 6) deoxyxylulonucleic acid.
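The six names in the list above follow the same pattern as "deoxyribonucleic acid": the sugar name loses its final "se" and is placed between "deoxy" and "nucleic acid". The sketch below reproduces the list from the pentose names given earlier in this article; the function name and the string rule are only an illustration of the naming pattern, not an official nomenclature.

```python
# Pentoses named in the Sugars section, other than ribose.
PENTOSES = ["apiose", "arabinose", "xylose", "lyxose", "ribulose", "xylulose"]


def deoxy_nucleic_acid_name(sugar: str) -> str:
    """Build a name on the pattern ribose -> deoxyribonucleic acid."""
    if not sugar.endswith("se"):
        raise ValueError(f"unexpected sugar name: {sugar!r}")
    stem = sugar[:-2]  # "ribose" -> "ribo"
    return f"deoxy{stem}nucleic acid"


for number, sugar in enumerate(PENTOSES, start=1):
    print(f"{number}) {deoxy_nucleic_acid_name(sugar)}")
```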

Synthesis of deoxyapionucleic acid has been accomplished.

Deoxyxylonucleic acid and xylose nucleic acid have been produced.

"[X]ylonucleic acid (XyloNA) [contains] a potentially prebiotic xylose sugar (a 3′-epimer of ribose) in its backbone."

A "number of sugar-modified nucleic acid variants has been revealed as new genetic polymers, (2) some of them are endowed with catalytic activity (for e.g. FANA and HNA) (3). The structure of these artificial nucleic acids, however, mimics natural nucleic acid helicity (4)."

"Although helices display a distinct pitch and curvature, they feature ca. 11–12 base pairs per turn, and χ/δ covariance plots indicate that the backbones of XNA:RNA or XNA:DNA heteroduplexes adopt an architecture that is either closely related to the A-form, as in the case of [1,5-anhydrohexitol nucleic acid (HNA)] HNA:RNA (96), [locked nucleic acid (LNA)] LNA:RNA (83), [cyclohexene nucleic acid (CeNA)] CeNA:RNA (85) and PNA:RNA (59), or between the A- and B-forms, as seen in the structures of DNA:RNA (97), [arabinonucleic acid (ANA)] ANA:RNA (79), [2′-deoxy-2′-fluoro-arabinonucleic acid (FANA)]FANA:RNA (79) and [peptide nucleic acid (PNA)] PNA:DNA (98)."

Additional XNAs include bridged nucleic acid (BNA), glycol nucleic acid (GNA), FANA, and peptide nucleic acid (PNA).

On the right is a diagram displaying various artificial and natural nucleic acid polymers.

"Representative structures illustrate the structural diversity and plasticity of natural and artificial nucleic acid (XNA) backbones. Structures are shown in alphabetic order. (A) Natural genetic polymers: B-form DNA (black), DNA:RNA hybrid and A-form RNA (gray). (B) Representative structures of XNA heteroduplexes with RNA or DNA. The RNA strand is shown in gray, the DNA strand in black and the orientation of the XNA strand is indicated. (C) XNA homoduplexes. Homo-XNA duplexes adopt a variety of structures. (D) Representative XNA-only heteroduplexes. FAF:FAF stands for FANA(F)-ANA(A)-FANA(F) XNA:XNA heteroduplex. Alt and chim indicate the alternated or chimeric order of FANA-segments in the duplex sequences respectively. The depicted duplexes have the following PDB ID codes in the Protein Data Bank (http://www.rcsb.org): B-DNA (3BSE); DNA:RNA (1EFS); A-RNA (3ND4); ANA(purple):RNA (2KP3); CeNA(blue):RNA (3KNC); FANA(violet):RNA (2KP4); HNA(yellow):RNA (2BJ6); LNA(cyan):RNA (1H0Q); PNA(orange):DNA (1PDT); PNA(orange):RNA (176D); CeNA:CeNA (blue, 2H0N); hDNA:hDNA (sky blue, 2H9S); FRNA:FRNA (magenta, 3P4A); GNA:GNA (red, 2XC6); HNA:HNA (yellow, 481D); LNA:LNA (cyan, 2×2Q); PNA:PNA (orange, 2K4G), TNA:TNA (green, coordinates not deposited in the PDB [...]); dXyNA:dXyNA (brown, coordinates not deposited in the PDB [...]); XyNA:XyNA (light green, 2N4J); FAF:FAF (FANA in violet, ANA in purple, 2LSC), FRNA:FANA (alt) (FRNA in magenta, FANA in violet, 2M8A); FRNA:FANA (chim) (FRNA in magenta, FANA in violet, 2M84)."

Mitochondrial DNA
Mitochondrial DNA (mtDNA or mDNA) consists of one or more strands making up the small circular chromosomes inside mitochondria. Mitochondria are normally passed only from mother to offspring.

The 16,569 base pairs of human mitochondrial DNA encode 37 genes.

The guanine-rich strand is the heavy strand (or H-strand) which encodes 28 genes and the cytosine-rich strand is the light strand (or L-strand) which encodes the other 9 genes.

The H (heavy, outer circle) and L (light, inner circle) strands are given with their corresponding genes.

There are 22 transfer RNA (TRN) genes for the following amino acids: F, V, L1 (codon UUA/G), I, Q, M, W, A, N, C, Y, S1 (UCN), D, K, G, R, H, S2 (AGC/U), L2 (CUN), E, T and P (white boxes).

There are 2 ribosomal RNA (RRN) genes: S (small subunit, or 12S) and L (large subunit, or 16S) (blue boxes).

There are 13 protein-coding genes: 7 for NADH dehydrogenase subunits (ND, yellow boxes), 3 for cytochrome c oxidase subunits (COX, orange boxes), 2 for ATPase subunits (ATP, red boxes), and one for cytochrome b (CYTB, coral box).

Two gene overlaps are indicated (ATP8-ATP6, and ND4L-ND4, black boxes).

The control region (CR) is the longest non-coding sequence (grey box). Its three hyper-variable regions are indicated (HV, green boxes).
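The gene counts given above can be cross-checked with a short tally. The sketch below uses only the numbers and labels stated in this section (22 tRNA genes, 2 rRNA genes, 13 protein-coding genes, and the 28/9 split between the heavy and light strands); the variable names are illustrative.

```python
# tRNA genes, labeled by amino acid as in the text.
TRNA_GENES = ["F", "V", "L1", "I", "Q", "M", "W", "A", "N", "C", "Y",
              "S1", "D", "K", "G", "R", "H", "S2", "L2", "E", "T", "P"]
RRNA_GENES = ["12S", "16S"]
PROTEIN_GENES = {"ND": 7, "COX": 3, "ATP": 2, "CYTB": 1}

# Genes per strand as stated in the text.
H_STRAND, L_STRAND = 28, 9

total_genes = len(TRNA_GENES) + len(RRNA_GENES) + sum(PROTEIN_GENES.values())
print("tRNA:", len(TRNA_GENES))
print("rRNA:", len(RRNA_GENES))
print("protein-coding:", sum(PROTEIN_GENES.values()))
print("total:", total_genes, "| by strand:", H_STRAND + L_STRAND)
```

Both ways of counting give the 37 genes stated for human mitochondrial DNA.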

Noncoding DNA
"Non-coding DNA sequences do not code for amino acids. Most non-coding DNA lies between genes [intergenic] on the chromosome [...]. Other non-coding DNA, called introns, is found within genes. [...] Non-coding DNA [...] represents 98 percent of our genome sequence and it does all sorts of things, like regulate those genes to figure out where they should turn on, where they should turn off, how much we should turn on certain genes, how are we going to pack up the DNA into chromosomes, and so forth."

Over 80% of human DNA "serves some purpose, biochemically speaking".

Non-coding repetitive sequences
Over 50% of human DNA consists of non-coding repetitive sequences.

Non-coding RNA sequences
Some DNA sequences that do not code protein may still encode functional non-coding RNA molecules, which are involved in the regulation of gene expression.

Pseudogenes
Pseudogenes, which are copies of genes that have been disabled by mutation, are an abundant form of noncoding DNA in humans. These sequences are usually just molecular fossils, although they can occasionally serve as raw genetic material for the creation of new genes through the process of gene duplication and divergence.

Genes
Def. "[a] unit of heredity; a segment of DNA or RNA that is transmitted from one generation to the next, and that carries genetic information such as the sequence of amino acids for a protein" is called a gene.

The genetic information in a genome is held within genes, and the complete set of this information in an organism is called its genotype. A gene is a unit of heredity and is a region of DNA that influences a particular characteristic in an organism. Genes contain an open reading frame that can be transcribed, as well as regulatory sequences such as promoters and enhancers, which control the transcription of the open reading frame.
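As a rough illustration of what an open reading frame is, the sketch below scans a DNA sequence for stretches that begin with a start codon and run in frame to a stop codon. The start codon ATG and the stop codons TAA, TAG, and TGA are those of the standard genetic code, which is an assumption added here and not stated in this article; the function name and the `min_codons` parameter are hypothetical. Only the forward strand is scanned, and regulatory sequences such as promoters and enhancers are not modeled.

```python
START = "ATG"                  # assumed: standard genetic code
STOPS = {"TAA", "TAG", "TGA"}  # assumed: standard genetic code


def find_orfs(seq: str, min_codons: int = 1):
    """Return (start, end) index pairs of ORFs on the forward strand.

    Each ORF runs from a start codon to the first in-frame stop codon,
    with the stop codon included in the half-open interval [start, end).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


example = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```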

Only about 1.5% of the human genome consists of protein-coding exons.

Telomeres
Some non-coding DNA sequences such as telomeres and centromeres play structural roles in chromosomes.

Telomeres are usually lengths of single-stranded DNA containing several thousand repeats of a simple TTAGGG sequence.
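A repeat structure like this is easy to detect computationally. The sketch below counts the longest uninterrupted run of TTAGGG units in a sequence; the function name and the short test sequence are made up for illustration, and real telomeric sequence would be far longer.

```python
import re


def longest_repeat_run(seq: str, unit: str = "TTAGGG") -> int:
    """Return the number of units in the longest tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical sequence: five repeats, a two-base interruption, two more repeats.
sample = "ACGT" + "TTAGGG" * 5 + "CC" + "TTAGGG" * 2
print(longest_repeat_run(sample))
```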

Telomeres and centromeres typically contain few genes, but are important for the function and stability of chromosomes.

Centromeres
Centromeres are chromosomal loci that ensure delivery of a copy of a chromosome to each daughter cell upon cell division. During meiosis and mitosis, the centromere drives and maintains chromosome movement on the spindle apparatus.

Introns
An intron is any nucleotide sequence within a gene that is removed by RNA splicing while the final mature RNA product of a gene is being generated. The term intron refers to both the DNA sequence within a gene and the corresponding sequence in RNA transcripts.
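The effect of splicing on a transcript can be sketched as removing a set of intervals from a string. In the example below, intron positions are supplied by hand as 0-based, half-open (start, end) pairs; the function name, the toy transcript, and its intron coordinates are all hypothetical, and no attempt is made to recognize real splice sites.

```python
def splice(pre_rna: str, introns):
    """Return the transcript with the given intron intervals removed.

    `introns` is an iterable of (start, end) pairs, 0-based and half-open,
    which must lie inside the transcript and must not overlap.
    """
    mature = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end < start or end > len(pre_rna):
            raise ValueError(f"invalid or overlapping intron: {(start, end)}")
        mature.append(pre_rna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    mature.append(pre_rna[position:])           # keep the final exon
    return "".join(mature)


# Toy transcript: exons in upper case, one intron in lower case.
pre_rna = "AUGGCC" + "guaaguuuag" + "UUUUAA"
print(splice(pre_rna, [(6, 16)]))
```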

In addition to introns, there are several families of internal nucleic acid sequences that are not present in the final gene product, including inteins, untranslated regions (UTRs), and nucleotides removed by RNA editing.

Introns are extremely common within the nuclear genome of higher vertebrates (e.g. humans and mice), where protein-coding genes almost always contain multiple introns.

Some introns themselves encode specific proteins or can be further processed after splicing to generate noncoding RNA molecules. Alternative splicing is widely used to generate multiple proteins from a single gene. Furthermore, some introns represent mobile genetic elements and may be regarded as examples of selfish DNA.

The human genome contains an average of 8.4 introns/gene (139,418 in the genome).

Some introns are known to enhance the expression of the gene that they are contained in by a process known as intron-mediated enhancement (IME).

Human DNA
"[H]uman DNA has millions of on-off switches and complex networks that control the genes' activities. ... [A]t least 80% of the human genome is active, which opposed the previously held idea that most of the DNA are useless."

"DNA contains genes, which hold the instructions for [life. But, these] take up only about 2 percent of the genome ... The human genome is made up of about 3 billion “letters” along strands that make up the familiar double helix structure of DNA. Particular sequences of these letters form genes, which tell cells how to make proteins. People have about 20,000 genes, but the vast majority of DNA lies outside of genes. ... [A]t least three-quarters of the genome is involved in making RNA [...] it appears to help regulate gene activity."

A DNAs
B-DNA is driven into the A form under the dehydrating conditions commonly used to form crystals, and many DNA crystal structures are in the A form.

A-DNA is broader and apparently more compressed along its axis than B-DNA.

B DNAs
The B form described by James Watson and Francis Crick is believed to predominate in cells.

Other conformations are possible; A-DNA, B-DNA, C-DNA, E-DNA, L-DNA (the enantiomeric form of D-DNA), P-DNA, S-DNA, and Z-DNA have been described so far. In fact, as of 17 February 2011, only the letters F, Q, U, V, and Y remain available to describe any new DNA structure that may appear in the future. There are also triple-stranded DNA forms and quadruplex forms such as the G-quadruplex and the i-motif.

Z DNAs
The repeating polymer of inosine–cytosine produces a left-handed DNA. The "reverse" circular dichroism spectrum for this DNA was interpreted to mean that the strands wrapped around one another in a left-handed fashion.

The ultraviolet circular dichroism of poly(dG-dC) was nearly inverted in 4 M sodium chloride solution. This was the result of a conversion from B-DNA to Z-DNA, confirmed by examining the Raman spectra of these solutions and the Z-DNA crystals. A crystal structure of "Z-DNA" was the first single-crystal X-ray structure of a DNA fragment (a self-complementary DNA hexamer d(CG)3), resolved as a left-handed double helix with two antiparallel chains held together by Watson–Crick base pairs.
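"Self-complementary" means that the hexamer can pair with a second copy of itself in antiparallel orientation. This can be checked by comparing the sequence with its own reverse complement, as in the sketch below; the function names are illustrative, and the sequence CGCGCG is simply d(CG)3 written out.

```python
# Translation table mapping each base to its Watson-Crick partner.
PARTNERS = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antiparallel partner strand of a DNA sequence."""
    return seq.upper().translate(PARTNERS)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if two copies of the strand can form a fully paired duplex."""
    return seq.upper() == reverse_complement(seq)


hexamer = "CG" * 3  # d(CG)3
print(hexamer, is_self_complementary(hexamer))
```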

Hypotheses

 * 1) As both ribose and deoxyribose nucleic acids exist, each pentose or hexose sugar should be usable to make a nucleic acid.