Eukaryotic gene example

This page is a tutorial introduction to common features of eukaryotic genes. It covers genes that contain the instructions living cells use to make proteins.

Introduction
A gene can be described by listing the linear sequence of nucleotide subunits that constitutes the gene's sequence. Eukaryotic genes exist inside cells as DNA molecules. The four nucleotide subunits of DNA are illustrated in the figure shown to the right on this page. For the DNA structure shown in the figure, the sequence of nucleotide subunits can be summarized as ACTG or TGAC. The two strands of the molecule are complementary according to the rules of base pairing, so it is only necessary to provide the sequence of one strand; the complementary strand can be deduced from the base-pair rules.
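The base-pair rules can be written as a small lookup table. The Python sketch below is illustrative only: the names `PAIRS`, `complement` and `reverse_complement` are invented for this example. It shows how one strand (ACTG) determines the other, both position by position and written in the partner strand's own 5'-to-3' direction.

```python
# Watson-Crick base-pair rules for DNA: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the partner base for each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand):
    """Return the complementary strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

print(complement("ACTG"))          # partner bases, position by position
print(reverse_complement("ACTG"))  # same strand, read 5' to 3'
```

Databases conventionally store a single strand written 5' to 3', which is why the reverse complement is usually the more useful of the two functions.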

Many genomes have been sequenced, and their gene sequences are stored in general DNA sequence databases (e.g. GenBank) and in species-specific databases (e.g. The Arabidopsis Information Resource (TAIR)).

This tutorial and Figures 1-3, below, make use of one specific gene sequence as an example: the sequence of the AMY1 gene, which is one of the approximately 25,000 genes of Arabidopsis thaliana, the thale cress plant. The AMY1 gene encodes an alpha amylase, an enzyme. Plant cells use the genetic instructions in this gene as a guide for making the amylase protein.

Figures 1-3, as described below, are views of the AMY1 gene sequence, cDNA, and coding sequence (CDS). A cDNA sequence contains only part of a gene's entire sequence: the part that is found in a mature mRNA. The AMY1 gene sequence provides a convenient example of the important features that are found in most eukaryotic genes. Researchers use gene sequences to help them understand living organisms. Gene research for Arabidopsis might involve studies of seed germination or plant food flavour.

Questions
Q1. What is the difference between a gene sequence and a cDNA sequence?

cDNA
The image in Figure 1 (to the right on this page) shows a screenshot of the AMY1 cDNA. This was obtained from TAIR.

Several related views of the AMY1 sequence can be found in gene databases. These include views of the 'full length CDS', 'full length cDNA' (Fig. 1) and 'full length genomic' (Figure 3, below) sequences. These sequences typically use the DNA alphabet (A, T, G, C) although, strictly, the CDS should be shown as RNA (AUG etc.) since it represents an RNA sequence.

A typical eukaryotic gene is transcribed into a pre-mRNA transcript that is then processed into a mature mRNA by removal of introns and by 5' and 3' processing. Note that the AMY1 cDNA sequence starts with 20 nucleotides (AAACCATTCA CAATCAGACA) that do not code for amino acids in the amylase enzyme. The 5' untranslated sequence and the intron/exon structure of the AMY1 gene transcript are shown in Figure 2. Introns are removed from the transcript, so only exon sequences can specify the amino acid sequence of the amylase enzyme.
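As a rough illustration of how an untranslated leader sits in front of the coding part of a cDNA, the Python sketch below takes the 20-nucleotide leader quoted above and appends a short continuation. The continuation (`ATGGCTTAA`) is hypothetical and invented for this example; it is not the real AMY1 coding sequence. The sketch also assumes that translation starts at the first ATG, which is a simplification.

```python
# The 20-nucleotide 5' leader quoted in the text (spaces removed).
leader = "AAACCATTCA CAATCAGACA".replace(" ", "")

# HYPOTHETICAL continuation, invented for illustration only;
# it is NOT the real AMY1 coding sequence.
toy_cdna = leader + "ATGGCTTAA"

# Simplifying assumption: the coding part begins at the first ATG.
start = toy_cdna.find("ATG")
five_prime_utr = toy_cdna[:start]   # everything before the start codon
coding_part = toy_cdna[start:]      # start codon onwards

print(len(leader))       # length of the untranslated leader
print(five_prime_utr)
print(coding_part)
```

Because the quoted leader contains no ATG of its own, the first ATG found in this toy sequence is the one placed directly after it.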

Question
Q1. Define "intron" and "exon".

Gene and mRNA
The mature mRNA is composed of a 5' UTR (red), a CDS (uppercase, yellow) and a 3' UTR (red again) (Figure 3). All three of these regions are exonic, not just the protein-coding sequence (CDS). Introns are shown in purple (lowercase) and are not present in the mature mRNA.
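The lowercase convention for introns makes splicing easy to imitate in code. The Python sketch below is a toy model under stated assumptions: the sequence `toy_gene` is hypothetical (it is not the AMY1 gene), the function name `splice` is invented here, and all exonic letters, UTRs included, are assumed to be uppercase.

```python
def splice(genomic):
    """Drop lowercase (intron) letters and keep uppercase (exon) letters."""
    return "".join(ch for ch in genomic if ch.isupper())

# HYPOTHETICAL toy gene laid out as exon - intron - exon.
# Uppercase = exonic (UTR or CDS), lowercase = intronic.
exon_1 = "ACCATGGCT"
intron = "gtaagtttcag"
exon_2 = "GGATAAGTC"
toy_gene = exon_1 + intron + exon_2

mature = splice(toy_gene)
print(mature)                      # exons joined, intron removed
print(len(toy_gene), len(mature))  # genomic length vs spliced length
```

A real pipeline would use annotated exon coordinates rather than letter case, since case is only a display convention of the database view.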

For convenience, neither the 5' cap nor the 3' tail is shown in the cDNA (Figure 1), although the mRNA will have them. The gene sequence is also shown in a form where the codons can be read (ATG...), rather than as the template DNA strand, which is the strand actually copied into mRNA.
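The relationship between the displayed (coding) strand, the template strand and the mRNA can be checked with a few lines of Python. This is a minimal sketch: the six-letter sequence is hypothetical, and `transcribe` is a name invented for this example that expects the template strand written 5' to 3'.

```python
# Pairing used during transcription: DNA template base -> RNA base.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template):
    """Build the mRNA (5' to 3') from a template strand given 5' to 3'."""
    # The template is read in the opposite direction to the growing mRNA,
    # so walk it backwards while pairing each base.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template))

coding = "ATGGCT"     # HYPOTHETICAL displayed strand, codons readable
template = "AGCCAT"   # its reverse complement, the strand that is copied

mrna = transcribe(template)
print(mrna)
print(coding.replace("T", "U"))  # displayed strand in the RNA alphabet
```

The two printed lines match, which is why databases can show the readable strand: apart from T in place of U, it has the same sequence as the mRNA.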

This gene structure view is typical of a eukaryotic gene. Similar views of genes can be obtained for many species, including human (and the extinct Neanderthals), chimpanzee, fruit fly, and yeast.

Your turn
Your mission, should you decide to accept it, is to save the lives of thousands of hemophiliacs.

Hemophilia A is treated with the blood clotting protein Factor VIII. Factor VIII is isolated from donated blood, and the blood supply is contaminated with a mysterious virus that is killing hemophiliacs.

It will be possible to manufacture uncontaminated Factor VIII (as is done for insulin) if you can obtain the gene sequence that codes for Factor VIII. Find the human (Homo sapiens) Factor VIII cDNA sequence in this database. Describe what you find below.

Results and questions
Q1. In order to provide full Factor VIII function to hemophiliacs, do you need to obtain the cDNAs for both Transcript variant 1 and Transcript variant 2?