Gene transcriptions/Boxes/AGCs/Laboratory

A laboratory is a specialized activity where a student, teacher, or researcher can have hands-on, or as close to hands-on as possible, experience actively analyzing an entity, source, or object of interest.

Usually, expensive equipment, instruments, and/or machinery are available for taking the entity apart to see and accurately record how it works, what it's made of, and where it came from. This may involve simple experiments to test reality, collect data, and try to make some sense out of it.

Expensive equipment can often be replaced with more readily available tools.

Notations
You are free to create your own notation or use that already presented. A method to statistically assess your locator is also needed.

Laboratory control group
A laboratory control group of some large number of laboratory test subjects or results may be used to define normal limits for the presence of an effect.
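One possible control, offered here only as a sketch, is a set of random nucleotide sequences: it suggests how often a short motif would turn up by chance alone. The function name `expected_chance_matches` and the assumption of independent, equally likely bases (A, C, G, T each at probability 1/4) are illustrative choices, not part of the original laboratory; real genomic sequence is not uniformly random, so the figure is only a rough baseline.

```python
def expected_chance_matches(length: int, motif_length: int) -> float:
    """Expected number of exact matches to one fixed motif in a random sequence.

    Assumes independent, equiprobable bases (an idealization).
    """
    windows = length - motif_length + 1  # number of positions where a match could start
    if windows <= 0:
        return 0.0
    return windows / 4 ** motif_length


# A 7-nt motif (such as AGCCGCC) in a 4460-nt stretch, the ZSCAN22-to-A1BG
# length quoted elsewhere on this page.
baseline = expected_chance_matches(4460, 7)
print(f"expected chance matches per strand and orientation: {baseline:.3f}")
```

Under these assumptions the baseline is well below one match per search, so a single hit is not strong evidence by itself and is best compared against several control sequences.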

Instructions
This laboratory is an activity for you to explore the universe for, to create a method for, or to examine. While it is part of the, it is also independent.

Some suggested entities to consider are
 * 1) available classification,
 * 2) human genes,
 * 3) eukaryotes,
 * 4) nucleotides,
 * 5) classical physics quantities, or
 * 6) geometry.

More importantly, there are your entities.

You may choose to define your entities or use those already available.

Usually, research follows someone else's ideas of how to do something. But, in this laboratory you can create these too.

This is a gene project laboratory, but you may create what a laboratory, or a  is.

Yes, this laboratory is structured.

I will provide an example. The rest is up to you.

Questions, if any, are best placed on the Discuss page.

To include your participation in each of these laboratories, create a subpage of your user page once you register at Wikiversity and use this subpage, for example, your online name/laboratory effort.

Enjoy learning by doing!

Hypotheses

 * 1) An AGC box occurs in the human genome.
 * 2) A1BG is transcribed by an AGC box.

Introduction
"The GCC box, also referred to as the AGC box (10), GCC element (11), or AGCCGCC sequence (13), is an ethylene-responsive element found in the promoters of a large number of [pathogenesis related] PR genes whose expression is up-regulated following pathogen attack."

The AGC box has a consensus sequence of 3'-AGCCGCC-5' in the direction of transcription.

"AGC is a binding site for factors responding to pathogen attacks (Ohme-Takagi et al., 2000)".

For "AGC, one copy in inverse orientation of the AGC box (AGCCGCC) [is] present as two copies (-1346 and -1314) in the ERE".

"Enhancer activity, ethylene responsiveness, and binding of nuclear proteins depend on the integrity of two copies of the AGC box, AGCCGCC, present in the promoters of several ethylene-responsive genes."

"The GLB enhancer contains two copies of the sequence AGCCGCC, which is conserved in several genes showing expression patterns similar to the GLB gene, as well as a sequence identical at 6 of 7 bp."

"One common motif, AGCCGCC (AGC box), has been found to be present in nearly all chitinase and glucanase promoters so far analyzed (Ohme-Takagi and Shinshi 1990; Hart et al. 1993)."

These citations concern the AGC box in plant genes.

Samplings
Once you've decided on an entity, source, or object, compose a method, way, or procedure to explore it.

One way is to perceive (see, feel, hear, taste, or touch, for example) if there are more than one of them.

Ask some questions about it.

Does it appear to have a spatial extent?

Is there any change over time?

Can it be profiled with a kind of spectrum for example, by emitted radiation? Sample by plotting two or more apparent variables against each other, like intensity versus wavelength.

Is there some location, time, intensity, where there isn't one?

Regarding hypothesis 1:
 * 1) Gene ID: 1874 - "The protein encoded by this gene is a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of cell cycle and action of tumor suppressor proteins and is also a target of the transforming proteins of small DNA tumor viruses. The E2F proteins contain several evolutionally conserved domains found in most members of the family. These domains include a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. This protein binds to all three of the tumor suppressor proteins pRB, p107 and p130, but with higher affinity to the last two. It plays an important role in the suppression of proliferation-associated genes, and its gene mutation and increased expression may be associated with human cancer."
 * 2) "The AGC triplet repeat in the coding region of the E2F-4 gene, a member of the family, has been reported to be mutated in colorectal cancers with a microsatellite instability (MSI) phenotype. We found a wider range variation of the repeat number in DNAs from tumors, the corresponding normal mucosa, and healthy individuals. A total of 5 repeat variants, ranging from 8 to 17 AGC repeats, was detected in 6 (9.7%) of the 62 healthy individuals and 8 (8.9%) of the 90 normal DNAs of the patients. The wild-type 13 repeat was present in all of these individuals. The variation of the AGC repeat number may be a polymorphism. Further, loss of heterozygosity (LOH) at the E2F-4 locus in the tumor tissues of 2 (25%) of the 8 informative cases was detected."

These indicate a human gene with an AGC box.

Regarding hypothesis 2:
 * 1) Transcription of the human gene A1BG is studied at Gene transcriptions/A1BG.
 * 2) Using a Basic language computer program, SuccessablesAGC.bas, each of the transcription directions is sampled by changing the algorithm.

An AGC box has the consensus sequence 3'-AGCCGCC-5' in the direction of transcription. It may also occur as its complement, 3'-TCGGCGG-5', in the direction of transcription, or inverted, which has been reported, as 3'-CCGCCGA-5' (inverse) and 3'-GGCGGCT-5' (inverse complement). Ideally, each of these four should be tested on each of the four possible transcription directions.
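As a minimal sketch, the four sequences can be derived from the consensus rather than typed by hand, which guards against transcription slips. The helper names (`complement`, `inverse`, `agc_box_variants`) are hypothetical and are not taken from the Basic programs.

```python
# Base-pairing table used to build the complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Swap each base for its pairing partner, keeping the order."""
    return seq.translate(COMPLEMENT)


def inverse(seq: str) -> str:
    """Reverse the order of the bases without complementing them."""
    return seq[::-1]


def agc_box_variants(consensus: str = "AGCCGCC") -> dict[str, str]:
    """Return the four orientations of the motif named in the text."""
    return {
        "consensus": consensus,
        "complement": complement(consensus),
        "inverse": inverse(consensus),
        "inverse complement": inverse(complement(consensus)),
    }


for name, seq in agc_box_variants().items():
    print(f"{name}: 3'-{seq}-5'")
```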

A1BG has four possible transcription directions:
 * 1) on the negative strand from ZSCAN22 to A1BG,
 * 2) on the positive strand from ZSCAN22 to A1BG,
 * 3) on the negative strand from ZNF497 to A1BG, and
 * 4) on the positive strand from ZNF497 to A1BG.

For each transcription promoter that interacts directly with RNA polymerase II holoenzyme, the four possible consensus sequences need to be tested on the four possible transcription directions, even though some genes may only be transcribed from the negative strand in the 3'-direction on the transcribed strand.

For the Basic programs (starting with SuccessablesAGC.bas) written to compare nucleotide sequences with the sequences on either the template strand (-), or coding strand (+), of the DNA, in the negative direction (-), or the positive direction (+), each entry below gives the program name, the sequence it looks for, and the number of matches found, followed by the matched sequence and its nucleotide position when there is a match:
 * 1) negative strand in the negative direction is SuccessablesAGC--.bas, looking for 3'-AGCCGCC-5', 0,
 * 2) negative strand in the positive direction is SuccessablesAGC-+.bas, looking for 3'-AGCCGCC-5', 0,
 * 3) positive strand in the negative direction is SuccessablesAGC+-.bas, looking for 3'-AGCCGCC-5', 0,
 * 4) positive strand in the positive direction is SuccessablesAGC++.bas, looking for 3'-AGCCGCC-5', 0,
 * 5) complement, negative strand, negative direction is SuccessablesAGCc--.bas, looking for 3'-TCGGCGG-5', 0,
 * 6) complement, negative strand, positive direction is SuccessablesAGCc-+.bas, looking for 3'-TCGGCGG-5', 0,
 * 7) complement, positive strand, negative direction is SuccessablesAGCc+-.bas, looking for 3'-TCGGCGG-5', 0,
 * 8) complement, positive strand, positive direction is SuccessablesAGCc++.bas, looking for 3'-TCGGCGG-5', 0,
 * 9) inverse complement, negative strand, negative direction is SuccessablesAGCci--.bas, looking for 3'-GGCGGCT-5', 0,
 * 10) inverse complement, negative strand, positive direction is SuccessablesAGCci-+.bas, looking for 3'-GGCGGCT-5', 0,
 * 11) inverse complement, positive strand, negative direction is SuccessablesAGCci+-.bas, looking for 3'-GGCGGCT-5', 1, 3'-GGCGGCT-5', 1754,
 * 12) inverse complement, positive strand, positive direction is SuccessablesAGCci++.bas, looking for 3'-GGCGGCT-5', 0,
 * 13) inverse, negative strand, negative direction, is SuccessablesAGCi--.bas, looking for 3'-CCGCCGA-5', 1, 3'-CCGCCGA-5', 1754,
 * 14) inverse, negative strand, positive direction, is SuccessablesAGCi-+.bas, looking for 3'-CCGCCGA-5', 0,
 * 15) inverse, positive strand, negative direction, is SuccessablesAGCi+-.bas, looking for 3'-CCGCCGA-5', 0,
 * 16) inverse, positive strand, positive direction, is SuccessablesAGCi++.bas, looking for 3'-CCGCCGA-5', 0.
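The sixteen searches listed above share one pattern: scan a nucleotide string for a fixed 7-nt sequence and report how many matches occur and where. The sketch below illustrates that pattern only and is not a translation of the Basic programs. The function names and the toy sequence are hypothetical, and reporting the 1-based position at which a match ends is an assumption about the convention used.

```python
def find_motif_ends(sequence: str, motif: str) -> list[int]:
    """Return the 1-based end position of every match, overlapping ones included."""
    ends = []
    start = sequence.find(motif)
    while start != -1:
        ends.append(start + len(motif))  # 0-based start + length = 1-based end
        start = sequence.find(motif, start + 1)
    return ends


# Suffixes and sequences as they appear in the program list above.
VARIANTS = {"": "AGCCGCC", "c": "TCGGCGG", "ci": "GGCGGCT", "i": "CCGCCGA"}
ORIENTATIONS = ["--", "-+", "+-", "++"]  # strand sign, then direction sign


def program_name(variant: str, orientation: str) -> str:
    """Rebuild a program file name from its variant and orientation suffixes."""
    return f"SuccessablesAGC{variant}{orientation}.bas"


# Hypothetical toy sequence (NOT A1BG data), built to contain one inverse
# complement and one inverse copy of the motif.
toy = "TTGGCGGCTAACCGCCGATT"

for variant, motif in VARIANTS.items():
    ends = find_motif_ends(toy, motif)
    print(f"3'-{motif}-5': {len(ends)} match(es), ending at {ends}")

all_names = [program_name(v, o) for v in VARIANTS for o in ORIENTATIONS]
print(len(all_names), "program names")
```

How each strand and direction is read from the data files is left out here, since the text does not specify it; each of the sixteen cases would supply its own sequence string to `find_motif_ends`.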

Verifications
To verify that your sampling has explored something, you may need a control group. Perhaps a place, a time, or a sample without your entity, source, or object may serve.

Another verifier is reproducibility. Can you replicate something about your entity in your laboratory more than 3 times? Five times is usually a beginning number to provide statistics (data) about it.

For an apparent one-time event or perception, document or record as much coincident information as possible. Was there a butterfly nearby?

Has anyone else perceived the entity and recorded something about it?

Gene ID: 1, includes the nucleotides between neighboring genes and A1BG. These nucleotides can be loaded into files from either gene toward A1BG, and from template and coding strands. These nucleotide sequences can be found in Gene transcriptions/A1BG. Copying the above discovered AGC boxes and putting the sequences in "⌘F" locates these two sequences in the same nucleotide positions as found by the computer programs.

Core promoters
The core promoter extends approximately 34 nts upstream from the TSS (to about position -34).

From the first nucleotide just after ZSCAN22 to the first nucleotide just before A1BG are 4460 nucleotides. The core promoter on this side of A1BG extends from approximately 4425 to the possible transcription start site at nucleotide number 4460. The AGC box ends at nucleotide number 1754.

Def. "the factors, including RNA polymerase II itself, that are minimally essential for transcription in vitro from an isolated core promoter" is called the basal machinery, or basal transcription machinery.

There is no AGC box in the basal transcription machinery for A1BG.

Proximal promoters
Def. a "promoter region [juxtaposed to the core promoter that] binds transcription factors that modify the affinity of the core promoter for RNA polymerase.[12][13]" is called a proximal promoter.

The proximal sequence upstream of the gene that tends to contain primary regulatory elements is a proximal promoter.

It extends approximately 250 base pairs, or nucleotides (nts), upstream of the transcription start site.

The proximal promoter begins about nucleotide number 4210. As such there is no AGC box within the proximal promoter of A1BG.
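The window boundaries quoted for the core and proximal promoters follow from simple subtraction from the possible TSS at nucleotide 4460. The sketch below reproduces that arithmetic, using the numbers given in the text; the variable and function names are illustrative only. Subtracting 34 gives 4426, which the text rounds to approximately 4425.

```python
TSS = 4460             # possible transcription start site, counted from ZSCAN22
CORE_LENGTH = 34       # approximate core promoter length in nts
PROXIMAL_LENGTH = 250  # approximate proximal promoter length in nts
AGC_BOX_END = 1754     # nucleotide at which the AGC box ends


def window_start(tss: int, length: int) -> int:
    """First nucleotide of a window reaching `length` nts upstream of the TSS."""
    return tss - length


def in_window(position: int, start: int, end: int) -> bool:
    """True if `position` lies within the inclusive window [start, end]."""
    return start <= position <= end


core_start = window_start(TSS, CORE_LENGTH)          # 4426
proximal_start = window_start(TSS, PROXIMAL_LENGTH)  # 4210

print("core promoter:", core_start, "to", TSS,
      "contains AGC box:", in_window(AGC_BOX_END, core_start, TSS))
print("proximal promoter:", proximal_start, "to", TSS,
      "contains AGC box:", in_window(AGC_BOX_END, proximal_start, TSS))
```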

Distal promoters
The "upstream regions of the human CYP11A and bovine CYP11B genes [have] a distal promoter in each gene. The distal promoters are located at −1.8 to −1.5 kb in the upstream region of the CYP11A gene and −1.5 to −1.1 kb in the upstream region of the CYP11B gene."

"Using cloned chicken βA-globin genes, either individually or within the natural chromosomal locus, enhancer-dependent transcription is achieved in vitro at a distance of 2 kb with developmentally staged erythroid extracts. This occurs by promoter derepression and is critically dependent upon DNA topology. In the presence of the enhancer, genes must exist in a supercoiled conformation to be actively transcribed, whereas relaxed or linear templates are inactive. Distal protein–protein interactions in vitro may be favored on supercoiled DNA because of topological constraints."

Distal promoter regions may be a relatively small number of nucleotides, fairly close to the TSS such as (-253 to -54) or several regions of different lengths, many nucleotides away, such as (-2732 to -2600) and (-2830 to -2800).

The "[d]istal promoter is not a spacer element."

Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2460. The AGC box discovered may be in the distal promoter of ZSCAN22.

But, an estimate of -2830 nts is 1630 nts from ZSCAN22, so the AGC box could be a weak enhancer or inhibitor subject to DNA folding for either gene. On the negative strand, in the positive direction towards ZSCAN22, the AGC box (3'-CCGCCGA-5' for A1BG) is actually a normal one (3'-AGCCGCC-5') rather than the inverse.
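The two distal-promoter estimates above lead to different conclusions about the AGC box ending at nucleotide 1754, and the arithmetic can be laid out as a short check. This is only a sketch using the numbers from the text; the names are illustrative and the extents of 2000 and 2830 nts are the estimates discussed above, not measured boundaries.

```python
TSS = 4460          # possible A1BG transcription start site, counted from ZSCAN22
AGC_BOX_END = 1754  # nucleotide at which the AGC box ends

# Distance of the AGC box upstream of the possible TSS.
distance_upstream = TSS - AGC_BOX_END  # 2706 nts

results = {}
for extent in (2000, 2830):  # estimated distal promoter extents in nts
    boundary = TSS - extent  # nucleotide number, counted from ZSCAN22
    results[extent] = (boundary, AGC_BOX_END >= boundary)

print("AGC box lies", distance_upstream, "nts upstream of the TSS")
for extent, (boundary, inside) in results.items():
    print(f"extent {extent} nts: region starts at {boundary}, AGC box inside: {inside}")
```

With the 2 knt estimate the region begins at nucleotide 2460 and excludes the AGC box; with the 2830 nt estimate it begins at nucleotide 1630 and includes it.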

An extension of the nucleotide data for the positive direction from ZNF497 toward A1BG from 958 nts to 4445 nts has not discovered any AGC boxes, even in the distal promoter just beyond ZNF497.

Laboratory reports
Below is an outline for sections of a report, paper, manuscript, log book entry, or lab book entry. You may create your own, of course.

Title
The AGC box and gene transcription initiation of alpha-1-B glycoprotein

by line

by Marshallsumter, 02:49, 18 August 2017 (UTC)

Abstract
Two hypotheses have been examined: (1) an AGC box occurs in the human genome and (2) alpha-1-B glycoprotein (A1BG) is transcribed by an AGC box. These have been tested by literature searching articles that report an AGC box in the promoter region of a particular human gene and by using a simple computer program to look for AGC boxes in the nucleotide sequences on either side of the A1BG gene. Both the template DNA strand and the coding strand have been checked.

Introduction
According to one source, A1BG is transcribed from the direction of ZNF497: 3' - 58864890: CGAGCCACCCCACCGCCCTCCCTTGG+1GGCCTCATTGCTGCAGACGCTCACCCCAGACACTCACTGCACCGGAGTGAGCGCGACCATCATG : 58866601-5', where the second 'G' at left of four Gs in a row is the TSS. Transcription was triggered in cell cultures and the transcription start site was found using reverse transcriptase. But, the mechanism for transcription is unknown.

Controlling the transcription of A1BG may have significant immune function against snake envenomation. A1BG forms a complex that is similar to those formed between toxins from snake venom and A1BG-like plasma proteins. These inhibit the toxic effect of snake venom metalloproteinases or myotoxins and protect the animal from envenomation.

Many transcription factors (TFs) occur upstream and occasionally downstream of the transcription start site (TSS), in a gene's promoter. It isn't known which, if any, assist in locating and affixing the transcription mechanism for A1BG. This examination is the first to test one such DNA-occurring TF: the AGC box.

Experiment
Each hypothesis required at least one experiment.

To test whether an AGC box occurs in any human genes, a literature search was performed. Human Gene ID: 1874 was found to have an AGC box. Further, a computer program search of the nucleotides between ZSCAN22 and A1BG located an AGC box in the positive transcription direction toward ZSCAN22.

Whether A1BG is transcribed or can be transcribed by an AGC box first requires the presence of at least one in its promoter regions. Computer programs were used to systematically go through both the template and coding strand on both sides of A1BG using the nucleotide sequences stored in the Gene database of the NCBI.

Results
At least two human genes were found to have an AGC box: Gene ID: 1874 E2F4 and either Gene ID: 342945 ZSCAN22 or Gene ID: 1 A1BG.

An AGC box was found in the distal promoter of either gene ZSCAN22 or A1BG on both the template and coding strands. But, as the only known transcription of A1BG occurs between Gene ID: 162968 ZNF497 and Gene ID: 1 A1BG, it is unlikely that this AGC box is naturally used to transcribe A1BG.

Discussion
A quick literature search on Google Scholar with ZNF497 or "zinc finger protein 497" and "AGC box" or "GCC box" produced no results. But, a full web search produced several references including a GeneCard for "zinc finger protein 497" and "GCC box", including "May be involved in transcriptional regulation." No transcriptional regulation was stated for A1BG.

A Google search using ZSCAN22 or "zinc finger and SCAN domain containing 22" with "GCC box" produced no results, but zinc fingers are mentioned in association with GCC boxes in plants.

No experimental efforts to force transcription of A1BG from the ZSCAN22 side were performed.

Conclusion
AGC boxes do occur in the distal promoters of human genes. But, it is unlikely that an AGC box is involved in any way with the transcription of A1BG.

Laboratory evaluations
To assess your example, including your justification, analysis and discussion, I will provide such an assessment of my example for comparison and consideration.

Evaluation

No wet chemistry experiments were performed to confirm that Gene ID: 1 is transcribed from the ZSCAN22 side. Examining transcription on this side is not justified without experimental verification. The NCBI database is generalized, whereas individual human genome testing could demonstrate that A1BG is transcribed from the ZSCAN22 side.