Gene transcriptions/Boxes/CArGs/Laboratory

A laboratory is a specialized activity where a student, teacher, or researcher can have hands-on, or as close to hands-on as possible, experience actively analyzing an entity, source, or object of interest.

Usually, expensive equipment, instruments, and/or machinery are available for taking the entity apart to see and accurately record how it works, what it's made of, and where it came from. This may involve simple experiments to test reality, collect data, and try to make some sense out of it.

Expensive equipment can often be replaced with more readily available tools.

Notations
You are free to create your own notation or use that already presented. A method to statistically assess your locator is also needed.

Laboratory control group
A laboratory control group of some large number of laboratory test subjects or results may be used to define normal limits for the presence of an effect.

Instructions
This laboratory is an activity for you to explore the universe for, to create a method for, or to examine. While it is part of the, it is also independent.

Some suggested entities to consider are
 * 1) available classification,
 * 2) human genes,
 * 3) eukaryotes,
 * 4) nucleotides,
 * 5) classical physics quantities, or
 * 6) geometry.

More importantly, there are your entities.

You may choose to define your entities or use those already available.

Usually, research follows someone else's ideas of how to do something. But, in this laboratory you can create these too.

This is a gene project laboratory, but you may create what a laboratory, or a  is.

Yes, this laboratory is structured.

I will provide an example. The rest is up to you.

Questions, if any, are best placed on the Discuss page.

To include your participation in each of these laboratories, create a subpage of your user page once you register at Wikiversity, and use this subpage, for example, your online name/laboratory effort.

Enjoy learning by doing!

Hypotheses

 * 1) A1BG is not transcribed by a CArG box.
 * 2) The lack of a CArG box on either side of A1BG does not prove that it is not actively used to transcribe A1BG.

Introduction
CArG boxes are present in the promoters of smooth muscle cell genes.

"CArG box [CC(A/T)6GG] DNA [consensus] sequences present within the promoters of SMC genes play a pivotal role in controlling their transcription".

"Serum response factor (SRF) controls [smooth muscle cell] SMC gene transcription via binding to CArG box DNA sequences found within genes that exhibit SMC-restricted expression."

"SMC genes examined in this study display SMC-specific histone modifications at the 5′-CArG boxes."

"The SRF-CArG association is required for transcriptional activation of SMC genes [...] the SMC genes examined in this study display SMC-specific histone modifications at the 5′-CArG boxes. [...] enrichment of H4 and H3 acetylation [...] were relatively low from positions –2,800 to –1,600 in the 5′ region. However, at position –1,600 to –1,200, there was a sharp rise in these modifications, which was increased even further at +400 in the coding region. We observed similar patterns for H3K4dMe and H3 Lys79 di-methylation [...]. SRF, TFIID, and RNA polymerase II displayed enrichments that were consistent with the positions of the CArG boxes, TATA box, and coding region, respectively".

The CArG boxes occur between -400 and -200 nts, between the E boxes and the TCE element.

Core promoters


The core promoter begins approximately 34 nts upstream of the TSS (position -34).

From the first nucleotide just after ZSCAN22 to the first nucleotide just before A1BG are 4460 nucleotides. The core promoter on this side of A1BG extends from approximately 4425 to the possible transcription start site at nucleotide number 4460.

To extend the analysis from inside and just on the other side of ZNF497 some 3340 nts have been added to the data. This would place the core promoter some 3340 nts further away from the other side of ZNF497. The TSS would be at about 4300 nts with the core promoter starting at 4266.

Def. "the factors, including RNA polymerase II itself, that are minimally essential for transcription in vitro from an isolated core promoter" is called the basal machinery, or basal transcription machinery.

Proximal promoters
Def. a "promoter region [juxtaposed to the core promoter that] binds transcription factors that modify the affinity of the core promoter for RNA polymerase.[12][13]" is called a proximal promoter.

The proximal sequence upstream of the gene that tends to contain primary regulatory elements is a proximal promoter.

It is approximately 250 base pairs, or nucleotides (nts), upstream of the transcription start site.

The proximal promoter begins about nucleotide number 4210 in the negative direction.

The proximal promoter begins about nucleotide number 4195 in the positive direction.

Distal promoters
The "upstream regions of the human CYP11A and bovine CYP11B genes [have] a distal promoter in each gene. The distal promoters are located at −1.8 to −1.5 kb in the upstream region of the CYP11A gene and −1.5 to −1.1 kb in the upstream region of the CYP11B gene."

"Using cloned chicken βA-globin genes, either individually or within the natural chromosomal locus, enhancer-dependent transcription is achieved in vitro at a distance of 2 kb with developmentally staged erythroid extracts. This occurs by promoter derepression and is critically dependent upon DNA topology. In the presence of the enhancer, genes must exist in a supercoiled conformation to be actively transcribed, whereas relaxed or linear templates are inactive. Distal protein–protein interactions in vitro may be favored on supercoiled DNA because of topological constraints."

Distal promoter regions may be a relatively small number of nucleotides, fairly close to the TSS such as (-253 to -54) or several regions of different lengths, many nucleotides away, such as (-2732 to -2600) and (-2830 to -2800).

The "[d]istal promoter is not a spacer element."

Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2460.

Any transcription factor before A1BG from the direction of ZNF497 may be out to 2300 nts.
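The nucleotide numbers quoted above for the ZSCAN22 side can be reproduced by subtracting a region length from the TSS position. The following is only a sketch: the function name promoter_boundaries and its default lengths (35, 250 and 2000 nts) are assumptions read off the figures in the text, not part of the original programs.

```python
def promoter_boundaries(tss, core_length=35, proximal_length=250, distal_length=2000):
    """Return the nucleotide number at which each promoter region begins.

    Positions are counted from the neighboring gene toward A1BG, so each
    boundary is the TSS position minus an assumed region length.
    """
    return {
        "core": tss - core_length,
        "proximal": tss - proximal_length,
        "distal": tss - distal_length,
    }

# ZSCAN22 side: the text places the possible TSS at nucleotide number 4460.
zscan22_side = promoter_boundaries(tss=4460)
print(zscan22_side)
```

With a TSS of 4460 this gives 4425, 4210 and 2460, the numbers used in the text. The same subtraction with a TSS near 4300 gives the 2300 quoted for the distal estimate on the ZNF497 side.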

Samplings
Once you've decided on an entity, source, or object, compose a method, way, or procedure to explore it.

One way is to perceive (see, feel, hear, taste, or touch, for example) if there are more than one of them.

Ask some questions about it.

Does it appear to have a spatial extent?

Is there any change over time?

Can it be profiled with a kind of spectrum for example, by emitted radiation? Sample by plotting two or more apparent variables against each other, like intensity versus wavelength.

Is there some location, time, intensity, where there isn't one?

Regarding hypothesis 1:

A1BG has four possible transcription directions:
 * 1) on the negative strand from ZSCAN22 to A1BG,
 * 2) on the positive strand from ZSCAN22 to A1BG,
 * 3) on the negative strand from ZNF497 to A1BG, and
 * 4) on the positive strand from ZNF497 to A1BG.

For each transcription promoter that interacts directly with RNA polymerase II holoenzyme, the four possible consensus sequences need to be tested on the four possible transcription directions, even though some genes may only be transcribed from the negative strand in the 3'-direction on the transcribed strand.

The Basic programs (starting with SuccessablesCArG.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). Each entry below gives the program name, the sequence it looks for, and the number of matches found:
 * 1) negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesCArG--.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 2) negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesCArG-+.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 3) positive strand in the negative direction is SuccessablesCArG+-.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 4) positive strand in the positive direction is SuccessablesCArG++.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 5) complement, negative strand, negative direction is SuccessablesCArGc--.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0,
 * 6) complement, negative strand, positive direction is SuccessablesCArGc-+.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0,
 * 7) complement, positive strand, negative direction is SuccessablesCArGc+-.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0,
 * 8) complement, positive strand, positive direction is SuccessablesCArGc++.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0,
 * 9) inverse complement, negative strand, negative direction is SuccessablesCArGci--.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 10) inverse complement, negative strand, positive direction is SuccessablesCArGci-+.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 11) inverse complement, positive strand, negative direction is SuccessablesCArGci+-.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 12) inverse complement, positive strand, positive direction is SuccessablesCArGci++.bas, looking for 3'-CC(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)GG-5', 0,
 * 13) inverse, negative strand, negative direction, is SuccessablesCArGi--.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0,
 * 14) inverse, negative strand, positive direction, is SuccessablesCArGi-+.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0,
 * 15) inverse, positive strand, negative direction, is SuccessablesCArGi+-.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0,
 * 16) inverse, positive strand, positive direction, is SuccessablesCArGi++.bas, looking for 3'-GG(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)CC-5', 0.
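The Basic programs themselves are not reproduced here. As a rough illustration of the kind of search listed above, the following Python sketch scans one sequence for the two distinct patterns that appear in the list. The names PATTERNS, find_boxes and scan, and the toy sequence, are hypothetical; the sketch reads the sequence as a plain string and does not model the strand and direction bookkeeping of the sixteen programs.

```python
import re

# The two distinct patterns in the list above, as regular expressions.
# For CC(A/T)6GG the complement and the inverse both read GG(A/T)6CC,
# and the inverse complement reads CC(A/T)6GG again.
PATTERNS = {
    "direct / inverse complement": "CC[AT]{6}GG",
    "complement / inverse": "GG[AT]{6}CC",
}

def find_boxes(sequence, pattern):
    """Return (1-based start, matched text) for each match, overlaps included."""
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence.upper())]

def scan(sequence):
    """Run every pattern over one sequence and collect the hits per pattern."""
    return {name: find_boxes(sequence, pat) for name, pat in PATTERNS.items()}

# Hypothetical toy sequence, not A1BG data: one CArG box planted at position 5.
toy = "ACGTCCATATATGGACGT"
for name, hits in scan(toy).items():
    print(name, len(hits), hits)
```

To approximate one of the sixteen runs, the sequence file for the chosen strand and direction would be passed to scan in place of the toy string.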

Verifications
To verify that your sampling has explored something, you may need a control group. Perhaps where, when, or without your entity, source, or object may serve.

Another verifier is reproducibility. Can you replicate something about your entity in your laboratory more than 3 times? Five times is usually a beginning number to provide statistics (data) about it.

For an apparent one-time or perception event, document or record as much coincident information as possible. Was there a butterfly nearby?

Has anyone else perceived the entity and recorded something about it?

Gene ID: 1 includes the nucleotides between neighboring genes and A1BG. These nucleotides can be loaded into files from either gene toward A1BG, and from template and coding strands. These nucleotide sequences can be found in Gene transcriptions/A1BG. Copying the above discovered CArG boxes and putting the sequences in "⌘F" locates these two sequences in the same nucleotide positions as found by the computer programs.

Attempts to find various combinations of CCAAAAAAGG or GGAAAAAACC, with or without some Ts, using "⌘F" failed to find even one CArG box, confirming the result of the computer programs.
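One way to put a zero count in context is to estimate how many matches would be expected by chance. The sketch below assumes that each of A, C, G and T is equally likely and that positions are independent, which is a simplification and not a property of the actual A1BG flanking sequence. The names chance_probability, expected_matches and STRICT_CARG are hypothetical.

```python
from fractions import Fraction

def chance_probability(allowed_per_position):
    """Probability that one random window matches the motif, assuming
    equal base frequencies and independent positions (an assumption)."""
    p = Fraction(1)
    for allowed in allowed_per_position:
        p *= Fraction(len(set(allowed)), 4)
    return p

def expected_matches(allowed_per_position, sequence_length):
    """Expected number of chance matches in a sequence of the given length."""
    windows = max(0, sequence_length - len(allowed_per_position) + 1)
    return windows * chance_probability(allowed_per_position)

# CC(A/T)6GG as one string of allowed bases per position.
STRICT_CARG = ["C", "C"] + ["AT"] * 6 + ["G", "G"]

# 4460 nts is the length given in the text for the ZSCAN22 side.
expected = expected_matches(STRICT_CARG, 4460)
print(float(expected))
```

Under these assumptions the expected number is roughly 0.27 per reading of the 4460 nts, so finding no strict CArG box is not surprising on statistical grounds alone.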

The consensus sequence of CC(A/T)6GG is confirmed.

"MADS-box proteins bind to a consensus sequence, the CArG box, that has the core motif CC(A/T)6GG (15)."

"Of the [Flowering Locus C] FLC binding sites, 69% contained at least one CArG-box motif with the core consensus sequence CCAAAAAT(G/A)G and an AAA extension at the 3′ end [...]."

Three "other MADS-box flowering-time regulators, SOC1, SVP, and AGAMOUS-LIKE 24 (AGL24), bind to two different CArG-box motifs at 502 bp (CTAAATATGG) and 287 bp (CAATAATTGG) upstream of the translation start in the SEP3 gene (24), consistent with different specificities for the different MADS-box proteins." These together with the core motif CC(A/T)6GG (15) suggest a more general CArG-box motif of (C(C/A/T)(A/T)6(A/G)G).

A "⌘F" manual search found one more general CArG-box motif of (CAAAAAAAAG) between A1BG and ZSCAN22.
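The suggested generalization can be checked against the motifs quoted above. This is a minimal sketch in which the pattern names and the dictionary labels are hypothetical; the sequences themselves are copied from the quotations and from the manual search result.

```python
import re

# General motif C(C/A/T)(A/T)6(A/G)G and strict motif CC(A/T)6GG.
GENERAL_CARG = re.compile(r"C[CAT][AT]{6}[AG]G")
STRICT_CARG = re.compile(r"CC[AT]{6}GG")

motifs = {
    "FLC core, G variant": "CCAAAAATGG",
    "FLC core, A variant": "CCAAAAATAG",
    "SEP3, 502 bp upstream": "CTAAATATGG",
    "SEP3, 287 bp upstream": "CAATAATTGG",
    "manual search hit": "CAAAAAAAAG",
}

def classify(motif):
    """Report whether a 10-nt motif fits the general and the strict pattern."""
    return {
        "general": bool(GENERAL_CARG.fullmatch(motif)),
        "strict": bool(STRICT_CARG.fullmatch(motif)),
    }

for label, motif in motifs.items():
    print(label, motif, classify(motif))
```

All five sequences fit the general pattern, while only the G variant of the FLC core also fits the strict consensus.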

Testing the more general 3'-C(C/A/T)(A/T)6(A/G)G-5':
 * 1) negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesCArG--.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 0,
 * 2) negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesCArG-+.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 0,
 * 3) positive strand in the negative direction is SuccessablesCArG+-.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 2, 3'-CAAAAAAAAG-5', 1399, 3'-CATTAAAAGG-5', 3441,
 * 4) positive strand in the positive direction is SuccessablesCArG++.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 0,
 * 5) complement, negative strand, negative direction is SuccessablesCArGc--.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 2, 3'-GTTTTTTTTC-5', 1399, 3'-GTAATTTTCC-5', 3441,
 * 6) complement, negative strand, positive direction is SuccessablesCArGc-+.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 0,
 * 7) complement, positive strand, negative direction is SuccessablesCArGc+-.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 0,
 * 8) complement, positive strand, positive direction is SuccessablesCArGc++.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 0,
 * 9) inverse complement, negative strand, negative direction is SuccessablesCArGci--.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 10) inverse complement, negative strand, positive direction is SuccessablesCArGci-+.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 11) inverse complement, positive strand, negative direction is SuccessablesCArGci+-.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 12) inverse complement, positive strand, positive direction is SuccessablesCArGci++.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 13) inverse, negative strand, negative direction, is SuccessablesCArGi--.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0,
 * 14) inverse, negative strand, positive direction, is SuccessablesCArGi-+.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0,
 * 15) inverse, positive strand, negative direction, is SuccessablesCArGi+-.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0,
 * 16) inverse, positive strand, positive direction, is SuccessablesCArGi++.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0.
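The four search patterns used in the list above can be derived mechanically from the general motif. The sketch below is one possible way to do so and is not the code of the Basic programs: the motif is held as a list of allowed bases per position, and the helper names complement, inverse and to_regex are hypothetical. The four reported hits are then checked against the derived patterns.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# General motif C(C/A/T)(A/T)6(A/G)G, one string of allowed bases per position.
GENERAL = ["C", "CAT"] + ["AT"] * 6 + ["AG", "G"]

def complement(motif):
    """Replace every allowed base by its Watson-Crick partner."""
    return ["".join(COMPLEMENT[b] for b in pos) for pos in motif]

def inverse(motif):
    """Reverse the order of the positions."""
    return motif[::-1]

def to_regex(motif):
    """Turn the position list into a regular expression string."""
    parts = []
    for pos in motif:
        bases = "".join(sorted(pos))
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

variants = {
    "direct": to_regex(GENERAL),
    "complement": to_regex(complement(GENERAL)),
    "inverse": to_regex(inverse(GENERAL)),
    "inverse complement": to_regex(inverse(complement(GENERAL))),
}

# Hits reported in the list above, grouped by the variant that found them.
reported = {
    "direct": ["CAAAAAAAAG", "CATTAAAAGG"],
    "complement": ["GTTTTTTTTC", "GTAATTTTCC"],
}
for name, hits in reported.items():
    for hit in hits:
        print(name, hit, bool(re.fullmatch(variants[name], hit)))
```

Each reported hit fits the pattern of the variant that found it, and the complement hits are the base-by-base complements of the direct hits at the same positions (1399 and 3441).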

Core promoters CArGs
From the first nucleotide just after ZSCAN22 to the first nucleotide just before A1BG are 4460 nucleotides. The core promoter on this side of A1BG extends from approximately 4425 to the possible transcription start site at nucleotide number 4460.

The computer programs have detected no CArG boxes between ZSCAN22 and A1BG in the core promoter.

From the first nucleotide just after ZNF497 to the first nucleotide just before A1BG are 858 nucleotides. The core promoter on this side of A1BG extends from approximately 824 to the possible transcription start site at nucleotide number 858. Nucleotides (nts) have been added from ZNF497 to A1BG. The TSS for A1BG is now at 4300 nts from just on the other side of ZNF497. The core promoter should now be from 4266 to 4300.

The computer programs have detected no CArG boxes between ZNF497 and A1BG in the core promoter.

Proximal promoter CArGs
The proximal promoter begins about nucleotide number 4210 in the negative direction.

The computer programs have detected no CArG boxes between ZSCAN22 and A1BG in the proximal promoter.

The proximal promoter begins about nucleotide number 4195 in the positive direction.

The computer programs have detected no CArG boxes between ZNF497 and A1BG in the proximal promoter.

Distal promoter CArGs
Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2460.

There is a more general CArG box, 3'-GTAATTTTCC-5', at 3441 from ZSCAN22, or -1019 nts from the TSS of A1BG.

A second more general CArG box, 3'-GTTTTTTTTC-5', at 1399 from ZSCAN22, or -3061 nts from the A1BG TSS may be a CArG box for ZSCAN22 in the positive direction on the negative strand.

The computer programs have detected no CArG boxes between ZNF497 and A1BG in the distal promoter.
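The positions relative to the TSS quoted above follow from a single subtraction. In this short sketch the names relative_to_tss and within_distal_estimate are hypothetical, and the 2 knt window is the estimate used in the text.

```python
TSS_FROM_ZSCAN22 = 4460   # possible TSS, counted from ZSCAN22
DISTAL_ESTIMATE = 2000    # the 2 knt estimate used in the text

def relative_to_tss(position, tss=TSS_FROM_ZSCAN22):
    """Convert a position counted from ZSCAN22 into a TSS-relative coordinate."""
    return position - tss

def within_distal_estimate(position, tss=TSS_FROM_ZSCAN22, window=DISTAL_ESTIMATE):
    """True when the position lies within the estimated window upstream of the TSS."""
    return tss - window <= position <= tss

hits = {"GTAATTTTCC": 3441, "GTTTTTTTTC": 1399}
for motif, position in hits.items():
    print(motif, relative_to_tss(position), within_distal_estimate(position))
```

The first box lies at -1019 nts, inside the 2 knt estimate; the second lies at -3061 nts, outside it, which fits the suggestion that it may belong to ZSCAN22 instead.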

Transcribed CArG boxes
"Exposure of human HL-525 cells to x-rays was associated with increases in EGR1 mRNA levels. Nuclear run-on assays showed that this effect is related at least in part to activation of EGR1 gene transcription. Sequences responsive to ionizing radiation-induced signals were determined by deletion analysis of the EGR1 promoter. The results demonstrate that x-ray inducibility of the EGR1 gene is conferred by a region containing six serum response or CC(A+T-rich)6GG (CArG) motifs. Further analysis confirmed that the region encompassing the three distal or upstream CArG elements is functional in the x-ray response. Moreover, this region conferred x-ray inducibility to a minimal thymidine kinase gene promoter. Taken together, these results indicate that ionizing radiation induces EGR1 transcription through CArG elements."

"Positively acting, rate-limiting regulatory factors that influence tissue-specific expression of the human cardiac α-actin gene in a mouse muscle cell line are shown by in vivo competition and gel mobility-shift assays to bind to upstream regions of its promoter but to neither vector DNA nor a β-globin promoter. Although the two binding regions are distinctly separated, each corresponds to a cis region required for muscle-specific transcriptional stimulation, and each contains a core CC(A+T-rich)6GG sequence (designated CArG box), which is found in the promoter regions of several muscle-associated genes. Each site has an apparently different binding affinity for trans-acting factors, which may explain the different transcriptional stimulation activities of the two cis regions. [The] two CArG box regions are responsible for muscle-specific transcriptional activity of the cardiac α-actin gene through a mechanism that involves their binding of a positive trans-acting factor in muscle cells."

"SRF binds to an A/T-rich sequence (CCWWWWWWGG) that has been designated as the CArG box.10–12 CArG boxes were originally identified in transcriptional regulatory elements controlling expression of a set of growth- or serum-responsive genes including c-fos and egr-1.13,14 Subsequently, CArG boxes were identified in transcriptional regulatory elements controlling expression of a subset of genes encoding myogenic contractile and cytoskeletal proteins including α-cardiac actin, smooth muscle (SM)-α-actin, α-skeletal actin, and SM22α.15–19"

"Functionally important CArG boxes have been identified in transcriptional regulatory elements controlling expression of sets of myogenic contractile and cytoskeletal proteins (reviewed elsewhere8,25). Of note, in cardiac and skeletal muscle cells, functionally important CArG boxes have been identified in transcriptional regulatory element controlling a relatively limited subset of myofibrillar proteins.26"

"In the nucleus, MRTFs physically associate with SRF, facilitating the binding of SRF to single or dual CArG boxes, activating transcription of genes encoding cytoskeletal and myogenic proteins [...].39,40,53,55,56"

"The binding of SRF to SMC CArG boxes is associated with specific alterations in chromatin structure including the methylation and acetylation of histones.76,79"

"Both PDGF-BB and KLF-4 inhibit SRF binding to CArG boxes downregulating transcription of SMC contractile genes.92"

Laboratory reports
Below is an outline for sections of a report, paper, manuscript, log book entry, or lab book entry. You may create your own, of course.

Gene A1BG transcription using a CArG box

by Marshallsumter, 16:21, 20 September 2017 (UTC)

Abstract
There are two hypotheses that have been examined: (1) A1BG is not transcribed by a CArG box and (2) the lack of a CArG box on either side of A1BG does not prove that it is not actively used to transcribe A1BG. By combining a literature search with computer analysis of each promoter between ZSCAN22 and A1BG and ZNF497 and A1BG, CArG boxes have been found. To show that these CArG boxes may be used during or for transcription of A1BG at least one transcription factor has been affirmed.

Introduction
According to one source, A1BG is transcribed from the direction of ZNF497: 3' - 58864890: CGAGCCACCCCACCGCCCTCCCTTGG+1GGCCTCATTGCTGCAGACGCTCACCCCAGACACTCACTGCACCGGAGTGAGCGCGACCATCATG : 58866601-5', where the second 'G' at left of four Gs in a row is the TSS. Transcription was triggered in cell cultures and the transcription start site was found using reverse transcriptase. But, the mechanism for transcription is unknown.

Controlling the transcription of A1BG may have significant immune function against snake envenomation. A1BG forms a complex that is similar to those formed between toxins from snake venom and A1BG-like plasma proteins. These inhibit the toxic effect of snake venom metalloproteinases or myotoxins and protect the animal from envenomation.

Many transcription factors (TFs) occur upstream and occasionally downstream of the transcription start site (TSS), in a gene's promoter. It isn't known which, if any, assist in locating and affixing the transcription mechanism for A1BG. This examination is the first to test one such DNA-occurring TF: the CArG box.

Experiment
To test for the existence of a CArG box, its usual consensus sequence, CC(A/T)6GG, was searched for using computer programs that work through the nucleotide sequences between ZSCAN22 and A1BG and between ZNF497 and A1BG.

The second hypothesis: (2) the lack of a CArG box on either side of A1BG does not prove that it is not actively used to transcribe A1BG, appears to be self-conflicting. To test this hypothesis an additional literature search was performed looking for more recent results on the CArG box.

Results
No CArG boxes of consensus sequence CC(A/T)6GG were found among the 4460 nts between ZSCAN22 and A1BG nor among the 858 nts between ZNF497 and A1BG, although nucleotides within or beyond were not tested except for those between 858 and 958, where also no CArG boxes were found. This confirms hypothesis (1) that A1BG is not transcribed by a CArG box.

A literature search of more recent results discovered: "Of the [Flowering Locus C] FLC binding sites, 69% contained at least one CArG-box motif with the core consensus sequence CCAAAAAT(G/A)G and an AAA extension at the 3′ end [. Three] other MADS-box flowering-time regulators, SOC1, SVP, and AGAMOUS-LIKE 24 (AGL24), bind to two different CArG-box motifs at 502 bp (CTAAATATGG) and 287 bp (CAATAATTGG) upstream of the translation start in the SEP3 gene (24), consistent with different specificities for the different MADS-box proteins."

These together with the core motif CC(A/T)6GG suggest a more general CArG-box motif of (C(C/A/T)(A/T)6(A/G)G). Subsequent computer-program testing revealed two more general CArG boxes: 3'-CAAAAAAAAG-5' at 1399 nts from ZSCAN22 and 3'-CATTAAAAGG-5' at 3441 nts from ZSCAN22, but none within 958 nts toward A1BG from ZNF497.

Discussion
These results show that more general CArG boxes are present on the ZSCAN22 side of A1BG, which implies their possible use when transcribing A1BG, although they may be pointing toward ZSCAN22. This suggests that hypothesis (1) is false. Regarding hypothesis (2), the presence of more general CArG boxes in the distal promoter tentatively confirms this hypothesis.

No experimental efforts to force transcription of A1BG from either side were performed, nor were the CArG boxes demonstrated to be used.

A complete description of all the transcription factors that can use a CArG box to enhance, inhibit or activate transcription is needed.

Conclusion
CArG boxes do occur in the distal promoter of A1BG. And, it is likely that a CArG box is involved in some way with the transcription of A1BG.

Laboratory evaluations
To assess your example, including your justification, analysis and discussion, I will provide such an assessment of my example for comparison and consideration.

Evaluation

No wet chemistry experiments were performed to confirm that Gene ID: 1 is transcribed from either side using CArG boxes. The NCBI database is generalized, whereas individual human genome testing could demonstrate that A1BG is transcribed from either side, or at least from the ZSCAN22 side.