Gene transcriptions/Boxes/E2s/Laboratory

A laboratory is a specialized activity, a construct you create, in which you as a student, teacher, or researcher can have hands-on, or as close to hands-on as possible, experience actively analyzing an entity, source, or object of interest. Usually, there is more to do than just analyzing. The construct is often a room, building, or institution equipped for scientific research, experimentation, and analysis.

This laboratory is a continuation of the previous laboratory.

In the room next door is an astronaut on the Mars expedition, three months along on the six-month journey. A physician and lab assistants have been performing tests on her. Because she has been in zero gravity for more than three months, her body chemistry and anatomy now differ from what they were in the controlled gravity environment of Earth. She has lost about 10% each of her bone, muscle, and brain mass. Comparisons of her gene expression sequences now with those taken on Earth have found that the gene expression for alpha-1-B glycoprotein is not normal. If a way to correct this expression cannot be found, she must be returned to Earth, maybe to recover, maybe not!

But, it is unlikely she will survive three more months at zero g either to be returned to Earth or put on Mars. Worse, the microgravity may not be the only culprit. There is also the radiation of the interplanetary medium.

You have been tasked to examine her DNA to confirm, especially with the extended data between ZNF497 and A1BG, the presence or absence of E2 boxes regarding the possible expression of alpha-1-B glycoprotein.

"The E box [ enhancer box ] sites that are most important are those of the E2 box class (GCAGXTGG/T). Two E2 box sites are present in the immunoglobulin heavy chain gene enhancer [...] and one is present in the kappa enhancer, designated KE2 [29-31]."

Consensus sequences
"The most dramatic impact on immunoglobulin gene enhancer activity was observed upon mutation of sites that contain an E2-box motif (G/ACAGNTGN)."

Nucleotides
DNA mapping has been performed. Her DNA for A1BG promoters can be found at Gene_transcriptions/A1BG.

Programming
Sample programs for preparing test programs are available at Gene transcriptions/A1BG/Programming.
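A test program of this kind could start by turning the quoted E2-box consensus into a search pattern. The following is a minimal sketch in Python, not the laboratory's Basic code. It assumes the consensus (G/ACAGNTGN) is written with every alternative in parentheses, i.e. (G/A)CAG(A/C/G/T)TG(A/C/G/T), and that a hit is reported by the 1-based position of its first nucleotide. The function names and the short test sequence are hypothetical.

```python
import re

def consensus_to_regex(consensus: str) -> str:
    """Turn '(G/A)CAG(A/C/G/T)TG(A/C/G/T)' into '[GA]CAG[ACGT]TG[ACGT]'."""
    return re.sub(r"\(([ACGT/]+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  consensus)

def find_motif(sequence: str, consensus: str) -> list:
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # Wrapping the pattern in a lookahead lets overlapping hits be reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

E2_BOX = "(G/A)CAG(A/C/G/T)TG(A/C/G/T)"
TOY_SEQUENCE = "TTACAGATGTCCGCAGGTGGAA"  # made-up sequence, not A1BG data

print(find_motif(TOY_SEQUENCE, E2_BOX))
# → [(3, 'ACAGATGT'), (13, 'GCAGGTGG')]
```

Whether the Basic programs report the first or the last nucleotide of a match is not stated here, so the position convention should be checked before comparing numbers.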

Hypotheses

 * 1) E2 boxes are not present in the promoter of A1BG.
 * 2) If an E2 box is present it does not assist in the transcription of A1BG.

Core promoters
The core promoter is approximately -34 nts upstream from the TSS.

From the first nucleotide just after ZSCAN22 to the first nucleotide just before A1BG are 4460 nucleotides. The core promoter on this side of A1BG extends from approximately 4425 to the possible transcription start site at nucleotide number 4460.

To extend the analysis from inside and just on the other side of ZNF497 some 3340 nts have been added to the data. This would place the core promoter some 3340 nts further away from the other side of ZNF497. The TSS would be at about 4300 nts with the core promoter starting at 4266.

Def. "the factors, including RNA polymerase II itself, that are minimally essential for transcription in vitro from an isolated core promoter" is called the basal machinery, or basal transcription machinery.

Proximal promoters
Def. a "promoter region [juxtaposed to the core promoter that] binds transcription factors that modify the affinity of the core promoter for RNA polymerase.[12][13]" is called a proximal promoter.

The proximal sequence upstream of the gene that tends to contain primary regulatory elements is a proximal promoter.

It is approximately 250 base pairs or nucleotides, nts, upstream of the transcription start site.

The proximal promoter begins about nucleotide number 4210 in the negative direction.

The proximal promoter begins about nucleotide number 4195 in the positive direction.

Distal promoters
The "upstream regions of the human [cytochrome P450 family 11 subfamily A] CYP11A and bovine CYP11B genes [have] a distal promoter in each gene. The distal promoters are located at −1.8 to −1.5 kb in the upstream region of the CYP11A gene and −1.5 to −1.1 kb in the upstream region of the CYP11B gene."

"Using cloned chicken βA-globin genes, either individually or within the natural chromosomal locus, enhancer-dependent transcription is achieved in vitro at a distance of 2 kb with developmentally staged erythroid extracts. This occurs by promoter derepression and is critically dependent upon DNA topology. In the presence of the enhancer, genes must exist in a supercoiled conformation to be actively transcribed, whereas relaxed or linear templates are inactive. Distal protein–protein interactions in vitro may be favored on supercoiled DNA because of topological constraints."

Distal promoter regions may be a relatively small number of nucleotides, fairly close to the TSS such as (-253 to -54) or several regions of different lengths, many nucleotides away, such as (-2732 to -2600) and (-2830 to -2800).

The "[d]istal promoter is not a spacer element."

Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2460.

Any transcription factor before A1BG from the direction of ZNF497 may be out to 2300 nts.
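The window boundaries used so far are simple offsets from the TSS. The sketch below reproduces that arithmetic, assuming a core promoter of about 35 nts, a proximal promoter of about 250 nts, and the 2 knts distal estimate; the function and field names are hypothetical.

```python
from typing import NamedTuple

class PromoterWindows(NamedTuple):
    distal_start: int    # TSS minus the 2 knts estimate
    proximal_start: int  # TSS minus about 250 nts
    core_start: int      # TSS minus about 34 to 35 nts
    tss: int

def promoter_windows(tss: int, core_len: int = 35,
                     proximal_len: int = 250,
                     distal_len: int = 2000) -> PromoterWindows:
    """Return the approximate start of each promoter region for a given TSS."""
    return PromoterWindows(tss - distal_len, tss - proximal_len,
                           tss - core_len, tss)

# Negative direction (ZSCAN22 toward A1BG), TSS at 4460.
print(promoter_windows(4460))
# → PromoterWindows(distal_start=2460, proximal_start=4210, core_start=4425, tss=4460)
```

For the positive direction, `promoter_windows(4300, core_len=34)` gives a core promoter starting at 4266 and a distal estimate starting at 2300, matching the values quoted above.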

E2 boxes
The Basic programs (starting with SuccessablesE2box.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the list gives the file name, the sequence it looks for, the number of matches found, and each match with its nucleotide position:
 * 1) negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesE2box--.bas, looking for 3'-(G/A)CAG(A/C/G/T)TG(A/C/G/T)-5', 5, 3'-ACAGATGT-5', 482, 3'-ACAGATGT-5', 1225, 3'-GCAGTTGG-5', 1514, 3'-ACAGATGT-5', 2989, 3'-ACAGATGT-5', 4213,
 * 2) negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesE2box-+.bas, looking for 3'-(G/A)CAG(A/C/G/T)TG(A/C/G/T)-5', 1, 3'-GCAGATGA-5', 37,
 * 3) positive strand in the negative direction is SuccessablesE2box+-.bas, looking for 3'-(G/A)CAG(A/C/G/T)TG(A/C/G/T)-5', 2, 3'-GCAGGTGG-5', 2571, 3'-ACAGATGA-5', 3920,
 * 4) positive strand in the positive direction is SuccessablesE2box++.bas, looking for 3'-(G/A)CAG(A/C/G/T)TG(A/C/G/T)-5', 0,
 * 5) complement, negative strand, negative direction is SuccessablesE2boxc--.bas, looking for 3'-(C/T)GTC(A/C/G/T)AC(A/C/G/T)-5', 2, 3'-CGTCCACC-5', 2571, 3'-TGTCTACT-5', 3920,
 * 6) complement, negative strand, positive direction is SuccessablesE2boxc-+.bas, looking for 3'-(C/T)GTC(A/C/G/T)AC(A/C/G/T)-5', 0,
 * 7) complement, positive strand, negative direction is SuccessablesE2boxc+-.bas, looking for 3'-(C/T)GTC(A/C/G/T)AC(A/C/G/T)-5', 5, 3'-TGTCTACA-5', 482, 3'-TGTCTACA-5', 1225, 3'-CGTCAACC-5', 1514, 3'-TGTCTACA-5', 2989, 3'-TGTCTACA-5', 4213,
 * 8) complement, positive strand, positive direction is SuccessablesE2boxc++.bas, looking for 3'-(C/T)GTC(A/C/G/T)AC(A/C/G/T)-5', 1, 3'-CGTCTACT-5', 37,
 * 9) inverse complement, negative strand, negative direction is SuccessablesE2boxci--.bas, looking for 3'-(A/C/G/T)CA(A/C/G/T)CTG(C/T)-5', 1, 3'-CCACCTGT-5', 2117,
 * 10) inverse complement, negative strand, positive direction is SuccessablesE2boxci-+.bas, looking for 3'-(A/C/G/T)CA(A/C/G/T)CTG(C/T)-5', 0,
 * 11) inverse complement, positive strand, negative direction is SuccessablesE2boxci+-.bas, looking for 3'-(A/C/G/T)CA(A/C/G/T)CTG(C/T)-5', 4, 3'-CCACCTGT-5', 394, 3'-ACACCTGT-5', 1131, 3'-GCAACTGC-5', 3851, 3'-ACACCTGT-5', 3970,
 * 12) inverse complement, positive strand, positive direction is SuccessablesE2boxci++.bas, looking for 3'-(A/C/G/T)CA(A/C/G/T)CTG(C/T)-5', 0,
 * 13) inverse, negative strand, negative direction, is SuccessablesE2boxi--.bas, looking for 3'-(A/C/G/T)GT(A/C/G/T)GAC(G/A)-5', 4, 3'-GGTGGACA-5', 394, 3'-TGTGGACA-5', 1131, 3'-CGTTGACG-5', 3851, 3'-TGTGGACA-5', 3970,
 * 14) inverse, negative strand, positive direction, is SuccessablesE2boxi-+.bas, looking for 3'-(A/C/G/T)GT(A/C/G/T)GAC(G/A)-5', 0,
 * 15) inverse, positive strand, negative direction, is SuccessablesE2boxi+-.bas, looking for 3'-(A/C/G/T)GT(A/C/G/T)GAC(G/A)-5', 1, 3'-GGTGGACA-5', 2117,
 * 16) inverse, positive strand, positive direction, is SuccessablesE2boxi++.bas, looking for 3'-(A/C/G/T)GT(A/C/G/T)GAC(G/A)-5', 0.
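The sixteen programs above search for the same box written four ways: as given, as its complement, as its inverse (the same letters read backwards), and as its inverse complement. A short Python sketch, with hypothetical function names, shows how the four forms relate and checks a few of the pairs reported in the list.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Swap each nucleotide for its base-pairing partner, order unchanged."""
    return seq.translate(COMPLEMENT)

def inverse(seq: str) -> str:
    """Read the same letters in the opposite order."""
    return seq[::-1]

def inverse_complement(seq: str) -> str:
    """Complement the sequence and then reverse it."""
    return complement(seq)[::-1]

def orientations(seq: str) -> dict:
    """All four written forms of one box."""
    return {
        "direct": seq,
        "complement": complement(seq),
        "inverse": inverse(seq),
        "inverse complement": inverse_complement(seq),
    }

# Item 1 reports ACAGATGT at 482; item 7 reports its complement at the same position.
print(complement("ACAGATGT"))   # → TGTCTACA
# Item 9 reports CCACCTGT at 2117; item 15 reports its complement at the same position.
print(complement("CCACCTGT"))   # → GGTGGACA
print(orientations("ACAGGTGG"))
```

This explains why the hit counts come in matching pairs: a match on one strand always has its complement at the same position on the other strand.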

Verifications
To verify that your sampling has explored something, you may need a control group. Perhaps a sample from another place, another time, or without your entity, source, or object may serve.

Another verifier is reproducibility. Can you replicate something about your entity in your laboratory more than 3 times? Five times is usually a beginning number to provide statistics (data) about it.

For an apparent one time or perception event, document or record as much information coincident as possible. Was there a butterfly nearby?

Has anyone else perceived the entity and recorded something about it?

Gene ID: 1 includes the nucleotides between neighboring genes and A1BG. These nucleotides can be loaded into files from either gene toward A1BG, and from template and coding strands. These nucleotide sequences can be found in Gene transcriptions/A1BG. Copying the E2 boxes discovered above and putting the sequences in "⌘F" locates these sequences in the same nucleotide positions as found by the computer programs.
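The "⌘F" check can also be scripted. The sketch below looks up each reported sequence as a literal string and confirms that it occurs at the reported position. It assumes positions are 1-based and refer to the first nucleotide of the match; the function names and the toy sequence are hypothetical and stand in for the real nucleotide file.

```python
def locate_all(sequence: str, literal: str) -> list:
    """1-based start positions of every occurrence, overlaps included."""
    positions = []
    start = sequence.find(literal)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(literal, start + 1)
    return positions

def confirm_hits(sequence: str, reported: list) -> dict:
    """Map each (literal, position) pair to True if found there, else False."""
    return {(literal, pos): pos in locate_all(sequence, literal)
            for literal, pos in reported}

TOY_SEQUENCE = "TTACAGATGTCCGCAGGTGGAA"  # made-up sequence, not A1BG data
REPORTED = [("ACAGATGT", 3), ("GCAGGTGG", 13), ("ACAGATGT", 9)]

for hit, ok in confirm_hits(TOY_SEQUENCE, REPORTED).items():
    print(hit, "confirmed" if ok else "NOT found at that position")
```

A hit that fails this check points to either a different position convention or an error in the search program, so it is worth resolving before interpreting the result.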

Core promoters E2 boxes
From the first nucleotide just after ZSCAN22 to the first nucleotide just before A1BG are 4460 nucleotides. The core promoter on this side of A1BG extends from approximately 4425 to the possible transcription start site at nucleotide number 4460.

There are no E2 boxes in the core promoter in the negative direction.

From the first nucleotide just after ZNF497 to the first nucleotide just before A1BG are 858 nucleotides. The core promoter on this side of A1BG extends from approximately 824 to the possible transcription start site at nucleotide number 858. Nucleotides (nts) have been added from ZNF497 to A1BG. The TSS for A1BG is now at 4300 nts from just on the other side of ZNF497. The core promoter should now be from 4266 to 4300.

There are no E2 boxes in the core promoter in the positive direction.

Proximal promoter E2 boxes
The proximal promoter begins about nucleotide number 4210 in the negative direction.

There is one E2 box 3'-ACAGATGT-5' at 4213 in the negative direction.

The proximal promoter begins about nucleotide number 4195 in the positive direction.

There is no E2 box in the positive direction.

Distal promoter E2 boxes
Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2460 in the negative direction.

There are two E2 boxes in the distal promoter in the negative direction on the negative strand: 3'-ACAGATGT-5' at 2989 and 3'-ACAGATGT-5' at 4213 nts. There are two on the positive strand: 3'-GCAGGTGG-5' at 2571 and 3'-ACAGATGA-5' at 3920, plus their complements and inverses.

Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2300 in the positive direction.

There are none in the distal promoter in the positive direction.
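The assignment of hits to promoter regions can be tabulated with a few comparisons. The sketch below is hypothetical code that sorts the reported negative-direction positions using the boundaries given above (TSS at 4460, core from 4425, proximal from 4210, distal estimate from 2460). It treats the regions as non-overlapping, which is an assumption: the text above counts the hit at 4213 in both the proximal and the distal lists.

```python
def classify(position: int, tss: int = 4460, core_len: int = 35,
             proximal_len: int = 250, distal_len: int = 2000) -> str:
    """Name the promoter region that contains a nucleotide position."""
    if position > tss:
        return "downstream of TSS"
    if position >= tss - core_len:
        return "core"
    if position >= tss - proximal_len:
        return "proximal"
    if position >= tss - distal_len:
        return "distal"
    return "beyond the 2 knts estimate"

# Positions reported above for the negative direction (ZSCAN22 to A1BG).
NEGATIVE_DIRECTION_HITS = {
    "negative strand": [482, 1225, 1514, 2989, 4213],
    "positive strand": [2571, 3920],
}

for strand, positions in NEGATIVE_DIRECTION_HITS.items():
    for pos in positions:
        print(strand, pos, classify(pos))
```

With these boundaries, 2989, 2571, and 3920 fall in the distal region, 4213 falls just inside the proximal region, and 482, 1225, and 1514 lie beyond the 2 knts estimate.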

Transcribed E2 boxes
A Google Scholar search using A1BG and E2 box produced no results.

B cells
"The developmental regulation of Ig gene expression is dependent on various sequences in the Ig enhancer. One class of such sequence elements is the E boxes. They share as a consensus sequence NNCANNTGNN. The E-box sites were first identified by dimethylsulfate protection experiments (6, 12). Factors were found to protect certain sequences from methylation in the Ig heavy- and light-chain enhancer in B cells but not in non-B cells (6,12). That the E-box elements are critical for B-cell-specific gene expression became evident from mutational analysis. Mutation of E-box sites caused a significant decrease in Ig transcription (18, 21). The most dramatic impact on Ig expression was found in mutations of elements that contain an E2 box (G/ACAGNTGT/G) (21). The E2 boxes are particularly interesting because they are also present in muscle-and pancreas-specific enhancers (3,4,32). Mutation of the E2-box elements present in these enhancers revealed the crucial role of these elements in regulating muscle- and pancreas-specific genes (16, 22, 26, 27, 32)."

Cadherins
"Transcriptional downregulation of E-cadherin appears to be an important event in the progression of various epithelial tumors. SIP1 (ZEB-2) is a Smad-interacting, multi-zinc finger protein that shows specific DNA binding activity. [Expression] of wild-type but not of mutated SIP1 downregulates mammalian E-cadherin transcription via binding to both conserved E2 boxes of the minimal E-cadherin promoter."

"Analysis of mouse and human E-cadherin promoters revealed a conserved modular structure with positive regulatory elements including two E2 boxes (CACCTG) with a potential repressor role Behrens et al. 1991, Giroldi et al. 1997."

"The two E2 boxes in the mouse and human E-cadherin promoter sequences were demonstrated to play a crucial role in the epithelial-specific expression of E-cadherin Behrens et al. 1991, Giroldi et al. 1997. Mutation of these sequence elements results in upregulation of the E-cadherin promoter in dedifferentiated cancer cells, whereas the wild-type promoter shows low activity in such cells. Recently, it was shown that the zinc finger transcriptional repressor Snail can downregulate E-cadherin by binding to the E boxes in the E-cadherin promoter Batlle et al. 2000, Cano et al. 2000. Human Snail belongs to a family of zinc finger proteins, which contain four or five zinc finger domains of the C2H2 type at their C-terminal end. These zinc fingers bind to the CANNTG sequence in E box motifs."

"δEF1 and SIP1 have been shown to bind spaced CACCT DNA sequences, including E2 boxes (CACCTG), by their zinc finger clusters (Remacle et al., 1999)."

"To address the specificity of SIP1 action, mutagenesis of the E-cadherin promoter in either its upstream E2 box 1 (−75) or its downstream E2 box 3 (−25), or in both E2 boxes was performed [...]."

Wild-type "SIP1 represses the E-cadherin promoter, likely through binding via both zinc finger clusters to spaced E2 boxes as demonstrated previously (Remacle et al., 1999) and confirmed here by a DNA-mediated pull-down assay of SIP1 protein [...]. Wild-type but not mutated SIP1 from transfected human cells could be efficiently precipitated by biotinylated E-cadherin promoter oligonucleotides, comprising two wild-type E2 box sequences. Mutation of the E2 boxes resulted in the loss of SIP1 binding."

Human E2 boxes are E2-box 1 (GCAGGTGA), E2-box 2 (TGGCCGGC) and E2-box 3 (TCACCTGG).

"Alignment of the E-cadherin promoter sequences of dog, mouse, and man. Conserved regulatory elements are indicated: E2 boxes 1 and 3, CCAAT box, and GC box. The E2 box 2 has been described as part of a palindromic E-pal sequence in the mouse E-cadherin promoter (Behrens et al., 1991), but is conserved neither in canine nor in human sequences."

Snails
"Snail family genes encode zinc finger-containing proteins that function primarily as transcriptional repressors [1,2]. To date, three members of the Snail gene family have been described in vertebrates: Snai1 (also known as Snail), Snai2 (Slug) and Snai3 (Smuc). Snail family proteins possess a highly conserved carboxy-terminal region, containing four or five Cys2-His2 (C2H2)-type zinc finger regions and a more divergent amino-terminus that contains the evolutionarily conserved SNAG domain. The zinc finger regions are sequence-specific DNA-binding domains that bind E2-box sequences (CAGGTG and CACCTG). Both the SNAI1 and SNAI2 proteins recruit other proteins, such as histone deacetylase-1 (HDAC-1), to the E2 boxes of target genes to form a transcriptional repression complex that suppresses the transcription of Snail target genes [3,4]."

"We searched the regions from −2500 bp to +500 bp of the Snai1 and Snai2 genes for E2 box sequences (CACCTG and CAGGTG), and identified eleven in the Snai1 promoter region [...] and five in the Snai2 promoter region [...]."

"ChIP assays demonstrated binding of the SNAI1 and SNAI2 proteins to a subset of E2 boxes in both their own and each other’s promoter regulatory regions [...]. The SNAI2 protein bound to the Snai1 promoter region at sites 4, 7 and 8 [...], whereas the SNAI1 protein bound to its own promoter region at sites 2, 3, 4, 7, and 8 [...]. Conversely, the SNAI1 protein bound to the Snai2 promoter region at site 5 [...], whereas the SNAI2 protein bound its own promoter region at site 3, 4 and 5 [...]."

Laboratory reports
Below is an outline for sections of a report, paper, manuscript, log book entry, or lab book entry. You may create your own, of course.

E2 boxes transcription laboratory

by Marshallsumter, 11:53, 20 February 2019 (UTC)

Abstract
The E2 box is a type of enhancer box and, like other enhancer boxes, is expected to enhance or perhaps modulate the transcription of alpha-1-B glycoprotein. The first hypothesis tested is whether an E2 box occurs in the promoters of Gene ID: 1 alpha-1-B glycoprotein (A1BG). No E2 box was found between Gene ID: 162968 zinc finger protein 497 (ZNF497) and A1BG, the only side known to transcribe A1BG. But E2 boxes occur between Gene ID: 342945 zinc finger and SCAN domain containing 22 (ZSCAN22) and A1BG, suggesting a role in modulating transcription under specialized circumstances. Testing this as the second hypothesis demonstrated its likelihood with additional transcription factors. Wet chemistry is needed to confirm that transcription does occur from the ZSCAN22 side or that as yet unknown genes occur there.

Introduction
Many of the transcription factors examined so far could contribute to the transcription of A1BG: AGC boxes (GCC boxes), ATA boxes, C and D boxes, CArG boxes, CRE boxes, Enhancer boxes, Factor II B recognition elements (BREu), GA responsive complexes, HNF6s, HY boxes, Metal responsive elements, and STAT5s.

Transcription factors
Many transcription factors (TFs) may occur upstream and occasionally downstream of the transcription start site (TSS), in this gene's promoter. The following have been examined so far: (1) AGC boxes (GCC boxes), (2) ATA boxes, (3) CAAT boxes, (4) C and D boxes, (5) CAREs (GA responsive complexes), (6) CArG boxes, (7) CENP-B boxes, (8) CGCG boxes, (9) CRE boxes, (10) DREB boxes, (11) EIF4E basal elements (4EBEs), (12) enhancer boxes (E boxes), (13) Factor II B recognition elements, (14) GAREs (GA responsive complexes), (15) G boxes, (16) GLM boxes, (17) HNF6s, (18) HY boxes, (19) Metal responsive elements (MREs), (20) Motif ten elements (MTEs), (21) Pyrimidine boxes (GA responsive complexes), (22) STAT5s, (23) TATA boxes, (24) TAT boxes (GA responsive complexes), (25) TATCCAC boxes, (26) W boxes (GA responsive complexes), (27) X boxes and (28) Y boxes.

AGC boxes (GCC boxes)
An AGC box was found in the distal promoter of either gene ZSCAN22 or A1BG on both the template and coding strands. But, as the only known transcription of A1BG occurs between Gene ID: 162968 ZNF497 and Gene ID: 1 A1BG, it is unlikely that this AGC box is naturally used to transcribe A1BG.

A full web search produced several references including a GeneCard for "zinc finger protein 497" and "GCC box", including "May be involved in transcriptional regulation." Zinc fingers are mentioned in association with GCC boxes in plants. It seems unlikely that an AGC box is involved in any way with the transcription of A1BG.

An extension of the nucleotide data for the positive direction from ZNF497 toward A1BG from 958 nts to 4445 nts has not discovered any AGC boxes, even in the distal promoter just beyond ZNF497.

ATA boxes
Regarding hypothesis 1: there are no ATA boxes in the core promoter of A1BG from either direction or strand. This hypothesis has been shown to be true. A corollary hypothesis might be 1.1: there are no ATA boxes in the proximal promoter of A1BG from either direction or strand. This corollary hypothesis may be true. "The analysis of the promoter region indicated that a putative ATA box is located 54 nucleotides upstream from the transcription start site". There is one inverse and inverse complement ATA box in the proximal promoter in the positive direction between 4050 and 4300: 3'-AAATAA-5' at 4142, and 3'-TTTATT-5' at 4142. As the TSS is at 4300 nts, this ATA box is some 158 nts away. With the smaller data set, 3'-TTTATT-5' was at 703 and the TSS at 858 nts, so this ATA box was some 155 nts away, approximately the same number of nts from the TSS. Either way, it is not close enough to be in the core promoter, is not 54 nts upstream from the TSS, and does not match other such genes with an ATA box.

But the ATA box at 2347 is likely involved in transcription of A1BG in analogy to the rat. Although this has not been confirmed as involved, the existence of this ATA box likely proves hypothesis 1 false.

Regarding hypothesis 2: ATA boxes have a role as downstream signal transducers in A1BG. There is the following inverse ATA box on the negative strand, negative direction: 3'-AAATAA-5' at 4537. On this strand, in this direction the TSS is at 4460 nts from ZSCAN22. This ATA box is 77 nts downstream. So far no published research has been found to verify this type of downstream promoter or enhancer ATA box. There may be another isoform TSS nearby. As such, hypothesis 2 may be true.

Regarding hypothesis 3: ATA boxes may assist transcription of A1BG by other transcription factors. This hypothesis has been shown by literature search to be true. But, none of the ATA boxes for A1BG are close enough to any STAT5 promoter to match known transcription initiation.
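The distances quoted in this section are differences between a box position and the TSS position in the same coordinate system. A small worked check in Python, using a hypothetical helper, reproduces the three figures given above.

```python
def offset_from_tss(position: int, tss: int) -> int:
    """Signed distance in nts: negative is upstream, positive is downstream."""
    return position - tss

def describe(position: int, tss: int) -> str:
    """Express the offset in words."""
    d = offset_from_tss(position, tss)
    side = "upstream" if d < 0 else "downstream"
    return f"{abs(d)} nts {side} of the TSS"

print(describe(4142, 4300))  # extended data set, positive direction → 158 nts upstream of the TSS
print(describe(703, 858))    # smaller data set, positive direction → 155 nts upstream of the TSS
print(describe(4537, 4460))  # negative direction → 77 nts downstream of the TSS
```

None of the three offsets is near the 54 nts upstream position cited for the putative ATA box in the quoted analysis.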

CAAT boxes
No CAAT boxes occur on either side of A1BG.

C and D boxes
Regarding hypothesis 1: The C and D boxes are not involved in the transcription of A1BG.

There are no C boxes or D boxes in the core promoter from approximately 4425 to the possible transcription start site at nucleotide number 4460.

There are no C boxes or D boxes in the core promoter from approximately 4266 to the possible transcription start site at nucleotide number 4300.

There are no C boxes or D boxes in the proximal promoter beginning about nucleotide number 4210 in the negative direction.

There is one C box 3'-ACATCA-5' at 4116 but no D boxes in the proximal promoter beginning about nucleotide number 4050 in the positive direction.

There are four C boxes in the distal promoter: 3'-AGTAGT-5' at 2888, 3'-AGTAGT-5' at 2944, 3'-AGTAGT-5' at 3418, and 3'-AGTAGT-5' at 3521 on the negative strand in the negative direction and its complement on the positive strand.

There is one D box in the distal promoter: 3'-AGTCTG-5' at 2947 on the negative strand in the negative direction and its complement on the positive strand.

There is one C box in the distal promoter: 3'-TCATCA-5' at 3251 on the negative strand in the positive direction and its complement on the positive strand.

There is one D box in the distal promoter: 3'-AGTCTG-5' at 3923 on the negative strand in the positive direction and its complement on the positive strand.

Regarding hypothesis 2: If involved they assist transcription by other TFs.

A Google Scholar search using the key words "C box", "D box", and A1BG produced zero results.

Regarding hypothesis 3: C and D boxes occur only in the proximal promoter.

GeneID: 60674 GAS5 growth arrest specific 5 (non-protein coding). "This gene produces a spliced long non-coding RNA and is a member of the 5' terminal oligo-pyrimidine class of genes. It is a small nucleolar RNA host gene, containing multiple C/D box snoRNA genes in its introns. Part of the secondary RNA structure of the encoded transcript mimics glucocorticoid response element (GRE) which means it can bind to the DNA binding domain of the glucocorticoid receptor (nuclear receptor subfamily 3, group C, member 1). This action blocks the glucocorticoid receptor from being activated and thereby stops it from regulating the transcription of its target genes. This transcript is also thought to regulate the transcriptional activity of other receptors, such as androgen, progesterone and mineralocorticoid receptors, that can bind to its GRE mimic region. Multiple functions have been associated with this transcript, including cellular growth arrest and apoptosis. It has also been identified as a potential tumor suppressor, with its down-regulation associated with cancer in multiple different tissues."

"The antisense elements located immediately upstream of the D box and/or the D′ box match the sequence of the target RNA, while the areas immediately upstream of the C box and immediately downstream of the D box form a 5′–3′ terminal stem".

"Small nucleolar RNAs (snoRNAs) are noncoding RNAs involved in the processing and modification of ribosomal RNAs. They are grouped in two distinct families, the box C/D family, which catalyzes methylation of 2′-hydroxyls of the pre-rRNA precursor, and the box H/ACA family, which catalyzes the modification of uridines into pseudouridines in various RNAs (reviewed in Refs. [24] and [40])."

"Small nucleolar RNAs (snoRNAs) are 60–300-nucleotide-long RNAs located in the nucleolus or in Cajal bodies. They constitute one of the most abundant classes of ncRNAs [9]. Predominantly intronic, 300 different snoRNA sequences are located in the human genome. They are classified into two categories, those containing boxes C and D; and, those containing boxes H and ACA. snoRNAs are generated after splicing, debranching, and trimming of mRNA introns. Subsequently, mature snoRNAs associate with proteins to form small nucleolar ribonucleoproteins (snoRNPs). These complexes are exported into the nucleolus to participate in rRNA processing [5]."

Tiny "RNAs with a modal length of 18 nt [...] map within -60 to +120 nt of transcription start sites (TSSs) in human, chicken and Drosophila. These transcription initiation RNAs (tiRNAs) are derived from sequences on the same strand as the TSS and are preferentially associated with G+C-rich promoters. The 5' ends of tiRNAs show peak density 10-30 nt downstream of TSSs, indicating that they are processed. tiRNAs are generally, although not exclusively, associated with highly expressed transcripts and sites of RNA polymerase II binding."

"With exception of U3 all box C/D snoRNAs presented in this study are intron-encoded, as it is the general pathway for the biogenesis of this class of snoRNAs (22)."

"Box C/D snoRNAs [...] contain conserved Box C (UGAUGA) and Box D (CUGA) elements located closely to the 5′- and 3′-ends, respectively. Internal copies of these elements are termed Box C′ and Box D′ (20,21)."

Gene ID: 7422 VEGFA vascular endothelial growth factor A. "This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site."

CAREs
Inverse CAREs occur in the negative direction: 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 2592, 3'-CTCAAC-5' at 2704, 3'-CTCAAC-5' at 3115, and 3'-CTCAAC-5' at 4096.

A CARE, 3'-CAACTC-5', occurs at 3292 in the positive direction. But inverse CAREs also occur: 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 1621, and 3'-CTCAAC-5' at 3290.

CArG boxes
By combining a literature search with computer analysis of each promoter between ZSCAN22 and A1BG and ZNF497 and A1BG, CArG boxes have been found. To show that these CArG boxes may be used during or for transcription of A1BG at least one transcription factor has been affirmed.

A literature search of more recent results discovered: "Of the [Flowering Locus C] FLC binding sites, 69% contained at least one CArG-box motif with the core consensus sequence CCAAAAAT(G/A)G and an AAA extension at the 3′ end [. Three] other MADS-box flowering-time regulators, SOC1, SVP, and AGAMOUS-LIKE 24 (AGL24), bind to two different CArG-box motifs at 502 bp (CTAAATATGG) and 287 bp (CAATAATTGG) upstream of the translation start in the SEP3 gene (24), consistent with different specificities for the different MADS-box proteins."

These together with the core motif CC(A/T)6GG suggest a more general CArG-box motif of (C(C/A/T)(A/T)6(A/G)G). Subsequent computer-program testing revealed two more general CArG boxes: 3'-CAAAAAAAAG-5' at 1399 nts from ZSCAN22 and 3'-CATTAAAAGG-5' at 3441 nts from ZSCAN22, but none within 4300 nts toward A1BG from ZNF497.
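The difference between the core motif and the more general motif can be checked directly against the sequences quoted above. The sketch below is an illustration in Python, not the program used for the reported positions. It writes CC(A/T)6GG and C(C/A/T)(A/T)6(A/G)G as regular expressions and tests each sequence exactly as it is printed in the text.

```python
import re

CORE_CARG = re.compile(r"CC[AT]{6}GG")            # CC(A/T)6GG
GENERAL_CARG = re.compile(r"C[CAT][AT]{6}[AG]G")  # C(C/A/T)(A/T)6(A/G)G

QUOTED_SEQUENCES = [
    "CTAAATATGG",  # SEP3, 502 bp upstream
    "CAATAATTGG",  # SEP3, 287 bp upstream
    "CAAAAAAAAG",  # reported at 1399 nts from ZSCAN22
    "CATTAAAAGG",  # reported at 3441 nts from ZSCAN22
]

for seq in QUOTED_SEQUENCES:
    core = bool(CORE_CARG.fullmatch(seq))
    general = bool(GENERAL_CARG.fullmatch(seq))
    print(seq, "core:", core, "general:", general)
```

All four sequences fit the general motif and none fits the core motif, which is consistent with their being found only after the search was broadened.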

These results show that the presence of CArG boxes on the ZSCAN22 side of A1BG implies their use when transcribing A1BG, although they may be pointing toward ZSCAN22. These suggest that the hypothesis (A1BG is not transcribed by a CArG box) is false. Regarding the second hypothesis (The lack of a CArG box on either side of A1BG does not prove that it is not actively used to transcribe A1BG), the presence of more general CArG boxes in the distal promoter tentatively confirms this hypothesis.

CArG boxes do occur in the distal promoter of A1BG on the ZSCAN22 side only. And, it is likely that a CArG box is involved in some way with the transcription of A1BG.

CENP-B boxes
No CENP-B boxes occur on either side of A1BG.

CGCG boxes
On the negative strand in the negative direction (from ZSCAN22 to A1BG), looking for 3'-(A/C/G)CGCG(C/G/T)-5', there are no CGCG boxes in the core promoter.

On the negative strand in the positive direction (from ZNF497 to A1BG), looking for 3'-(A/C/G)CGCG(C/G/T)-5', there are no CGCG boxes in the core promoter.

There are no CGCG boxes in the negative direction of the proximal promoter.

There are no CGCG boxes in the positive direction of the other proximal promoter.

There are no CGCG boxes after nucleotide number 2460 in the negative direction of the distal promoter.

There are no CGCG boxes after nucleotide number 2300 in the positive direction of the other distal promoter.

All of the CGCG boxes found are more closely associated with ZSCAN22 or ZNF497 than A1BG.

CRE boxes
There is one CRE box on the negative strand pointing toward A1BG in the proximal promoter in the negative direction between A1BG and ZSCAN22: 3'-TGACGTCA-5' at 4317 nts. It can be involved in the transcription of A1BG, probably with an Inr rather than a TATA box. This tentatively proves hypothesis 1 false; i.e., A1BG can be transcribed by a CRE box.

DREB boxes
No DREB boxes occur on either side of A1BG.

EIF4E basal elements
No EIF4E basal elements (also written eIF4E or 4EBE) occur on either side of A1BG.

Enhancer boxes
The presence of many enhancer boxes on both sides of A1BG demonstrates that the hypothesis "A1BG is not transcribed by an enhancer box" is false.

The finding by literature search of evidence verifying that at least one transcription factor can enhance or inhibit the transcription of A1BG using one or more enhancer boxes disproves the hypothesis: "Existence of an enhancer box on either side of A1BG does not prove that it is actively used to transcribe A1BG".

Enhancer boxes do occur in the proximal and distal promoters of A1BG. And, it is likely that an enhancer box is involved in some way with the transcription of A1BG.

Factor II B recognition elements
Regarding hypothesis 1: B recognition element (BREu) is not involved in the transcription of A1BG.

In the negative direction, there are no BREs (BREu) in the core promoter from approximately 4425 to the possible transcription start site at nucleotide number 4460.

In the positive direction, there are no BREs (BREu) in the core promoter from approximately 4266 to the possible transcription start site at nucleotide number 4300.

There are no BREs (BREu) in the proximal promoter beginning about nucleotide number 4210 in the negative direction.

There are no BREs (BREu) in the proximal promoter beginning about nucleotide number 4050 in the positive direction.

There is one BREu in the distal promoter: 3'-CCGCACC-5' at 3047 on the negative strand in the negative direction and its complement on the positive strand.

There is one BREu in the distal promoter: 3'-CCGCACC-5' at 2566 on the negative strand in the positive direction and its complement on the positive strand.
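The approximate boundaries quoted above (core promoter from about 4425 to 4460 in the negative direction and from about 4266 to 4300 in the positive direction; proximal promoters beginning about 4210 and 4050) can be used to label any reported position. The sketch below is illustrative only: the names `REGIONS` and `classify_position` are hypothetical, treating the approximate boundaries as exact cut-offs is an assumption, and no far limit is applied to the distal promoter.

```python
# Approximate boundaries quoted in this laboratory (nucleotide numbers).
# Treating them as exact cut-offs is an assumption of this sketch.
REGIONS = {
    "negative": {"proximal_start": 4210, "core_start": 4425, "tss": 4460},
    "positive": {"proximal_start": 4050, "core_start": 4266, "tss": 4300},
}

def classify_position(position, direction):
    """Label a nucleotide number by promoter region for one direction."""
    bounds = REGIONS[direction]
    if position > bounds["tss"]:
        return "downstream of TSS"
    if position >= bounds["core_start"]:
        return "core promoter"
    if position >= bounds["proximal_start"]:
        return "proximal promoter"
    # No far limit is applied here; everything earlier is called distal.
    return "distal promoter"

# The two BREu positions reported above, plus the CRE box at 4317.
for pos, direction in [(3047, "negative"), (2566, "positive"), (4317, "negative")]:
    print(pos, direction, classify_position(pos, direction))
```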

Regarding hypothesis 2: If involved it assists transcription by other TFs.

A search of Google Scholar and the full web failed to produce any examples of BREu assisted A1BG transcription.

"A computational study based on statistical analysis of curated promoter sets concluded that up to 25% of human core promoters contain a potential BREu. The motif was found to be enriched in CpG promoters (>30% frequency) but depleted in CpG-less promoters (<10% frequency) [14]."

GAREs
An inverse GARE, 3'-AAACAAT-5', and its complement occur at 230 nts, close to ZSCAN22 and likely well outside the distal promoters, so no GAREs occur on either side of A1BG in the distal promoters.

G boxes
No G boxes occur on either side of A1BG.

GLM boxes
No GLM boxes occur on either side of A1BG.

HNF6s
HNF6s may have a downstream proximal promoter element if the computer sampling of nts extends at least approximately 250 nts downstream of the transcription start site (TSS). "Downstream" can refer to downstream from an enhancer but before the TSS, downstream from a TATA box or an initiator element but before the TSS, downstream from another promoter element and containing the TSS, or downstream after the TSS. The computer programs written to test for HNF6 promoters were limited to 100 nts below the apparent TSSs.

There is a HNF6 on the negative strand in the positive direction (from ZNF497 to A1BG) of 3'-TTCCGGGAA-5' at 808 in the proximal promoter, where the TSS is at 858 nts from ZNF497.

There is no such "downstream" promoter between ZSCAN22 and A1BG.

Both a TATA box or an Inr are within the core promoter. There are no HNF6s within any core promoters per the computer program sampling from ZNF497 or ZSCAN22 and A1BG.

There are no HNF6s within any core promoters per the computer program sampling from ZNF497 or ZSCAN22 and A1BG containing either TSS.

No HNF6s were detected at least to 100 nts downstream of each TSS.

There is a HNF6 on the negative strand in the positive direction (from ZNF497 to A1BG) of 3'-TTCCGGGAA-5' at 808 in the proximal promoter, where the TSS is at 858 nts from ZNF497. This direction is the only confirmed transcription of A1BG; therefore, it is likely A1BG is transcribed using this HNF6 transcription factor.

There are two HNF6s on the negative strand in the negative direction, 3'-AAGCAACTT-5' at 3506 and 3'-AAGGGACTT-5' at 3782. Both of these are in the distal promoter between ZSCAN22 and A1BG.

The only known TSS for A1BG lies at 4300 nts from just beyond ZNF497 toward A1BG. There are two HNF6s in the proximal promoter between 4050 and 4300, 3'-TTATTGATTA-5' at 4164 and 3'-TATAATTGTT-5' at 4172, i.e. outside the range from 4242 (-58) to 4250 (-50). This suggests that HNF6 assists in the transcription of A1BG, but not downstream of the TSS.



Both "the 2.3 kb and the 160 bp proximal parts of the a1bg promoter direct sex-specific expression of the reporter gene, and that a negative regulatory element resides in the −1 kb to −160 bp region."

"Computer analysis of the 2.3 kb rat a1bg promoter fragment revealed two putative HNF6 sites and one [hepatic nuclear factor 6] HNF6/HNF3 binding site at −2077/−2069, −69/−61 and −137/−128 respectively [...]."

The "GH-dependent sexually dimorphic expression conveyed by the 2.3 kb a1bg promoter is enhanced by the HNF6/HNF3 site [...]."

"HNF6 bound to the a1bg HNF6 oligonucleotide, but in this case, the mutated oligonucleotide was able to compete for binding when added in large excess [...]. However, [...] the HNF6 binding capacity of the mutated oligonucleotide was clearly reduced. A 20 molar excess of the mutated oligonucleotide had only a marginal effect on the binding of HNF6 [...], whereas a 20 molar excess of unlabelled probe [...] completely abolished binding. Supershift analysis with an HNF6 antibody revealed a complex with a slightly lower mobility than the HNF6 complex [...]. By extending the electrophoresis run and including nuclear extract from hypophysectomized rats, devoid of GH and thereby lacking HNF6 (Lahuna et al. 1997), the two different complexes were clearly visualized. The complex with the lower mobility is most probably due to the binding of HNF3, in analogy with what was shown by Lahuna et al. for the CYP2C12 HNF6 binding site; HNF3 can bind to the site in the absence of HNF6 (Lahuna et al. 1997). [...] HNF6 could bind to their respective site in the a1bg promoter in vitro, and the mutations introduced in respective site abolished binding of the corresponding factor."

The "expression of a −116/−89 deletion construct in which also the HNF6 site was mutated, (−116/−89) delmutHNF6-Luc, [...] the generated luciferase activities were reduced in both sexes [...]. This is in contrast to that mutation/deletion of the sites separately only affected the expression in female livers."

The "−116/−89 region contains a site(s) of importance for the GH-dependent and female-specific expression of the a1bg gene, and that the impact of this region together with the HNF6 site is more complex than mere enhancement of the expression in females."

Following "mutation of the HNF6-binding element, mutHNF6-Luc, the sex-differentiated expression was attenuated due to reduced expression in females. Thus, for a1bg, the sex-related difference in amount of HNF6 is likely to contribute to the sex-differentiated and female characteristic expression."

Nuclear "proteins binding to the a1bg −116/−89 region [are] members of the [nuclear factor 1] NF1 and the [octamer transcription factor] Oct families of transcription factors. NF1 genes are expressed in most adult tissues (Osada et al. 1999). It is not known how NF1 modulates transcriptional activity, and both activation and repression of transcription have been reported (Gronostajski 2000). Cofactors such as CBP/p300 and HDAC have been shown to interact with NF1 proteins suggesting modulation of chromatin structure (Chaudhry et al. 1999). NF1 factors have also been shown to interact directly with the basal transcription machinery as well as with other transcription factors, including Stat5 (Kim & Roeder 1994, Mukhopadhyay et al. 2001) and synergistic effects with HNF4 have been reported (Ulvila et al. 2004). In addition to the HNF6, Stat5 and NF1/Oct sites, the a1bg promoter harbours an imperfect HNF4 site at −51/−39 with two mismatches compared with the HNF4 consensus site. HNF4 is clearly important for the expression of CYP2C12 (Sasaki et al. 1999), however, the −51/−39 region in a1bg was not protected in the footprinting analysis and was therefore not analysed further. Like NF1, Oct proteins have been reported to be involved in activation as well as repression of gene expression (Phillips & Luisi 2000). [...] Moreover, NF1 and Oct-1 have been shown to, reciprocally, facilitate each other’s binding (O’Connor & Bernard 1995, Belikov et al. 2004)."

In the diagram on the right is liver "expression of a1bg-luciferase constructs. (A) Stat5 and HNF6 consensus sequences and corresponding sites in the 2.3 kb a1bg promoter alongside with the used mutations. (B) Female (black bars) and male (open bars) rats [results]."

"Computer analysis of the 2.3 kb rat a1bg promoter fragment revealed [a] HNF6 [site] at [...] −69/−61 [...]."

The murine downstream promoter element is only 11 nts displaced from the human one. This suggests a HNF6 participation in human gene transcription of A1BG.

"Computer analysis of the 2.3 kb rat a1bg promoter fragment revealed two putative HNF6 sites [...] at −2077/−2069 [and] −69/−61 [...]."

There are two HNF6s on the negative strand in the negative direction, 3'-AAGCAACTT-5' at 3506 (-954) and 3'-AAGGGACTT-5' at 3782 (-678) in the distal promoter between ZSCAN22 and A1BG. Although much closer than their likely murine counterparts, they are on the other side of A1BG from the HNF6 site confirming hypothesis 1. If active in humans or murine-like HNF6s occur within or beyond ZNF497 in this distal promoter, then human A1BG is transcribed using HNF6 promoters disproving hypothesis 2.
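The offsets in parentheses follow from subtracting the possible TSS from the nucleotide number. As a small worked check, assuming the possible TSS at 4460 for the negative direction and 4300 for the positive direction as stated in earlier sections, the arithmetic can be sketched as below; the helper name `relative_to_tss` is hypothetical.

```python
def relative_to_tss(position, tss):
    """Offset of a nucleotide number from the TSS (negative means upstream)."""
    return position - tss

# Negative direction, possible TSS at 4460.
print(relative_to_tss(3506, 4460))  # -954
print(relative_to_tss(3782, 4460))  # -678

# Positive direction, TSS at 4300.
print(relative_to_tss(4242, 4300))  # -58
print(relative_to_tss(4250, 4300))  # -50
```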

A Google Scholar search using ZNF497 with HNF6 found no articles discussing HNF6 sites inside or associated with ZNF497. To confirm they exist, a data file going 4300 nts to just beyond ZNF497 has been created and tested for a distal promoter on this side. Distal HNF6s in the positive direction, if they exist, would be inside ZNF497 or beyond, e.g., 3'-ATGTCCATGG-5' at 3581 was found.

Literature search has found that HNF6s assist transcription of A1BG by other transcription factors. The proximal HNF6 promoter is -58 to -50 from A1BG TSS. If another HNF6 promoter is at -2.3 kb, it is about -1.4 kb inside ZNF497 which is 3212 nts long. Per analogy to the rat this would be expected.

Per earlier laboratories transcription factors may occur in the distal promoters on the ZNF497 side of A1BG for
 * 1) ATA boxes 3'-AATAAA-5' occurs at 3427,
 * 2) CArG boxes,
 * 3) Enhancer boxes,
 * 4) HY boxes,
 * 5) MREs and
 * 6) STAT5s 3'-TTCCATGAA-5' occurs at 128.

The HNF6 promoter on the other side of A1BG (at about +3 kb) is way beyond -2.1 through ZNF497, unless the DNA is folded to allow the HNF6 on the ZSCAN22 side to be used in analogy to the HNF6 on the same side as in the rat.

HNF6s have a role as downstream signal transducers in A1BG, where the murine downstream promoter element is only 11 nts displaced from the human one. This suggests a HNF6 participation in human gene transcription of A1BG.

HY boxes
HY boxes were not found in either core promoters or the proximal promoters in either direction. However, HY boxes were found in the distal promoters on both sides of A1BG. No genes are described in the literature so far as transcribed from HY boxes in any distal promoters.

Either A1BG can be transcribed by HY boxes in the distal promoter, or A1BG is not transcribed by HY boxes. As the literature appears absent from a Google Scholar advanced search to confirm possible transcription from distal promoters, wet chemistry experiments are needed to test the possibility.

Metal responsive elements (MREs)
By combining a literature search with computer analysis of the promoter between ZSCAN22 and A1BG and ZNF497 and A1BG, metal responsive elements have been found. Literature search has also discovered at least three post-translational isoforms including the unaltered precursor. Although no metal responsive elements overlap any enhancer boxes in the distal promoter, there are elements in the distal promoter.

"The human genome is estimated to contain 700 zinc-finger genes, which perform many key functions, including regulating transcription. [Four] clusters of zinc-finger genes [occur] on human chromosome 19".

Nearby zinc-fingers on chromosome 19 include ZNF497 (GeneID: 162968), ZNF837 (GeneID: 116412), and ZNF8 (GeneID: 7554).

"In rodents and in humans, about one third of the zinc-finger genes carry the Krüppel-associated box (KRAB), a potent repressor of transcription (Margolin et al. 1994), [...]. There are more than 200 KRAB-containing zinc-finger genes in the human genome, about 40% of which reside on chromosome 19 and show a clustered organization suggesting an evolutionary history of duplication events (Dehal et al. 2001)."

ZNF8 is in cluster V along with A1BG.

"In contrast to the four clusters considered [I through IV], one that occurs at the telomere of chromosome 19, which we will call cluster V, has been very stable [over mouse, rat, and human]."

"Apart from the somewhat unexpected location of Zfp35 on mouse chromosome 18 and of the AIBG orthologs on mouse chromosome 15 and rat chromosome 7, there has been little rearrangement."

So far no article has reported any linkage between zinc, including various zinc fingers, or cadmium, and A1BG.

Regarding additional isoforms, mention has been made of "new genetic variants of A1BG."

"Proteomic analysis revealed that [a circulating] set of plasma proteins was α 1 B-glycoprotein (A1BG) and its post-translationally modified isoforms."

Pharmacogenomic variants have been reported. There are A1BG genotypes.

The A1BG variant rs893184 is one component of a reported genetic risk score.

"A genetic risk score, including rs16982743, rs893184, and rs4525 in F5, was significantly associated with treatment-related adverse cardiovascular outcomes in whites and Hispanics from the INVEST study and in the Nordic Diltiazem study (meta-analysis interaction P=2.39×10−5)."

"rs893184 causes a histidine (His) to arginine (Arg) [nonsynonymous single nucleotide polymorphism (nsSNP), A (minor) for G (major)] substitution at amino acid position 52 in A1BG."

For example, GeneID: 9 has isoforms a, b, X1, and X2. Isoforms a and b each have variants. Variants 1-6 and 9 all encode the same isoform (a).

Variants 7, 8 and 10 all encode isoform b. Isoforms X1 and X2 are predicted.

Variants can differ in promoters, untranslated regions, or exons. For GeneID: 9: This variant (1) represents the longest transcript but encodes the shorter isoform (a). This variant is transcribed from a promoter known as P1, promoter 2, or NATb promoter.

This variant (2, also known as Type IID) lacks an alternate exon in the 5' UTR, compared to variant 1. This variant is transcribed from a promoter known as P1, promoter 2, or NATb promoter.

This variant (9, also known as Type IA) has a distinct 5' UTR and represents use of an alternate promoter known as the NATa or P3 promoter, compared to variant 1.

But, A1BG in NCBI Gene lists only one isoform, the gene locus itself, and the protein transcribed is a precursor subject to translational or more likely post-translational modifications.

The presence of multiple MREs coupled with experimental results from the literature indicating post-translational isoforms tends to confirm the existence of two or more isoforms for A1BG.

It isn't known which, if any, assist in locating and affixing the transcription mechanism for A1BG. This examination is the first to test one such DNA-occurring TF: the HNF6s.

The presence of multiple MREs coupled with experimental results from the literature indicating post-translational isoforms tends to confirm the existence of two or more isoforms for A1BG and likely transcription from either side.

Motif ten elements
No Motif ten elements occur on either side of A1BG.

Pyrimidine boxes
Pyrimidine boxes and their complements: 3'-CCTTTT-5' at 2459, 3'-CCTTTT-5' at 2927, and 3'-CCTTTT-5' at 2968 occur in the negative direction.

Inverse pyrimidine boxes and their complements occur 3'-AAAAGG-5' at 1107, 3'-AAAAGG-5' at 3345, and 3'-AAAAGG-5' at 3441 also in the negative direction.

STAT5s
STAT5s have a role as downstream signal transducers in A1BG, where the murine downstream promoter element is only 11 nts displaced from the human one. This suggests a STAT5 participation in human gene transcription of A1BG in the proximal promoter downstream between any other promoter and the TSS on the ZNF497 side of A1BG.

The hypothesis "A1BG is not transcribed by any STAT5s" is clearly disproved by the STAT5 transcription factor in the proximal promoter on the ZNF497 side of A1BG.

Literature search has found that STAT5s assist transcription of A1BG by other transcription factors. The proximal STAT5 promoter is -58 to -50 from the A1BG TSS. If another STAT5 promoter is at -2.3 kb, it is about -1.4 kb inside ZNF497, which is 3212 nts long. Per analogy to the rat this would be expected. A STAT5 transcription site lies at 3'-TTCCGGGAA-5' at 4247 in the proximal promoter, i.e. from 4242 (-58) to 4250 (-50). This suggests that STAT5 assists in the transcription of A1BG.

TATA boxes
On the negative strand in the negative direction (from ZSCAN22 to A1BG), looking for 3'-TATA-A/T-A-A/T-A/G-5', there are no TATA boxes in the core promoter.

On the negative strand in the positive direction (from ZNF497 to A1BG), looking for 3'-TATA-A/T-A-A/T-A/G-5', there are no TATA boxes in the core promoter.

There are no TATA boxes in the negative direction of the proximal promoter.

There are no TATA boxes in the positive direction of the proximal promoter.

For the positive strand in the negative direction looking for 3'-TATA-A/T-A-A/T-A/G-5', there's one 3'-TATATAAA-5' at 2874 nts, its complement and inverse complement of the distal promoter.

Any TATA boxes before A1BG from the direction of ZNF497 may be out to 2300 nts. None were found in the distal promoter.

On the positive strand, in the nucleotide region between gene ZSCAN22 (NCBI GeneID: 342945) and A1BG (NCBI GeneID: 1) are 211 TATA box-like 8 nt long sequences. Of these,
 * 1) TATAAAAG occurs at 58853713 + 183 nts and
 * 2) TATAAAAG at 58853713 + 222. This is a TATA box found with some genes. But, the optimal TBP recognition sequence 3'-TATATAAG-5', does not occur.
 * 3) TATATAAA occurs only once at 2874 nts from the end of ZSCAN22. TBP is bound to this sequence and TATAAAAG above.
 * 4) TATAAA occurs seven times, with the closest one at 2874 nts from the end of ZSCAN22. "In virtually every RNA polymerase II-transcribed gene examined, the sequence TATAAA was present 25 to 30 nts upstream of the transcription start site."

A1BG does not have a TATA box in the core promoter region. There is the sequence 3'-TGCTATATAGATGGCAACTAAGCACTTGGGGAAAAAA-5' for which the first nt (T) is number 58856598 or 1574 nt upstream from the beginning of the 3'-UTR at 58858172. Unless another variant exists, -1574 nt from the beginning of the 3'-UTR is a large number of nts away from the TSS.

The closest TATA box-like sequence is 3'-CTCTTAAG-5' on the template strand at 4408 nts from the end of ZSCAN22, which is upstream from the core promoter.
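The form 3'-TATA-A/T-A-A/T-A/G-5' used in this section can be checked against the 8 nt sequences quoted above. This is a minimal sketch assuming the form is read as the regular expression `TATA[AT]A[AT][AG]`; the name `is_tata_box` is hypothetical and the code is not the program that produced the counts reported here.

```python
import re

# TATA-A/T-A-A/T-A/G written as a regex over eight positions.
TATA_BOX = re.compile(r"TATA[AT]A[AT][AG]")

def is_tata_box(candidate):
    """True if the whole candidate matches the 8 nt TATA box form."""
    return TATA_BOX.fullmatch(candidate.upper()) is not None

# Sequences quoted in this section.
for candidate in ["TATATAAA", "TATAAAAG", "TATATAAG", "CTCTTAAG"]:
    print(candidate, is_tata_box(candidate))
```

Under this reading the first three sequences fit the form, while 3'-CTCTTAAG-5' does not, which is consistent with describing it only as TATA box-like.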

The extra TATA boxes between ZSCAN22 and A1BG strongly suggest that there is at least one gene (or pseudogene) between ZSCAN22 and A1BG not currently in the NCBI database.

On the negative strand between ZNF497 and A1BG, there are no TATA boxes of the form 3’-TATA-A/T-A-A/T-A/G-5’.

For the negative strand going from ZSCAN22 to A1BG there are two TATA boxes: 3'-TATATATA-5' at 1600 nts and 3'-TATATAAA-5' at 1602 nts. These are way too far from the possible TSS in this direction.

These two TATA boxes in the distal promoter at approximately -2860 nts from the TSS suggest that there may be a short gene between ZSCAN22 and A1BG.

The hypothesis: TATA boxes are not involved in the transcription of A1BG is true. "In virtually every RNA polymerase II-transcribed gene examined, the sequence TATAAA was present 25 to 30 nts upstream of the transcription start site."

There are no TATA boxes at all between ZNF497 and A1BG.

On the negative strand between ZSCAN22 and A1BG there are many TATA boxes between 184 nts from ZSCAN22 and 2874 nts from ZSCAN22 yet no genes are apparently known to occur between ZSCAN22 and A1BG. ZSCAN22 has several isoforms but all end exactly at the one TSS on the A1BG side.

From the number and variety of TFs on both sides of A1BG, multiple transcriptions should be possible. Any connection between bone, muscle and brain mass loss and A1BG likely uses one or more of the sides, directions, or forms (16 ways) and includes one or more TFs. Determining which produces deleterious effects is the first step toward reversal in a zero-g radiation inducing environment.

TAT boxes
An inverse TAT box occurs 3'-TACCTAT-5' at 2996 with its complement in the negative direction.

TATCCAC boxes
No TATCCAC boxes occur on either side of A1BG.

W boxes
Inverse W boxes occur within the proximal promoter in the negative direction of A1BG: 3'-GGTCAA-5' at 4416 and 3'-GGTCAA-5' at 4308.

W boxes occur within the proximal promoter in the positive direction of A1BG: 3'-CTGACC-5' and its complement at 4216 and inverse W boxes occur 3'-GGTCAG-5' and its complement at 4270.

A W box occurs 3'-CTGACC-5' at 3749, and 3'-CTGACT-5' at 1935 could be associated with ZSCAN22 or an unknown gene between it and A1BG, along with their complements in the negative direction of the distal promoter.

W box inverses occur 3'-GGTCAG-5' at 1353 and 3'-AGTCAG-5' at 2101, 3'-GGTCAG-5' at 2221, 3'-AGTCAG-5' at 2608, 3'-AGTCAA-5' at 2614, and 3'-AGTCAG-5' at 2619 along with their complements in the negative direction of the distal promoter.

W boxes occur 3'-CTGACC-5' at 1662, 3'-CTGACC-5' at 2213, 3'-TTGACC-5' at 2873, 3'-CTGACT-5' at 2945, and 3'-TTGACC-5' at 4018 that could be associated with A1BG, along with 3'-TTGACC-5' at 1953, 3'-CTGACT-5' at 2674, and 3'-TTGACT-5' at 3735 in the positive direction of the distal promoter.

X boxes
No X boxes occur on either side of A1BG.

Y boxes
No Y boxes occur on either side of A1BG.

Experiments
Regarding hypothesis 1: E2 boxes are not present in the promoter of A1BG.

The Basic programs (starting with SuccessablesE2box.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-), or coding strand (+), of the DNA, in the negative direction (-), or the positive direction (+), including the extended number of nts from 958 to 4445, looking for E2 boxes, their possible complements and inverses, to test the hypothesis that E2 boxes are not present in the promoter of A1BG.
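The comparison described above can be illustrated in a few lines. The sketch below is not SuccessablesE2box.bas; it is a hypothetical Python analogue that assumes the E2-box motif (G/ACAGNTGN) quoted in the introduction, read as the regular expression `[AG]CAG[ACGT]TG[ACGT]`, and that tests each 8 nt window directly, as its complement, as its inverse (reversed), and as its inverse complement. The names `scan_e2_boxes` and `COMPLEMENT` are illustrative, and positions are 1-based along the supplied string.

```python
import re

# E2-box motif G/ACAGNTGN as a regex (assumed reading of the consensus).
E2_BOX = re.compile(r"[AG]CAG[ACGT]TG[ACGT]")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def scan_e2_boxes(sequence):
    """Return (1-based position, window, form) for every window that is an
    E2 box directly, as a complement, as an inverse, or as both."""
    sequence = sequence.upper()
    width = 8
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        complement = window.translate(COMPLEMENT)
        forms = {
            "direct": window,
            "complement": complement,
            "inverse": window[::-1],
            "inverse complement": complement[::-1],
        }
        for label, form in forms.items():
            if E2_BOX.fullmatch(form):
                hits.append((start + 1, window, label))
    return hits

# Hypothetical test strings built around the motif 3'-ACAGATGT-5'.
print(scan_e2_boxes("AAACAGATGTAAA"))
print(scan_e2_boxes("TGTAGACA"))
```

Running the same scan over each strand and direction separately would reproduce the four-way comparison the Basic programs perform.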

Regarding hypothesis 2: If an E2 box is present it does not assist in the transcription of A1BG.

Literature searches were performed to determine the likely TFs and possible interactions to transcribe A1BG.

Hypothesis 1
E2 boxes are not present in the promoter of A1BG.

There are no E2 boxes in the core promoter in the negative direction.

There are no E2 boxes in the core promoter in the positive direction.

There is one E2 box 3'-ACAGATGT-5' at 4213 in the negative direction of the proximal promoter.

There is no E2 box in the positive direction of the proximal promoter.

There are two E2 boxes of the distal promoter in the negative direction: 3'-ACAGATGT-5' at 2989 and 3'-ACAGATGT-5' at 4213 nts, negative strand, and two on the positive strand: 3'-GCAGGTGG-5' at 2571 and 3'-ACAGATGA-5' at 3920, plus their complements and inverses.

There are no E2 boxes on the distal promoter in the positive direction.

Hypothesis 2
If an E2 box is present it does not assist in the transcription of A1BG.

Google Scholar search using "E2 box" "GCC box":

Both of these TFs are apparently involved with key regulators of paclitaxel biosynthesis in Taxus cuspidata.

"Paclitaxel is mainly derived from the plant genus Taxus and has been widely used in cancer chemotherapy. However, plant cell culture is often not commercially viable because of difficulties associated with culturing dedifferentiated plant cells (DDCs) on an industrial scale. [Undifferentiated] cambial meristematic cells (CMCs) from Taxus cuspidata, [...] possess superior growth properties relative to DDCs. These CMCs have been demonstrated to be a cost effective platform for the sustainable production of paclitaxel. Using 454 sequencing, we determined the transcriptome of T. cuspidata CMCs. Utilizing this transcriptome as a reference, we then employed Solexa digital gene expression profiling to identify transcriptional regulators that were induced by methyl jasmonate, an activator of paclitaxel biosynthesis. This lead to the discovery of 19 putative transcription factors (TFs) belonged to 5 TF families which were further confirmed by associated molecular methods. We aimed to identify which of these 19 regulatory proteins drive the expression of 5 paclitaxel biosynthetic genes by employing yeast one-hybrid analysis and electrophoretic mobility shift assays."

Google Scholar search using "E2 box" "ATA box":

Your search - "E2 box" "ATA boxes" - did not match any articles.

Google Scholar search using "E2 box" "CAAT box":

About 14 results (0.05 sec): "There was no consensus CAAT box. [...] In addition, we performed mutation analyses of the E2 box and the E3 box to evaluate whether the E2 and E3 boxes regulate the transcriptional activity of the human NeuroD gene [...]." The "5′ genomic sequences revealed promoter elements containing a TATA box at nucleotides −23 to −27 and a CAAT box between nucleotides [...] and an E2 box [...]." "Studies have reported that the cap signal element with the TATA-box, CAAT-box, and GC-box is the most general element of the POL II promoter and exists in major protein [...] The delta-crystallin enhancer-binding protein delta EF1 is a repressor of E2-box-mediated gene activation [...]."

Google Scholar search using "E2 box" "C and D boxes":

"There are two main classes of small nucleolar RNAs (snoRNAs): the box C/D snoRNAs and the box H/ACA snoRNAs that function as guide RNAs to direct sequence-specific modification of rRNA precursors and other nucleolar RNA targets. A previous computational and biochemical analysis revealed a possible evolutionary relationship between miRNA precursors and some box H/ACA snoRNAs. Here, we investigate a similar evolutionary relationship between a subset of miRNA precursors and box C/D snoRNAs. Computational analyses identified 84 intronic miRNAs that are encoded within either box C/D snoRNAs, or in precursors showing similarity to box C/D snoRNAs. Predictions of the folded structures of these box C/D snoRNA-like miRNA precursors resemble the structures of known box C/D snoRNAs, with the boxes C and D often in close proximity in the folded molecule. All five box C/D snoRNA-like miRNA precursors tested (miR-27b, miR-16-1, mir-28, miR-31 and let-7g) bind to fibrillarin, a specific protein component of functional box C/D snoRNP complexes."

"RT–PCR used to detect co-precipitated HBII-239, hsa-mir-let-7g, hsa-mir-16-1, hsa-mir-27b, has-mir-28 and has-mir-31 miRNA precursors, with U3 box C/D snoRNA as positive control and, U1 snRNA, 5 S rRNA, GAPDH pre-mRNA and E2 box H/ACA snoRNA as negative controls for fibrillarin-associated RNAs."

Google Scholar search using "E2 box" "CAACTC regulatory elements":

1 result (0.04 sec): mentions E2-box sequences but not CAACTC.

Google Scholar search using "E2 box" "CAACTC regulatory elements", CARE or CAREs:

Your search - "CAACTC regulatory element" "E2 box" - did not match any articles. The use of CARE picked up "About 263 results (0.09 sec)" but care rather than CARE. CAREs picked up "3 results (0.05 sec)" for cares.

Google Scholar search using "E2 box" "CArG boxes":

8 results (0.05 sec): "The HLH-binding sites (E1 and E2-box) are located at position −214 > −219 and −252 > −257. Four CArG-boxes (A, B, C, and D) are present in the upstream region of the SMA gene."

Google Scholar search using "E2 box" "CENP-B box":

Your search - "E2 box" "CENP-B box" - did not match any articles.

Google Scholar search using "E2 box" "CGCG box":

Your search - "E2 box" "CGCG box" - did not match any articles.

Google Scholar search using "E2 box" "CRE box":

1 result (0.04 sec): "In the rat insulin I promoter, there is an additional E2 box and in humans one E2-like box that binds the protein USF (Read et al., 1993). [...] In human insulin gene there are four CRE boxes (two within the promoter region) (Inagaki et al., 1992), only one CRE box is present in the rat promoter (Crowe and Tsai 1989, Philippe and Missotts 1990)."

Google Scholar search using "E2 box" "DREB box":

Your search - "E2 box" "DREB box" - did not match any articles.

Google Scholar search using "E2 box" "EIF4E basal element":

Your search - "E2 box" "EIF4E basal element" - did not match any articles.

Google Scholar search using "E2 box" "4EBE":

Your search - "E2 box" "4EBE" - did not match any articles.

Google Scholar search using "E2 box" "E box":

About 622 results (0.08 sec): "The two E2 boxes in the mouse and human E-cadherin promoter sequences were demonstrated to play a crucial role in the epithelial-specific expression of E-cadherin Behrens et al. 1991, Giroldi et al. 1997. Mutation of these sequence elements results in upregulation of the E-cadherin promoter in dedifferentiated cancer cells, whereas the wild-type promoter shows low activity in such cells. Recently, it was shown that the zinc finger transcriptional repressor Snail can downregulate E-cadherin by binding to the E boxes in the E-cadherin promoter Batlle et al. 2000, Cano et al. 2000. Human Snail belongs to a family of zinc finger proteins, which contain four or five zinc finger domains of the C2H2 type at their C-terminal end. These zinc fingers bind to the CANNTG sequence in E box motifs."

Google Scholar search using "E2 box" "Factor II B recognition element":

Your search - "E2 box" "Factor II B recognition element" - did not match any articles.

Google Scholar search using "E2 box" "BRE":

Your search - "E2 box" "BRE" - did not match any articles.

About 13 results (0.06 sec): "An E2-box and three AP-1-binding sites were found on the promoter. No typical BRE elements were located on this promoter." But, BMP response element is called BRE element.

Google Scholar search using "E2 box" "GA responsive element":

Your search - "E2 box" "GA responsive element" - did not match any articles.

Google Scholar search using "E2 box" "GARE":

2 results (0.04 sec) but no access available.

Google Scholar search using "E2 box" "G box":

About 14 results (0.10 sec): "One of the candidates for the pinopsin LRE-binding factor is δ-crystallin enhancer binding protein (δEF1)/zinc finger, E-box binding protein (ZEB), which was identified as a ubiquitous transcriptional repressor acting through the CACCT(G) E2 box (Funahashi et al., 1993; Genetta et al., 1994). [An] LRE for pinopsin gene regulation is present at positions -1103 to -1086 in the promoter region and that the light dependency of the promoter activity is completely lost by introducing mutations within these positions. Interestingly, the CACGTG sequence found in the pinopsin LRE completely matches the G box (CACGTGG), one of the LREs identified in plants (Donald and Cashmore, 1990), in which the element is not effective by itself and a combination with its specific minimal promoter is indispensable for the expression of light responsiveness."

Google Scholar search using "E2 box" "GLM box":

Your search - "E2 box" "GLM box" - did not match any articles.

Google Scholar search using "E2 box" "HNF6":

5 results (0.05 sec): Pronounced "p65 binding in saline-treated livers in the untranslated first exon of the Per2 gene in a region containing both a NF-κB-binding motif (GGGRNYYYCC, where R is a purine, Y is a pyrimidine, and N is any nucleotide) and the noncanonical E2-box (CACGTT) motif (∼220 base pairs [bp] downstream from the NF-κB motif) bound by CLOCK and BMAL1 that has been described previously to preferentially drive circadian transcription of the Per2 locus (Supplemental Fig. S2A; Yoo et al. 2005). [Motif] analyses revealed significant enrichment in metabolic and circadian bZIP factors (CEBP and HLF), HNF6-binding motifs, and the circadian clock pathway (E-box and USF1), in addition to p65 (Fig. 4C), following a HFD."

Google Scholar search using "E2 box" "HY box":

Your search - "E2 box" "HY box" - did not match any articles.

Google Scholar search using "E2 box" "MRE":

4 results (0.04 sec): "The skeletal actin CArG motif functioned as a muscle regulatory element (MRE) in that basal expression was detected only in muscle cultures." But in this laboratory MRE designates "Metal Response Element", not the muscle regulatory element of the quoted article. The E2 box apparently is not in the promoter of ZNF658, though it may repress transcription similarly.

Google Scholar search using "E2 box" "Metal Response Element":

5 results (0.09 sec): δ-crystallin/E2-box factor and metal response element are mentioned in possible connection to the stannin (Snn) gene. The other four do not mention interaction or a common gene promoter.

Google Scholar search using "E2 box" "MTE":

2 results (0.06 sec): Each RNA polymerase II holoenzyme complex needs TFs in various combinations to initiate transcription, including "In addition to the DPE, two other core promoter elements have been identified downstream of the transcription startsite. The MTE (motive ten element) has the consensus C[GC]A[AG]C[GC][GC]AACG[GC] and is typically located at position +18 to +28 relative to the transcriptional startsite. [and] Extensive work on the xbra promoter showed that the correct spatial expression confined in the margin of early gastrulation stage in Xenopus embryos is mainly established by repressive signals rather than activation (Latinkic et al, 1997; Lerchner et al, 2000). A search for putative transcription factor binding sites in the proximal xbra promoter identified a deltaEF1 binding site that, in conjunction with an E2-box restricts expression of xbra to the marginal zone in early gastrulation stages."
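
The MTE consensus in the quotation is written in bracket notation, which is also valid regular-expression syntax. The sketch below, under that assumption, scans a sequence for the consensus and reports where each match starts relative to a transcription start site. The helper names, the toy sequence, and the chosen TSS index are hypothetical and are not taken from the A1BG data.

```python
import re

# Consensus as quoted above; the bracket notation is used directly as a regex.
MTE = re.compile(r"C[GC]A[AG]C[GC][GC]AACG[GC]")

def relative_position(index, tss_index):
    """Convert a 0-based index to TSS-relative numbering (TSS base is +1, there is no 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset

def find_mte(seq, tss_index):
    """Return (relative start, matched text) for each match of the MTE consensus."""
    return [(relative_position(m.start(), tss_index), m.group())
            for m in MTE.finditer(seq.upper())]

# Hypothetical sequence: TSS "A" at index 5, with a consensus match placed to start at +18.
toy = "TTTTT" + "A" + "T" * 16 + "CGAACGGAACGG" + "TTT"
print(find_mte(toy, tss_index=5))
```

Only the start of each match is reported, so the result can be compared with the position range given in the quotation.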

Google Scholar search using "E2 box" "Pyrimidine box":

Your search - "E2 box" "Pyrimidine box" - did not match any articles.

Google Scholar search using "E2 box" "STAT5":

About 30 results (0.05 sec): "[O]ligonucleotides [contain] a single E-box (E1 or E2) present in the GLε promoter."

Google Scholar search using "E2 box" "TATA box": "A computer search for transcription promoter elements (see [the image on the right]) showed the presence of a prominent TATA box 22 nucleotides upstream of the transcription start site and an Sp1 site at position -42 to -33. The 5'-flanking sequence also contains three E boxes with CANNTG consensus sequences at positions -464 to -459, -90 to -85, and -52 to -47 that have been marked as E box, E1 box, and E2 box, respectively (see [the image on the right]). In addition, the 5'-flanking region contains one or more GRE, XRE, GATA-1, GCN-4, PEA-3, AP1, and AP2 consensus motifs and also three imperfect CArG sites [...] as indicated in [the image on the right]."

"The locations of various factor binding motifs including the E1 box, E2 box, TATA box, and the transcription initiation site are indicated."

Google Scholar search using "E2 box" "TAT box":

1 result (0.06 sec): "Bibliography of the current world literature". Not accessible.

Google Scholar search using "E2 box" "TATC box":

Your search - "E2 box" "TATC box" - did not match any articles.

Google Scholar search using "E2 box" "W box":

5 results (0.07 sec): Both of these TFs are apparently involved with key regulators of paclitaxel biosynthesis in Taxus cuspidata.

Google Scholar search using "E2 box" "X box":

About 29 results (0.09 sec): The articles mention one or the other, but not both for the same gene.

Google Scholar search using "E2 box" "Y box":

About 33 results (0.15 sec): The articles mention one or the other, but not both for the same gene.

Hypothesis 1 discussion
There are no E2 boxes available on the ZNF497 side of A1BG for transcribing A1BG (positive direction).

In the negative direction, from ZSCAN22 to A1BG, there are E2 boxes near the proximal promoter, but strictly all of them lie in the distal promoter: 3'-ACAGATGT-5' at 2989 and 3'-ACAGATGT-5' at 4213 nts on the negative strand, and two on the positive strand, 3'-GCAGGTGG-5' at 2571 and 3'-ACAGATGA-5' at 3920, plus their complements and inverses.

While there is no known transcription of A1BG using E2 boxes, these TFs could be used even if only to moderate transcription.
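
As a hedged check on the sequences listed above, the sketch below compares each reported candidate with the two E2 box consensus forms quoted earlier in this laboratory, G/ACAGNTGN and GCAGXTGG/T. It assumes that X, like N, stands for any nucleotide and that the candidates can be compared as plain strings in the orientation in which they are written. The names `E2_BROAD`, `E2_STRICT`, and `classify` are illustrative.

```python
import re

# Consensus forms quoted earlier in this laboratory, written as regexes.
E2_BROAD = re.compile(r"[GA]CAG[ACGT]TG[ACGT]")  # G/ACAGNTGN
E2_STRICT = re.compile(r"GCAG[ACGT]TG[GT]")      # GCAGXTGG/T, X assumed to be any nucleotide

# Sequences and positions as reported in the discussion above.
reported = [
    ("ACAGATGT", 2989),
    ("ACAGATGT", 4213),
    ("GCAGGTGG", 2571),
    ("ACAGATGA", 3920),
]

def classify(seq):
    """Report which of the two consensus forms a candidate fully matches."""
    return {
        "broad": E2_BROAD.fullmatch(seq) is not None,
        "strict": E2_STRICT.fullmatch(seq) is not None,
    }

for seq, pos in reported:
    print(pos, seq, classify(seq))
```

Under these assumptions all four candidates fit the broader consensus, while only 3'-GCAGGTGG-5' at 2571 also fits the narrower one.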

Hypothesis 2 discussion

 * AGC (GCC) box: There appears to be no direct evidence that an E2 box and a GCC box occur in the same promoter let alone interact. However, an AGC (GCC) box was found in the distal promoter of either gene ZSCAN22 or A1BG on both the template and coding strands. Therefore, A1BG has both TFs in the negative direction and interaction cannot be ruled out.


 * ATA box: There is one inverse and one inverse complement ATA box in the proximal promoter in the positive direction between 4050 and 4300: 3'-AAATAA-5' at 4142, and 3'-TTTATT-5' at 4142. The ATA box at 2347 (positive direction) is likely involved in transcription of A1BG in analogy to the rat. There is the following inverse ATA box on the negative strand, negative direction: 3'-AAATAA-5' at 4537, downstream from the TSS at 4460. If assisting other TFs is limited to the ZSCAN22 side of A1BG, where the E2 boxes are, and assistance occurs to either TF from the other, the interaction cannot be ruled out. But there are apparently no articles to confirm occurrence and interaction of E2 boxes, ATA boxes, and A1BG.


 * CAAT box: "There was no consensus CAAT box. [...] In addition, we performed mutation analyses of the E2 box and the E3 box to evaluate whether the E2 and E3 boxes regulate the transcriptional activity of the human NeuroD gene [...]." Likewise, there's no CAAT box in either A1BG promoter and there are two E2 boxes on the ZSCAN22 side. Interaction can be ruled out.


 * C and D boxes: Usually C and D boxes are TFs for snoRNAs.
 * Four C boxes in the distal promoter: 3'-AGTAGT-5' at 2888, 3'-AGTAGT-5' at 2944, 3'-AGTAGT-5' at 3418, and 3'-AGTAGT-5' at 3521 on the negative strand in the negative direction and one D box in the distal promoter: 3'-AGTCTG-5' at 2947 which overlaps the second C box (AGT). No gene has been annotated so far between ZSCAN22 and A1BG, but Gene ID: 503538 A1BG antisense RNA 1 is usually transcribed in the negative direction.
 * Two C boxes 3'-TCATCA-5' at 3251 and 3'-ACATCA-5' at 4116 on the negative strand in the positive direction and one D box in the distal promoter: 3'-AGTCTG-5' at 3923 on the negative strand in the positive direction and complements on the positive strand. Gene ID: 503538 A1BG antisense RNA 1 has one promoter inside ZNF497, but Gene ID: 503538 begins about at 2600 nts which means the C and D boxes on the ZNF497 side of A1BG are inside the gene 503538 A1BG antisense RNA 1.
 * As of 15 August 2012 another "C box" has been designated, "The results from our study are largely complementary to the modENCODE efforts in that we identify the C-box, a novel enhancer element for a relatively large set of genes, which all share a common mode of regulation, namely being regulated by DAF-19/RFX." Its consensus sequence is apparently (C/T)(C/T)(C/T)T(C/T)T(C/T)(C/T)T(C/T)(A/C/G).
 * There is another promoter D box, or D-box: "Located in the region [...] is a single D-box element (5′-GTTGTATAAC-3′) with a distinct sequence from that of the functional D-box identified in the per2 promoter (5′-CTTATGTAAA-3′) [21]."
 * "The two MAPK docking consensus sequences present in hBVR, F162GFP and K275KRILHCLGL (C- and D-box, respectively [no snoRNAs]), are ERK interactive sites; interaction at each site is critical for ERK/Elk1 activation."
 * Interaction between C and D boxes for snoRNAs and the promoters of A1BG so far cannot be ruled out.


 * CARE: Inverse CAREs occur 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 2592, 3'-CTCAAC-5' at 2704, 3'-CTCAAC-5' at 3115, and 3'-CTCAAC-5' at 4096 in the negative direction. This suggests they are TFs for ZSCAN22, but interaction with E2 boxes cannot be ruled out yet. A CARE occurs 3'-CAACTC-5' at 3292 in the positive direction. But inverse CAREs occur 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 1621 and 3'-CTCAAC-5' at 3290. The positive direction CAREs suggest they are TFs for ZNF497 and no E2 boxes in the positive direction seems to make interaction remote but not ruled out.


 * CArG box: Computer-program testing has revealed two general CArG boxes: 3'-CAAAAAAAAG-5' at 1399 nts from ZSCAN22 and 3'-CATTAAAAGG-5' at 3441 nts from ZSCAN22, but none within 4300 nts toward A1BG from ZNF497. "The HLH-binding sites (E1 and E2-box) are located at position −214 > −219 and −252 > −257. Four CArG-boxes (A, B, C, and D) are present in the upstream region of the SMA gene." These two facts make it likely that A1BG is transcribed in the negative direction, assisted by E2 boxes and CArG boxes.


 * CENP-B box: no CENP-B boxes occur on either side of A1BG. This, coupled with "Your search - "E2 box" "CENP-B box" - did not match any articles.", rules out interactions.


 * CGCG box: all of the CGCG boxes found are more closely associated with ZSCAN22 or ZNF497 than with A1BG. This, coupled with "Your search - "E2 box" "CGCG box" - did not match any articles.", rules out interactions.


 * CRE box: There is one CRE box on the negative strand pointing toward A1BG in the proximal promoter in the negative direction between A1BG and ZSCAN22: 3'-TGACGTCA-5' at 4317 nts, which can be involved in the transcription of A1BG, probably with an Inr rather than a TATA box. This, coupled with "Your search - "E2 box" "CGCG box" - did not match any articles.", appears contradictory and suggests as yet unknown interactions.


 * DREB box: no DREB boxes occur on either side of A1BG, coupled with "Your search - "E2 box" "DREB box" - did not match any articles." rules out interactions.


 * EIF4E basal element: no EIF4E basal elements, also eIF4E or (4EBE), occur on either side of A1BG, coupled with "Your search - "E2 box" "EIF4E basal element" - did not match any articles." and "Your search - "E2 box" "4EBE" - did not match any articles." rules out interactions.


 * Enhancer box (E box): enhancer boxes do occur in the proximal and distal promoters of A1BG. And, it is likely that an enhancer box is involved in some way with the transcription of A1BG. This coupled with about 622 results (0.08 sec), e.g.: "The two E2 boxes in the mouse and human E-cadherin promoter sequences were demonstrated to play a crucial role in the epithelial-specific expression of E-cadherin Behrens et al. 1991, Giroldi et al. 1997. Mutation of these sequence elements results in upregulation of the E-cadherin promoter in dedifferentiated cancer cells, whereas the wild-type promoter shows low activity in such cells. Recently, it was shown that the zinc finger transcriptional repressor Snail can downregulate E-cadherin by binding to the E boxes in the E-cadherin promoter Batlle et al. 2000, Cano et al. 2000. Human Snail belongs to a family of zinc finger proteins, which contain four or five zinc finger domains of the C2H2 type at their C-terminal end. These zinc fingers bind to the CANNTG sequence in E box motifs." confirms interaction is likely.


 * Factor II B recognition element (BREu): there is one BREu in the distal promoter: 3'-CCGCACC-5' at 3047 on the negative strand in the negative direction and its complement on the positive strand and there is one BREu in the distal promoter: 3'-CCGCACC-5' at 2566 on the negative strand in the positive direction and its complement on the positive strand. "A computational study based on statistical analysis of curated promoter sets concluded that up to 25% of human core promoters contain a potential BREu. The motif was found to be enriched in CpG promoters (>30% frequency) but depleted in CpG-less promoters (<10% frequency) [14]." But, "Your search - "E2 box" "Factor II B recognition element" - did not match any articles." suggests that interactions may not be known to occur for these two TFs. Also, a search of Google Scholar and the full web failed to produce any examples of BREu assisted A1BG transcription. Interaction cannot be ruled out.


 * GARE: an inverse GARE, 3'-AAACAAT-5', and its complement occur at 230 nts, close to ZSCAN22 and likely well outside the distal promoters, so no GAREs occur on either side of A1BG in the distal promoters. Also, a Google Scholar search gave 2 results (0.04 sec) but no access was available. This appears to confirm that interaction can be ruled out.


 * G box: no G boxes occur on either side of A1BG, coupled with about 14 results (0.10 sec), including: "One of the candidates for the pinopsin LRE-binding factor is δ-crystallin enhancer binding protein (δEF1)/zinc finger, E-box binding protein (ZEB), which was identified as a ubiquitous transcriptional repressor acting through the CACCT(G) E2 box (Funahashi et al., 1993; Genetta et al., 1994). [An] LRE for pinopsin gene regulation is present at positions -1103 to -1086 in the promoter region and that the light dependency of the promoter activity is completely lost by introducing mutations within these positions. Interestingly, the CACGTG sequence found in the pinopsin LRE completely matches the G box (CACGTGG), one of the LREs identified in plants (Donald and Cashmore, 1990), in which the element is not effective by itself and a combination with its specific minimal promoter is indispensable for the expression of light responsiveness." indicates no interaction.


 * GLM box: no GLM boxes occur on either side of A1BG, coupled with "Your search - "E2 box" "GLM box" - did not match any articles." indicates interaction can be ruled out.


 * HNF6: there are two HNF6s in the proximal promoter between 4050 and 4300, 3'-TTATTGATTA-5' at 4164 and 3'-TATAATTGTT-5' at 4172, i.e. outside from 4242 (-58) to 4250 (-50), and there are two HNF6s on the negative strand in the negative direction, 3'-AAGCAACTT-5' at 3506 and 3'-AAGGGACTT-5' at 3782. Both of these are in the distal promoter between ZSCAN22 and A1BG. Plus, 5 results (0.05 sec): Pronounced "p65 binding in saline-treated livers in the untranslated first exon of the Per2 gene in a region containing both a NF-κB-binding motif (GGGRNYYYCC, where R is a purine, Y is a pyrimidine, and N is any nucleotide) and the noncanonical E2-box (CACGTT) motif (∼220 base pairs [bp] downstream from the NF-κB motif) bound by CLOCK and BMAL1 that has been described previously to preferentially drive circadian transcription of the Per2 locus ([...]; Yoo et al. 2005). [Motif] analyses revealed significant enrichment in metabolic and circadian bZIP factors (CEBP and HLF), HNF6-binding motifs, and the circadian clock pathway (E-box and USF1), in addition to p65 [...], following a HFD." indicates interaction is likely.


 * HY box: HY boxes were found in the distal promoters on both sides of A1BG. No genes are described in the literature so far as transcribed from HY boxes in any distal promoters. "Your search - "E2 box" "HY box" - did not match any articles." However, the presence on the ZSCAN22 side of A1BG with an HY box suggests interaction is possible.


 * Metal responsive element (MRE): the presence of multiple MREs tends to confirm likely transcription from either side. Google Scholar 4 results (0.04 sec): "The skeletal actin CArG motif functioned as a muscle regulatory element (MRE) in that basal expression was detected only in muscle cultures." But in this laboratory MRE designates "Metal Response Element", not the muscle regulatory element of the quoted article. The E2 box apparently is not in the promoter of ZNF658, though it may repress transcription similarly. Google Scholar search using "E2 box" "Metal Response Element": 5 results (0.09 sec): δ-crystallin/E2-box factor and metal response element are mentioned in possible connection to the stannin (Snn) gene. The other four do not mention interaction or a common gene promoter. The presence on the ZSCAN22 side of A1BG of an MRE suggests interaction is possible.


 * Motif ten element (MTE): no Motif ten elements occur on either side of A1BG and from Google Scholar 2 results (0.06 sec): Each RNA polymerase II holoenzyme complex needs TFs in various combinations to initiate transcription, including "In addition to the DPE, two other core promoter elements have been identified downstream of the transcription startsite. The MTE (motive ten element) has the consensus C[GC]A[AG]C[GC][GC]AACG[GC] and is typically located at position +18 to +28 relative to the transcriptional startsite. [and] Extensive work on the xbra promoter showed that the correct spatial expression confined in the margin of early gastrulation stage in Xenopus embryos is mainly established by repressive signals rather than activation (Latinkic et al, 1997; Lerchner et al, 2000). A search for putative transcription factor binding sites in the proximal xbra promoter identified a deltaEF1 binding site that, in conjunction with an E2-box restricts expression of xbra to the marginal zone in early gastrulation stages." indicates interaction can be ruled out.


 * Pyrimidine box: pyrimidine boxes and their complements, 3'-CCTTTT-5' at 2459, 3'-CCTTTT-5' at 2927, and 3'-CCTTTT-5' at 2968, occur in the negative direction. Inverse pyrimidine boxes and their complements occur 3'-AAAAGG-5' at 1107, 3'-AAAAGG-5' at 3345, and 3'-AAAAGG-5' at 3441, also in the negative direction. "Your search - "E2 box" "Pyrimidine box" - did not match any articles." The presence of pyrimidine boxes and E2 boxes in the negative direction suggests interaction is possible.


 * STAT5: "STATs [signal transducers and activators of transcription] bind through their DNA-binding domain (DBD) to consensus elements (TTCTTGGAA, STAT5 consensus), resulting in gene transcription." There are the following STAT5s on the positive strand, negative direction: 3'-TTCGTTGAA-5' at 3506, 3'-TTCCCTGAA-5' at 3782. And, their complements on the negative strand, negative direction: 3'-AAGCAACTT-5' at 3506, 3'-AAGGGACTT-5' at 3782. There is one STAT5 in the proximal promoter between 4050 and 4300, 3'-TTCCGGGAA-5' at 4247 in the positive direction. STAT5s may assist transcription of A1BG by other transcription factors, and a literature search has found that they do. About 30 results (0.05 sec): "[O]ligonucleotides [contain] a single E-box (E1 or E2) present in the GLε promoter." Interaction is likely.


 * TATA box: for the positive strand in the negative direction looking for 3'-TATA-A/T-A-A/T-A/G-5', there's one 3'-TATATAAA-5' at 2874 nts, its complement and inverse complement in the distal promoter. Closer to ZSCAN22 there are some 210 TATA box-like sequences. TATAAAAG at 222 nts is a TATA box found with some genes. But the optimal TBP recognition sequence, 3'-TATATAAG-5', does not occur. TATATAAA occurs only once at 2874 nts from the end of ZSCAN22. TBP is bound to this sequence and TATAAAAG above. TATAAA occurs seven times, with the closest one at 2874 nts from the end of ZSCAN22. "In virtually every RNA polymerase II-transcribed gene examined, the sequence TATAAA was present 25 to 30 nts upstream of the transcription start site." A1BG does not have a TATA box in the core promoter region. There is the sequence 3'-TGCTATATAGATGGCAACTAAGCACTTGGGGAAAAAA-5' for which the first nt (T) is numbered 1574 nts upstream from the beginning of the 3'-UTR. Unless another variant exists, -1574 nt from the beginning of the 3'-UTR is a large number of nts away from the TSS. The closest TATA box-like sequence is 3'-CTCTTAAG-5' on the template strand at 4408 nts from the end of ZSCAN22, which is upstream from the core promoter. The extra TATA boxes between ZSCAN22 and A1BG strongly suggest that there is at least one gene (or pseudogene) between ZSCAN22 and A1BG not currently in the NCBI database. For the negative strand going from ZSCAN22 to A1BG there are two TATA boxes: 3'-TATATATA-5' at 1600 nts and 3'-TATATAAA-5' at 1602 nts. These are much too far from the possible TSS in this direction. These two TATA boxes in the distal promoter at approximately -2860 nts from the TSS suggest that there may be a short gene between ZSCAN22 and A1BG. On the negative strand between ZSCAN22 and A1BG there are many TATA boxes between 184 nts from ZSCAN22 and 2874 nts from ZSCAN22, yet no genes are apparently known to occur between ZSCAN22 and A1BG. ZSCAN22 has several isoforms but all end exactly at the one TSS on the A1BG side. There are no TATA boxes at all between ZNF497 and A1BG.


 * Here's an example from a Google Scholar search using "E2 box" "TATA box": "A computer search for transcription promoter elements (see [the image on the right]) showed the presence of a prominent TATA box 22 nucleotides upstream of the transcription start site and an Sp1 site at position -42 to -33. The 5'-flanking sequence also contains three E boxes with CANNTG consensus sequences at positions -464 to -459, -90 to -85, and -52 to -47 that have been marked as E box, E1 box, and E2 box, respectively (see [the image on the right]). In addition, the 5'-flanking region contains one or more GRE, XRE, GATA-1, GCN-4, PEA-3, AP1, and AP2 consensus motifs and also three imperfect CArG sites [...] as indicated in [the image on the right]." "The locations of various factor binding motifs including the E1 box, E2 box, TATA box, and the transcription initiation site are indicated." Interaction cannot be ruled out but seems unlikely with these TATA boxes for transcription from the ZSCAN22 side.


 * TAT box: an inverse TAT box 3'-TACCTAT-5' occurs at 2996 nts with its complement in the negative direction from ZSCAN22. Google Scholar search using "E2 box" "TAT box", 1 result (0.06 sec): "Bibliography of the current world literature". Not accessible. Interaction cannot be ruled out.


 * TATCCAC box: no TATCCAC boxes occur on either side of A1BG and "Your search - "E2 box" "TATC box" - did not match any articles." rules out interaction.


 * W box: there are W boxes in both directions in the proximal promoters and distal promoters and Google Scholar search using "E2 box" "W box" has 5 results (0.07 sec): both of these TFs are apparently involved with key regulators of paclitaxel biosynthesis in Taxus cuspidata. These indicate interaction is likely.


 * X box: no X boxes occur on either side of A1BG and a Google Scholar search using "E2 box" "X box" yields about 29 results (0.09 sec): the articles mention one or the other, but not both for the same gene, which rules out interaction.


 * Y box: no Y boxes occur on either side of A1BG and a Google Scholar search using "E2 box" "Y box" yields about 33 results (0.15 sec): the articles mention one or the other, but not both for the same gene, which rules out interaction.
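
Many entries in the list above report a motif together with its complement, inverse, or inverse complement. A minimal sketch of how these four orientations can be generated and located follows. It assumes that "inverse" means the same letters in reversed order and "complement" means base pairing (A with T, C with G) without reversal. The function names and the toy sequence in the usage note are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orientations(motif):
    """Return the four orientations used in the discussion for a motif string."""
    motif = motif.upper()
    return {
        "direct": motif,
        "complement": motif.translate(COMPLEMENT),
        "inverse": motif[::-1],
        "inverse complement": motif[::-1].translate(COMPLEMENT),
    }

def find_orientations(seq, motif):
    """List (1-based position, orientation name) for every occurrence in seq."""
    seq = seq.upper()
    hits = []
    for name, pattern in orientations(motif).items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, name))
            start = seq.find(pattern, start + 1)  # allow overlapping occurrences
    return sorted(hits)

# Pairs reported in the list above: each sequence and its complement.
print(orientations("AAATAA")["complement"])
print(orientations("TTCGTTGAA")["complement"])
```

For instance, `find_orientations("GGAATAAACC", "AATAAA")` scans a made-up ten-nucleotide string for all four orientations at once. A motif that reads the same in two orientations is reported once for each orientation name.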

Conclusions
Hypothesis 1, that E2 boxes are not present in the promoter of A1BG, is true for the promoter between ZNF497 and A1BG but false for the promoter between ZSCAN22 and A1BG.

Hypothesis 2, that an E2 box, if present, does not assist in the transcription of A1BG, is true between ZNF497 and A1BG, as no E2 box is present to assist in transcription.

Between ZSCAN22 and A1BG, Hypothesis 2 is false with respect to the elements with which the E2 boxes may interact: an AGC (GCC) box, an ATA box, C boxes and a D box (the other C-box and D-box have not been tested), CAREs, CArG boxes, a CRE box, enhancer boxes, a BREu, HNF6s, HY boxes, an MRE, pyrimidine boxes, STAT5s, TATA boxes outside the core promoter, a TAT box, or W boxes. Hypothesis 2 is true between ZSCAN22 and A1BG with respect to the remaining elements: E2 boxes exist, but no CAAT box, CENP-B box, DREB box, EIF4E basal element, G box, GLM box, MTE, TATCCAC box, X box, or Y box occurs, and the CGCG boxes and the GARE are too close to ZSCAN22.

Laboratory evaluations
To assess your example, including your justification, analysis and discussion, I will provide such an assessment of my example for comparison and consideration.

Evaluation

No wet chemistry experiments were performed to confirm that Gene ID: 1 may be transcribed from either side using transcription factors in the core, proximal or distal promoters. The NCBI Gene database is generalized, whereas individual human genome testing could demonstrate that A1BG is transcribed from either side using known transcription factors. Sufficient nucleotides have been added to the data sets for the ZNF497 side to confirm likely transcription of A1BG by these known transcription factors.