H box gene transcription laboratory

A laboratory is a specialized activity, a construct, that you create so that you, as a student, teacher, or researcher, can have hands-on (or as close to hands-on as possible) experience actively analyzing an entity, source, or object of interest. Usually there is more to do than just analysis. The construct is often a room, building, or institution equipped for scientific research, experimentation, and analysis.

This laboratory is a continuation of the previous laboratory.

In the room next door is an astronaut on the Mars expedition, three months along on the six-month journey. A physician and lab assistants have been performing tests on her. Because she has been in zero gravity for more than three months, her body chemistry and anatomy now differ from what they were in the controlled gravity environment of Earth. She has lost about 10% each of her bone, muscle, and brain mass. Comparisons of her gene expression sequences now with those taken on Earth have found that the gene expression for alpha-1-B glycoprotein is not normal. If a way to correct this expression cannot be found, she must be returned to Earth, maybe to recover, maybe not!

But it is unlikely that she will survive three more months at zero g, whether she is then returned to Earth or put on Mars. Worse, microgravity may not be the only culprit: there is also the radiation of the interplanetary medium.

You have been tasked to examine her DNA to confirm, especially with the extended data between ZNF497 and A1BG, the presence or absence of H boxes regarding the possible expression of alpha-1-B glycoprotein.

Consensus sequences
"The box H/ACA snoRNAs were most recently recognized as a small RNA family by virtue of an ACA trinucleotide located 3 nt upstream of the mature snoRNA 3' end (41). In addition to this ACA box, they have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity. Despite this lack of primary sequence conservation, the H and ACA boxes are embedded in an evolutionarily conserved hairpin-hinge-hairpin-tail core secondary structure with the H box in the single-stranded hinge region and the ACA box in the single-stranded tail (5, 16)."

The "3' end of mature [human telomerase] hTR (45) has an ACA trinucleotide 3 nt upstream of its 3' end. In addition, the 3' region of hTR contains a single H box consensus sequence (5'-AGAGGA-3')."

"Comparison with the murine telomerase RNA (mTR) (7) suggests that the snoRNA-like features of hTR are evolutionarily conserved. The mTR 3' end (nt 169 to 397 as numbered in reference 25) has ~76% sequence identity with the corresponding region of hTR (nt 211 to 451) and includes consensus H (5'-ACAGGA-3') and ACA box sequences."

An H box has a consensus sequence of 3'-ACACCA-5'.

The H box in Solanaceae has the consensus sequence 3'-CC(A/T)ACCNNNNNNN(A/C)T-5'.
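
A consensus sequence written with alternatives such as (A/T) and the wildcard N can be turned into a search pattern. The following is a minimal sketch, assuming the consensus is read as a plain string in the order it is written above; the names consensus_to_regex and contains are hypothetical and are not taken from the laboratory's own programs.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus string into a regular-expression pattern."""
    # "(A/T)" style alternatives become a character class such as "[AT]".
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    # "N" stands for any one of the four nucleotides.
    return pattern.replace("N", "[ACGT]")


def contains(pattern: str, sequence: str) -> bool:
    """Return True if the pattern occurs anywhere in the sequence."""
    return re.search(pattern, sequence) is not None


# Consensus strings quoted in the text above.
H_BOX = consensus_to_regex("ACACCA")
SOLANACEAE_H_BOX = consensus_to_regex("CC(A/T)ACCNNNNNNN(A/C)T")
```

The test sequences used with such a helper would be made up for illustration; the astronaut's data are not reproduced here.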

"The KAP-2 protein [...] binds to the H-box (CCTACC) element in the bean CHS15 chalcone synthase promoter". "In vitro transcription assays confirmed that KAP-2 stimulates transcription from a promoter harboring the H-box cis element."

"The G-box and H-box in a 39 bp region of a French bean chalcone synthase promoter constitute a tissue-specific regulatory element."

"The CHS promoter contains the nucleotide sequence CACGTG regulatory motif known as G-box, which has been found to be important in the response to light/UV light (Kaulen et al. 1986; Staiger et al. 1989; Dixon et al. 1994; Schulze et al. 1989). Besides the G-box there are other domains in the CHS promoter involved in the light activation of CHS transcription. Those domains have been identified in the parsley CHS promoter as Box I, Box II, Box III, Box IV or three copies of H-box (CCTACC) in the Phaseolus vulgaris CHS15 promoter. These boxes play a role as core promoter together with the G-box and all are required for light inducibility (Block et al. 1990; Lawton et al. 1990; Weisshaar et al. 1991)."

The image on the right shows the bean CHS15 promoter and its regulators: SBF is the silencer binding factor, H is the H-box (CCTACC), G is the G-box (CACGTG), and a/a2 are regulation loci. Nucleotide positions are indicated relative to the transcription start site (+1).

The chalcone synthase gene of Petunia plants is famous for being the first gene in which the phenomenon of RNA interference was observed; researchers intending to upregulate the production of pigments in light pink or violet flowers introduced a transgene for chalcone synthase, expecting that both the native gene and the transgene would express the enzyme and result in a more deeply colored flower phenotype. Instead the transgenic plants had mottled white flowers, indicating that the introduction of the transgene had downregulated or silenced chalcone synthase expression. Further investigation of the phenomenon indicated that the downregulation was due to post-transcriptional inhibition of the chalcone synthase gene expression via an increased rate of messenger RNA degradation.

Nucleotides
DNA mapping has been performed. Her DNA for A1BG promoters can be found at Gene_transcriptions/A1BG.

Programming
Sample programs for preparing test programs are available at Gene transcriptions/A1BG/Programming.

Hypotheses

 * 1) A1BG is not transcribed by an H box.
 * 2) If an H box is present at least one transcription factor uses the H box to affect A1BG transcription.

Core promoters
The core promoter begins approximately 34 nts upstream of the transcription start site (TSS), that is, at about -34.

From the first nucleotide just after ZSCAN22 to the first nucleotide just before A1BG are 4460 nucleotides. The core promoter on this side of A1BG extends from approximately 4425 to the possible transcription start site at nucleotide number 4460.

To extend the analysis from inside and just on the other side of ZNF497 some 3340 nts have been added to the data. This would place the core promoter some 3340 nts further away from the other side of ZNF497. The TSS would be at about 4300 nts with the core promoter starting at 4266.

Def. "the factors, including RNA polymerase II itself, that are minimally essential for transcription in vitro from an isolated core promoter" is called the basal machinery, or basal transcription machinery.

"The core promoter in human genes is the region from −40 to +40 and flanks the transcription start site (TSS) at +1. Although no single core promoter element is contained in all human promoters, many contain one or more of the following core elements [...]: the TATA box, initiator (Inr), TFIIB recognition elements (BREu and BREd), polypyrimidine initiator (TCT), motif ten element (MTE), and downstream core promoter element (DPE) [...]. Of these, the Inr element encompasses the TSS and is thought to be the most common core promoter element, with previous studies estimating that ∼50% of human core promoters contain an Inr (Gershenzon and Ioshikhes 2005; Yang et al. 2007). The commonly used consensus sequence for the human Inr, which was derived from mutational analyses, is YYANWYY from −2 to +5 (where, Y = C/T, W = A/T, N=A/C/G/T, and +1 is [A)] (Javahery et al. 1994; Lo and Smale 1996)."

"Kadonaga and colleagues (Vo ngoc et al. 2017) devised and implemented a novel multistep approach that combines experimental and computational methods to reinvestigate the human Inr consensus sequence. First, they generated two 5′-GRO-seq (5′ end-selected global run-on followed by sequencing) libraries with human MCF-7 cells to identify the 5′ ends of nascent capped transcripts. Second, they developed a peak-calling algorithm named FocusTSS to find transcripts in the 5′-GRO-seq data sets that were initiated at a focused position on the genome, hence identifying clear TSSs to enable analysis of Inr sequences. FocusTSS identified 7678 TSSs that were in both data sets. Third, to identify sequence motifs enriched among the focused TSSs, they used the HOMER motif discovery tool (Heinz et al. 2010), which yielded an Inr-like consensus sequence of BBCABW from −3 to +3 (where, B = C/G/T, W = A/T, and +1 is [A]). Forty percent of the focused TSSs contained a perfect match to the BBCABW consensus Inr."

The second image down on the right shows relative "locations of select human core promoter elements and the Inr consensus sequence found in promoters with focused TSSs. The promoter elements depicted include BREu (the upstream TFIIB recognition element), TATA (the TATA box), BREd (the downstream TFIIB recognition element), Inr (new consensus sequence shown), MTE, and DPE."
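
The two Inr consensus sequences quoted above use degenerate codes (Y, W, B, N). The following is a minimal sketch of how a sequence could be scanned for such a consensus, assuming 1-based start positions and overlapping matches; the names IUPAC, iupac_to_regex, and find_matches are hypothetical, and only the codes defined in the quotations are included.

```python
import re

# Degenerate codes as defined in the quoted passages.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",    # Y = C/T
    "W": "[AT]",    # W = A/T
    "B": "[CGT]",   # B = C/G/T
    "N": "[ACGT]",  # N = A/C/G/T
}


def iupac_to_regex(consensus: str) -> str:
    """Translate a degenerate consensus into a regular expression."""
    return "".join(IUPAC[code] for code in consensus)


def find_matches(sequence: str, consensus: str):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A lookahead lets matches that overlap each other all be reported.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]
```

For example, find_matches("GGTCATTCC", "YYANWYY") reports one match, TCATTCC, starting at position 3 of that made-up sequence.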

Proximal promoters
Def. a "promoter region [juxtaposed to the core promoter that] binds transcription factors that modify the affinity of the core promoter for RNA polymerase.[12][13]" is called a proximal promoter.

The proximal sequence upstream of the gene that tends to contain primary regulatory elements is a proximal promoter.

It extends approximately 250 base pairs, or nucleotides (nts), upstream of the transcription start site.

The proximal promoter begins about nucleotide number 4210 in the negative direction.

The proximal promoter begins about nucleotide number 4195 in the positive direction.

Distal promoters
The "upstream regions of the human [cytochrome P450 family 11 subfamily A] CYP11A and bovine CYP11B genes [have] a distal promoter in each gene. The distal promoters are located at −1.8 to −1.5 kb in the upstream region of the CYP11A gene and −1.5 to −1.1 kb in the upstream region of the CYP11B gene."

"Using cloned chicken βA-globin genes, either individually or within the natural chromosomal locus, enhancer-dependent transcription is achieved in vitro at a distance of 2 kb with developmentally staged erythroid extracts. This occurs by promoter derepression and is critically dependent upon DNA topology. In the presence of the enhancer, genes must exist in a supercoiled conformation to be actively transcribed, whereas relaxed or linear templates are inactive. Distal protein–protein interactions in vitro may be favored on supercoiled DNA because of topological constraints."

Distal promoter regions may be a relatively small number of nucleotides, fairly close to the TSS such as (-253 to -54) or several regions of different lengths, many nucleotides away, such as (-2732 to -2600) and (-2830 to -2800).

The "[d]istal promoter is not a spacer element."

Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2460.

Any transcription factors before A1BG from the direction of ZNF497 may be out to 2300 nts.
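
The region boundaries used in this laboratory follow from subtracting a fixed estimate from the TSS position. The sketch below assumes 34 nts for the core promoter, 250 nts for the proximal promoter, and 2000 nts (2 knts) for the distal promoter; the function names and the simple region rule are hypothetical. It reproduces the figures 4210, 2460, 4266, and 2300 given above; other figures in the text may rest on different estimates.

```python
CORE_NTS = 34        # core promoter estimate from the text
PROXIMAL_NTS = 250   # proximal promoter estimate from the text
DISTAL_NTS = 2000    # distal promoter estimate (2 knts) from the text


def promoter_starts(tss: int) -> dict:
    """Return the approximate starting nucleotide of each promoter region."""
    return {
        "core": tss - CORE_NTS,
        "proximal": tss - PROXIMAL_NTS,
        "distal": tss - DISTAL_NTS,
    }


def region_of(position: int, tss: int) -> str:
    """Assign a nucleotide position to a region (an assumed, simplified rule)."""
    starts = promoter_starts(tss)
    if position > tss:
        return "downstream of TSS"
    if position >= starts["core"]:
        return "core"
    if position >= starts["proximal"]:
        return "proximal"
    if position >= starts["distal"]:
        return "distal"
    return "beyond distal estimate"


negative_direction = promoter_starts(4460)  # TSS at 4460 (ZSCAN22 side)
positive_direction = promoter_starts(4300)  # TSS at about 4300 (ZNF497 side)
```

With a TSS at 4460 the core promoter start comes out as 4426, which agrees with the approximate 4425 stated earlier.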

Regarding hypothesis 1
Hypothesis 1: A1BG is not transcribed by an H box.

For the Basic programs testing consensus sequence 3'-ACACCA-5' (starting with SuccessablesHbox.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+), each entry below gives the program name, the sequence it looks for, the number of occurrences found, and then each occurrence with its nucleotide position:
 * 1) negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox--.bas, looking for 3'-ACACCA-5', 4, 3'-ACACCA-5', 788, 3'-ACACCA-5', 2659, 3'-ACACCA-5', 3187, 3'-ACACCA-5', 3811.
 * 2) negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox-+.bas, looking for 3'-ACACCA-5', 2, 3'-ACACCA-5', 2603, 3'-ACACCA-5', 3825.
 * 3) positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox+-.bas, looking for 3'-ACACCA-5', 2, 3'-ACACCA-5', 883, 3'-ACACCA-5', 2419.
 * 4) positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox++.bas, looking for 3'-ACACCA-5', 2, 3'-ACACCA-5', 3643, 3'-ACACCA-5', 3967.
 * 5) complement, negative strand, negative direction is SuccessablesHboxc--.bas, looking for 3'-TGTGGT-5', 2, 3'-TGTGGT-5', 883, 3'-TGTGGT-5', 2419.
 * 6) complement, negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHboxc-+.bas, looking for 3'-TGTGGT-5', 2, 3'-TGTGGT-5', 3643, 3'-TGTGGT-5', 3967.
 * 7) complement, positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHboxc+-.bas, looking for 3'-TGTGGT-5', 4, 3'-TGTGGT-5', 788, 3'-TGTGGT-5', 2659, 3'-TGTGGT-5', 3187, 3'-TGTGGT-5', 3811.
 * 8) complement, positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHboxc++.bas, looking for 3'-TGTGGT-5', 2, 3'-TGTGGT-5', 2603, 3'-TGTGGT-5', 3825.
 * 9) inverse complement, negative strand, negative direction is SuccessablesHboxci--.bas, looking for 3'-TGGTGT-5', 1, 3'-TGGTGT-5', 3764.
 * 10) inverse complement, negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHboxci-+.bas, looking for 3'-TGGTGT-5', 4, 3'-TGGTGT-5', 105, 3'-TGGTGT-5', 2813, 3'-TGGTGT-5', 3950, 3'-TGGTGT-5', 3969.
 * 11) inverse complement, positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHboxci+-.bas, looking for 3'-TGGTGT-5', 3, 3'-TGGTGT-5', 608, 3'-TGGTGT-5', 793, 3'-TGGTGT-5', 1477.
 * 12) inverse complement, positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHboxci++.bas, looking for 3'-TGGTGT-5', 4, 3'-TGGTGT-5', 2123, 3'-TGGTGT-5', 2600, 3'-TGGTGT-5', 2634, 3'-TGGTGT-5', 3859.
 * 13) inverse negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHboxi--.bas, looking for 3'-ACCACA-5', 3, 3'-ACCACA-5', 608, 3'-ACCACA-5', 793, 3'-ACCACA-5', 1477.
 * 14) inverse negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHboxi-+.bas, looking for 3'-ACCACA-5', 4, 3'-ACCACA-5', 2123, 3'-ACCACA-5', 2600, 3'-ACCACA-5', 2634, 3'-ACCACA-5', 3859.
 * 15) inverse positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHboxi+-.bas, looking for 3'-ACCACA-5', 1, 3'-ACCACA-5', 3764.
 * 16) inverse positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHboxi++.bas, looking for 3'-ACCACA-5', 4, 3'-ACCACA-5', 105, 3'-ACCACA-5', 2813, 3'-ACCACA-5', 3950, 3'-ACCACA-5', 3969.
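
The sixteen programs above search for four forms of the consensus: the sequence as written, its complement, its inverse, and its inverse complement. The sketch below shows how these forms relate and why, for example, entries 3 and 5 list the same positions: searching one strand for a sequence finds the same positions as searching the complementary strand for the complemented sequence. The names variants and find_all and the short test strand are made up for illustration, and positions are assumed to be 1-based start positions.

```python
# Base-pairing table: A pairs with T, C pairs with G (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def variants(consensus: str) -> dict:
    """Return the four forms of a consensus searched for by the programs."""
    comp = consensus.translate(COMPLEMENT)
    return {
        "direct": consensus,
        "complement": comp,
        "inverse": consensus[::-1],
        "inverse complement": comp[::-1],
    }


def find_all(strand: str, motif: str) -> list:
    """Return the 1-based start position of every occurrence, overlaps included."""
    hits = []
    i = strand.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = strand.find(motif, i + 1)
    return hits


strand = "GGACACCATTACACCA"                  # made-up example strand
other_strand = strand.translate(COMPLEMENT)  # its base-paired partner

direct_hits = find_all(strand, "ACACCA")
complement_hits = find_all(other_strand, variants("ACACCA")["complement"])
```

Here direct_hits and complement_hits are identical, which mirrors the pairing of entries seen in the listing.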

For the Basic programs testing consensus sequence 3'-AGAGGA-5' (starting with SuccessablesHbox2.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+), each entry below gives the program name, the sequence it looks for, the number of occurrences found, and then each occurrence with its nucleotide position:
 * 1) negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox2--.bas, looking for 3'-AGAGGA-5', 0.
 * 2) negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2-+.bas, looking for 3'-AGAGGA-5', 3, 3'-AGAGGA-5', 207, 3'-AGAGGA-5', 471, 3'-AGAGGA-5', 2793.
 * 3) positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox2+-.bas, looking for 3'-AGAGGA-5', 3, 3'-AGAGGA-5', 3387, 3'-AGAGGA-5', 3638, 3'-AGAGGA-5', 3675.
 * 4) positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2++.bas, looking for 3'-AGAGGA-5', 4, 3'-AGAGGA-5', 142, 3'-AGAGGA-5', 2081, 3'-AGAGGA-5', 3302, 3'-AGAGGA-5', 4059.
 * 5) complement, negative strand, negative direction is SuccessablesHbox2c--.bas, looking for 3'-TCTCCT-5', 3, 3'-TCTCCT-5', 3387, 3'-TCTCCT-5', 3638, 3'-TCTCCT-5', 3675.
 * 6) complement, negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2c-+.bas, looking for 3'-TCTCCT-5', 4, 3'-TCTCCT-5', 142, 3'-TCTCCT-5', 2081, 3'-TCTCCT-5', 3302, 3'-TCTCCT-5', 4059.
 * 7) complement, positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox2c+-.bas, looking for 3'-TCTCCT-5', 0.
 * 8) complement, positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2c++.bas, looking for 3'-TCTCCT-5', 3, 3'-TCTCCT-5', 207, 3'-TCTCCT-5', 471, 3'-TCTCCT-5', 2793.
 * 9) inverse complement, negative strand, negative direction is SuccessablesHbox2ci--.bas, looking for 3'-TCCTCT-5', 7, 3'-TCCTCT-5', 581, 3'-TCCTCT-5', 834, 3'-TCCTCT-5', 1000, 3'-TCCTCT-5', 1291, 3'-TCCTCT-5', 1826, 3'-TCCTCT-5', 1944, 3'-TCCTCT-5', 2370.
 * 10) inverse complement, negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2ci-+.bas, looking for 3'-TCCTCT-5', 3, 3'-TCCTCT-5', 221, 3'-TCCTCT-5', 2981, 3'-TCCTCT-5', 3304.
 * 11) inverse complement, positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox2ci+-.bas, looking for 3'-TCCTCT-5', 2, 3'-TCCTCT-5', 3790, 3'-TCCTCT-5', 4428.
 * 12) inverse complement, positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2ci++.bas, looking for 3'-TCCTCT-5', 3, 3'-TCCTCT-5', 46, 3'-TCCTCT-5', 710, 3'-TCCTCT-5', 3650.
 * 13) inverse negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox2i--.bas, looking for 3'-AGGAGA-5', 2, 3'-AGGAGA-5', 3790, 3'-AGGAGA-5', 4428.
 * 14) inverse negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2i-+.bas, looking for 3'-AGGAGA-5', 3, 3'-AGGAGA-5', 46, 3'-AGGAGA-5', 710, 3'-AGGAGA-5', 3650.
 * 15) inverse positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox2i+-.bas, looking for 3'-AGGAGA-5', 7, 3'-AGGAGA-5', 581, 3'-AGGAGA-5', 834, 3'-AGGAGA-5', 1000, 3'-AGGAGA-5', 1291, 3'-AGGAGA-5', 1826, 3'-AGGAGA-5', 1944, 3'-AGGAGA-5', 2370.
 * 16) inverse positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox2i++.bas, looking for 3'-AGGAGA-5', 3, 3'-AGGAGA-5', 221, 3'-AGGAGA-5', 2981, 3'-AGGAGA-5', 3304.
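
Each entry in these listings gives the sequence searched for, a count, and then the matched sequences with their positions. The sketch below produces output in that shape for a made-up strand; it also accepts N as a wildcard, as in the consensus 5'-ANANNA-3' quoted earlier. The names scan and report are hypothetical, positions are assumed to be 1-based start positions, and this is not the original Basic code.

```python
def scan(strand: str, consensus: str) -> list:
    """Return (matched text, 1-based start) for every window fitting the consensus."""
    width = len(consensus)
    hits = []
    for i in range(len(strand) - width + 1):
        window = strand[i:i + width]
        # "N" in the consensus accepts any nucleotide; other letters must match.
        if all(c == "N" or c == w for c, w in zip(consensus, window)):
            hits.append((window, i + 1))
    return hits


def report(consensus: str, hits: list) -> str:
    """Format the hits as: consensus, count, then each match and its position."""
    parts = [f"3'-{consensus}-5', {len(hits)}"]
    parts += [f"3'-{text}-5', {position}" for text, position in hits]
    return ", ".join(parts) + "."


example_hits = scan("GAGAAAAAAC", "ANANNA")   # made-up strand
example_line = report("ANANNA", example_hits)
```

Because every window is tested, a run of identical nucleotides yields matches at consecutive positions, as happens with runs of A under a wildcard consensus.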

For the Basic programs testing consensus sequence 3'-ANANNA-5' (starting with SuccessablesHbox3.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+), each entry below gives the program name, the sequence it looks for, the number of occurrences found, and then each occurrence with its nucleotide position:
 * 1) negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox3--.bas, looking for 3'-ANANNA-5', 64, 3'-AAAGAA-5', 24, 3'-AGAAAA-5', 26, 3'-ATACAA-5', 43, 3'-ACAAGA-5', 45, 3'-ATACAA-5', 213, 3'-ACAAAA-5', 215, 3'-AAAATA-5', 218, 3'-AGATAA-5', 235, 3'-ACATTA-5', 248, 3'-AGAACA-5', 281, 3'-ACATGA-5', 325, 3'-ATACTA-5', 352, 3'-ACATTA-5', 397, 3'-ATACCA-5', 606, 3'-AGATGA-5', 625, 3'-ACATTA-5', 670, 3'-AGATGA-5', 759, 3'-ACACCA-5', 788, 3'-ACATTA-5', 804, 3'-ACATTA-5', 1134, 3'-ACATTA-5', 1261, 3'-AGAAAA-5', 1419, 3'-AAAAAA-5', 1421, 3'-AAAAAA-5', 1422, 3'-AAAAAA-5', 1423, 3'-AAAAAA-5', 1424, 3'-AAAAAA-5', 1425, 3'-AAAAAA-5', 1426, 3'-AAAAAA-5', 1427, 3'-AAAAAA-5', 1428, 3'-AAAAAA-5', 1429, 3'-AAAAAA-5', 1430, 3'-AAAAAA-5', 1431, 3'-AAAAAA-5', 1432, 3'-AGACAA-5', 1453, 3'-ACACTA-5', 1480, 3'-ATATAA-5', 1601, 3'-AAAGAA-5', 1605, 3'-ATAAAA-5', 1727, 3'-AAAATA-5', 1729, 3'-ATAGAA-5', 1732, 3'-ACATTA-5', 1779, 3'-AGATGA-5', 1868, 3'-ACATTA-5', 1914, 3'-ACATTA-5', 2088, 3'-AGATGA-5', 2170, 3'-AGATGA-5', 2295, 3'-ACATCA-5', 2340, 3'-ACATCA-5', 2541, 3'-ACACCA-5', 2659, 3'-ACATTA-5', 2675, 3'-ATAAAA-5', 2853, 3'-AAAGTA-5', 2886, 3'-ACATTA-5', 3064, 3'-AGATGA-5', 3159, 3'-ACACCA-5', 3187, 3'-AGAAGA-5', 3554, 3'-AGACGA-5', 3707, 3'-ACACCA-5', 3811, 3'-ACATTA-5', 3973, 3'-ACATCA-5', 4124, 3'-ACACGA-5', 4402, 3'-AGAGAA-5', 4527, 3'-AAATAA-5', 4537.
 * 2) negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3-+.bas, looking for 3'-ANANNA-5', 32, 3'-AGAAGA-5', 49, 3'-AGAGGA-5', 207, 3'-AGAGGA-5', 471, 3'-ACAGCA-5', 1055, 3'-AAAGCA-5', 2006, 3'-ACATGA-5', 2141, 3'-AGATCA-5', 2231, 3'-ATACCA-5', 2591, 3'-ACACCA-5', 2603, 3'-ATAGAA-5', 2628, 3'-AAACCA-5', 2632, 3'-ACACTA-5', 2637, 3'-ATATAA-5', 2662, 3'-AGAGCA-5', 2704, 3'-AGAGGA-5', 2793, 3'-AAAGGA-5', 2829, 3'-ACAGAA-5', 2838, 3'-AAAGAA-5', 3066, 3'-AGAACA-5', 3094, 3'-AGAGCA-5', 3138, 3'-ACAGCA-5', 3212, 3'-ACAGTA-5', 3414, 3'-AGATGA-5', 3476, 3'-ACAGGA-5', 3572, 3'-AAAGCA-5', 3599, 3'-ACATGA-5', 3708, 3'-ACACCA-5', 3825, 3'-AAAAGA-5', 3929, 3'-AGAACA-5', 4068, 3'-AAATGA-5', 4094, 3'-ACATCA-5', 4116, 3'-ACATGA-5', 4154.
 * 3) positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox3+-.bas, looking for 3'-ANANNA-5', 263, 3'-AGAAAA-5', 53, 3'-AAAAGA-5', 55, 3'-AAAACA-5', 68, 3'-AAACAA-5', 69, 3'-ATAGAA-5', 101, 3'-AGAAAA-5', 103, 3'-AAAGGA-5', 106, 3'-ATACAA-5', 113, 3'-AAAACA-5', 166, 3'-ACATTA-5', 173, 3'-ATAAAA-5', 183, 3'-AAAGCA-5', 186, 3'-ATAAAA-5', 222, 3'-AAAAGA-5', 224, 3'-AAAGAA-5', 225, 3'-AAACAA-5', 229, 3'-AAACCA-5', 260, 3'-ACATAA-5', 269, 3'-ATAATA-5', 271, 3'-ATATGA-5', 274, 3'-AGAACA-5', 287, 3'-ACAGAA-5', 290, 3'-ATAGAA-5', 356, 3'-AGAAAA-5', 358, 3'-AAAACA-5', 360, 3'-ACAAAA-5', 485, 3'-AAAAAA-5', 487, 3'-AAAATA-5', 489, 3'-ATACGA-5', 492, 3'-AAATTA-5', 498, 3'-AGATCA-5', 590, 3'-AAAATA-5', 632, 3'-ATACAA-5', 635, 3'-ACAAAA-5', 637, 3'-AAAAAA-5', 639, 3'-AGACCA-5', 726, 3'-AAAATA-5', 766, 3'-ATACAA-5', 769, 3'-ACAAAA-5', 771, 3'-AAAAAA-5', 773, 3'-AAATTA-5', 776, 3'-AGATCA-5', 878, 3'-ACACCA-5', 883, 3'-AAAAAA-5', 928, 3'-AAAAAA-5', 929, 3'-AAAAAA-5', 930, 3'-AAAAAA-5', 931, 3'-AAAAAA-5', 932, 3'-AAAAAA-5', 933, 3'-AAAAAA-5', 934, 3'-AAAAAA-5', 935, 3'-AAAAAA-5', 936, 3'-AAAAAA-5', 937, 3'-AAAAAA-5', 938, 3'-AAAAAA-5', 939, 3'-AAAAAA-5', 940, 3'-AAAAAA-5', 941, 3'-AAAAAA-5', 942, 3'-ACAACA-5', 1071, 3'-AAAAAA-5', 1094, 3'-AAAAAA-5', 1095, 3'-AAAAAA-5', 1096, 3'-AAAAAA-5', 1097, 3'-AAAAAA-5', 1098, 3'-AAAAAA-5', 1099, 3'-AAAAAA-5', 1100, 3'-AAAAAA-5', 1101, 3'-AAAAAA-5', 1102, 3'-AAAAAA-5', 1103, 3'-AAAAAA-5', 1104, 3'-AAAAAA-5', 1105, 3'-ACAAAA-5', 1228, 3'-AAAAAA-5', 1230, 3'-AAATTA-5', 1233, 3'-ATAAGA-5', 1365, 3'-AGAGCA-5', 1368, 3'-AAAACA-5', 1387, 3'-AAACAA-5', 1388, 3'-AAACAA-5', 1392, 3'-ACAAAA-5', 1394, 3'-AAAAAA-5', 1396, 3'-AAAAAA-5', 1397, 3'-AAAAAA-5', 1398, 3'-AAAAGA-5', 1400, 3'-AAAGAA-5', 1550, 3'-AAAATA-5', 1563, 3'-AAATGA-5', 1580, 3'-AAACAA-5', 1585, 3'-AAAAGA-5', 1628, 3'-AAAGAA-5', 1629, 3'-AAAGGA-5', 1640, 3'-AAATGA-5', 1663, 3'-ATACCA-5', 1668, 3'-AAATGA-5', 1700, 3'-ATAGTA-5', 1705, 3'-AAAATA-5', 1739, 
3'-AGACCA-5', 1835, 3'-AAAATA-5', 1875, 3'-ATACAA-5', 1878, 3'-ACAAAA-5', 1880, 3'-AAAAAA-5', 1882, 3'-AAAAAA-5', 1883, 3'-AAATTA-5', 1886, 3'-AGATCA-5', 1988, 3'-AGAGCA-5', 2020, 3'-AAAAAA-5', 2038, 3'-AAAAAA-5', 2039, 3'-AAAAAA-5', 2040, 3'-AAAAAA-5', 2041, 3'-AAAAAA-5', 2042, 3'-AAAAAA-5', 2043, 3'-AAAAAA-5', 2044, 3'-AAAAAA-5', 2045, 3'-AAAAAA-5', 2046, 3'-AAAAAA-5', 2047, 3'-AAAAAA-5', 2048, 3'-AAAAAA-5', 2049, 3'-AAAAAA-5', 2050, 3'-AAAAAA-5', 2051, 3'-AAAAGA-5', 2053, 3'-AAAGAA-5', 2054, 3'-AGAAAA-5', 2056, 3'-AAAAAA-5', 2058, 3'-AAAAAA-5', 2059, 3'-AAAAAA-5', 2060, 3'-AGACCA-5', 2122, 3'-AGACCA-5', 2146, 3'-ATACAA-5', 2180, 3'-ACAAAA-5', 2182, 3'-AAAAAA-5', 2184, 3'-AAATGA-5', 2187, 3'-AGACCA-5', 2262, 3'-ACAGCA-5', 2274, 3'-AAAATA-5', 2302, 3'-ATACAA-5', 2305, 3'-ACAAAA-5', 2307, 3'-AAAAAA-5', 2309, 3'-AAACTA-5', 2312, 3'-AGATCA-5', 2414, 3'-ACACCA-5', 2419, 3'-AAAAAA-5', 2461, 3'-AAAAAA-5', 2462, 3'-AAAAAA-5', 2463, 3'-AAAAAA-5', 2464, 3'-AAAAAA-5', 2465, 3'-AAAAAA-5', 2466, 3'-AAAAAA-5', 2467, 3'-AAAAAA-5', 2468, 3'-AAAAAA-5', 2469, 3'-AAAAAA-5', 2470, 3'-AAAGCA-5', 2473, 3'-AAAGCA-5', 2479, 3'-AAACAA-5', 2484, 3'-AAACAA-5', 2488, 3'-ACAAAA-5', 2490, 3'-ATAGTA-5', 2500, 3'-AGAAAA-5', 2506, 3'-AAAACA-5', 2508, 3'-AAACAA-5', 2509, 3'-AGACCA-5', 2599, 3'-ATACAA-5', 2642, 3'-ACAAAA-5', 2644, 3'-AAATCA-5', 2648, 3'-ACAGGA-5', 2690, 3'-AAATCA-5', 2749, 3'-AGAGCA-5', 2781, 3'-AAAAGA-5', 2798, 3'-AAAGAA-5', 2799, 3'-AAAGAA-5', 2803, 3'-AGAAAA-5', 2805, 3'-AAAAGA-5', 2807, 3'-AGAGAA-5', 2810, 3'-AGAAGA-5', 2812, 3'-AGAAAA-5', 2815, 3'-AAAAAA-5', 2817, 3'-AAAAGA-5', 2819, 3'-AAAGAA-5', 2820, 3'-AGAAAA-5', 2822, 3'-AAAAGA-5', 2824, 3'-AGAGAA-5', 2827, 3'-AGAAGA-5', 2829, 3'-AGAAAA-5', 2832, 3'-AAAAAA-5', 2834, 3'-AAAAGA-5', 2836, 3'-AAAGAA-5', 2837, 3'-AGAAAA-5', 2839, 3'-AAAACA-5', 2841, 3'-AAACAA-5', 2842, 3'-AAAATA-5', 2868, 3'-ATATAA-5', 2873, 3'-AAAAAA-5', 2929, 3'-ACATCA-5', 2941, 3'-ACATTA-5', 2951, 3'-AAACCA-5', 2971, 3'-AAAATA-5', 3012, 3'-AAATAA-5', 3013, 
3'-AAAAAA-5', 3026, 3'-AAACTA-5', 3029, 3'-AGACCA-5', 3122, 3'-AAAACA-5', 3166, 3'-ACATAA-5', 3169, 3'-ATAAAA-5', 3171, 3'-AAATTA-5', 3175, 3'-AGATCA-5', 3277, 3'-ACAAGA-5', 3307, 3'-AGAGCA-5', 3310, 3'-AAAACA-5', 3329, 3'-AAACAA-5', 3330, 3'-AAATAA-5', 3334, 3'-AAACAA-5', 3338, 3'-ACAAGA-5', 3340, 3'-AGAAAA-5', 3343, 3'-AAACCA-5', 3365, 3'-AGAGGA-5', 3387, 3'-ACATCA-5', 3394, 3'-AGAGAA-5', 3406, 3'-ACATCA-5', 3415, 3'-ACATTA-5', 3436, 3'-ATATTA-5', 3454, 3'-ATATTA-5', 3468, 3'-AAACCA-5', 3484, 3'-AGATCA-5', 3489, 3'-AAAACA-5', 3511, 3'-ACACAA-5', 3514, 3'-ATAATA-5', 3538, 3'-ACAAGA-5', 3635, 3'-AGAGGA-5', 3638, 3'-AAAGAA-5', 3666, 3'-AGAACA-5', 3668, 3'-AGAGGA-5', 3675, 3'-ACAAGA-5', 3759, 3'-AGACCA-5', 3762, 3'-ACAAAA-5', 3767, 3'-AGAGCA-5', 3913, 3'-AGATGA-5', 3920, 3'-AGACCA-5', 4031, 3'-ACAAAA-5', 4066, 3'-AAAAAA-5', 4068, 3'-AAAATA-5', 4070, 3'-AAATAA-5', 4071, 3'-AAATAA-5', 4075, 3'-ATAATA-5', 4077, 3'-ATAGAA-5', 4080, 3'-AAAGAA-5', 4084, 3'-AGAAAA-5', 4086, 3'-AGACAA-5', 4182, 3'-ACAAAA-5', 4216, 3'-AAAAAA-5', 4218, 3'-AAAATA-5', 4220, 3'-AAATAA-5', 4221, 3'-ATAATA-5', 4223, 3'-AAAAAA-5', 4378, 3'-AAAAGA-5', 4380, 3'-AAAGAA-5', 4381, 3'-AGAAAA-5', 4383, 3'-AAAAAA-5', 4385, 3'-AAAAGA-5', 4387, 3'-AAAGAA-5', 4388, 3'-AGAAAA-5', 4390, 3'-AAAAGA-5', 4392, 3'-AAAGAA-5', 4393, 3'-AGAAAA-5', 4395, 3'-ACACGA-5', 4471.
 * 4) positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3++.bas, looking for 3'-ANANNA-5', 42, 3'-AGACCA-5', 103, 3'-ACAAGA-5', 108, 3'-ACATAA-5', 115, 3'-ATAAGA-5', 117, 3'-AAAAGA-5', 137, 3'-AGAGGA-5', 142, 3'-ACAAAA-5', 147, 3'-AAAGCA-5', 1182, 3'-AGACGA-5', 1734, 3'-AAAGAA-5', 1980, 3'-AGAGGA-5', 2081, 3'-AGATCA-5', 2168, 3'-AGAGTA-5', 2175, 3'-ATAAGA-5', 2180, 3'-AGACAA-5', 2183, 3'-AGACAA-5', 2261, 3'-AAAGTA-5', 2265, 3'-AAAAGA-5', 2276, 3'-AAAGAA-5', 2277, 3'-AGAAAA-5', 2279, 3'-AAAAAA-5', 2281, 3'-AAATAA-5', 2347, 3'-AAAAAA-5', 2451, 3'-AAAACA-5', 2453, 3'-AGACGA-5', 2976, 3'-AGACCA-5', 3022, 3'-AGAGAA-5', 3056, 3'-AGAAGA-5', 3058, 3'-AGAGGA-5', 3302, 3'-AGACGA-5', 3307, 3'-ACAGAA-5', 3393, 3'-AGAAGA-5', 3395, 3'-ACAGGA-5', 3620, 3'-ACACCA-5', 3643, 3'-AAACCA-5', 3948, 3'-ACACCA-5', 3967, 3'-AGAGGA-5', 4059, 3'-AAAATA-5', 4122, 3'-AAATCA-5', 4137, 3'-AAATAA-5', 4142, 3'-ATATTA-5', 4168, 3'-AGAGAA-5', 4387.
 * 5) complement, negative strand, negative direction is SuccessablesHbox3c--.bas, looking for 3'-TNTNNT-5', 263, 3'-TCTTTT-5', 53, 3'-TTTTCT-5', 55, 3'-TTTTGT-5', 68, 3'-TTTGTT-5', 69, 3'-TATCTT-5', 101, 3'-TCTTTT-5', 103, 3'-TTTCCT-5', 106, 3'-TATGTT-5', 113, 3'-TTTTGT-5', 166, 3'-TGTAAT-5', 173, 3'-TATTTT-5', 183, 3'-TTTCGT-5', 186, 3'-TATTTT-5', 222, 3'-TTTTCT-5', 224, 3'-TTTCTT-5', 225, 3'-TTTGTT-5', 229, 3'-TTTGGT-5', 260, 3'-TGTATT-5', 269, 3'-TATTAT-5', 271, 3'-TATACT-5', 274, 3'-TCTTGT-5', 287, 3'-TGTCTT-5', 290, 3'-TATCTT-5', 356, 3'-TCTTTT-5', 358, 3'-TTTTGT-5', 360, 3'-TGTTTT-5', 485, 3'-TTTTTT-5', 487, 3'-TTTTAT-5', 489, 3'-TATGCT-5', 492, 3'-TTTAAT-5', 498, 3'-TCTAGT-5', 590, 3'-TTTTAT-5', 632, 3'-TATGTT-5', 635, 3'-TGTTTT-5', 637, 3'-TTTTTT-5', 639, 3'-TCTGGT-5', 726, 3'-TTTTAT-5', 766, 3'-TATGTT-5', 769, 3'-TGTTTT-5', 771, 3'-TTTTTT-5', 773, 3'-TTTAAT-5', 776, 3'-TCTAGT-5', 878, 3'-TGTGGT-5', 883, 3'-TTTTTT-5', 928, 3'-TTTTTT-5', 929, 3'-TTTTTT-5', 930, 3'-TTTTTT-5', 931, 3'-TTTTTT-5', 932, 3'-TTTTTT-5', 933, 3'-TTTTTT-5', 934, 3'-TTTTTT-5', 935, 3'-TTTTTT-5', 936, 3'-TTTTTT-5', 937, 3'-TTTTTT-5', 938, 3'-TTTTTT-5', 939, 3'-TTTTTT-5', 940, 3'-TTTTTT-5', 941, 3'-TTTTTT-5', 942, 3'-TGTTGT-5', 1071, 3'-TTTTTT-5', 1094, 3'-TTTTTT-5', 1095, 3'-TTTTTT-5', 1096, 3'-TTTTTT-5', 1097, 3'-TTTTTT-5', 1098, 3'-TTTTTT-5', 1099, 3'-TTTTTT-5', 1100, 3'-TTTTTT-5', 1101, 3'-TTTTTT-5', 1102, 3'-TTTTTT-5', 1103, 3'-TTTTTT-5', 1104, 3'-TTTTTT-5', 1105, 3'-TGTTTT-5', 1228, 3'-TTTTTT-5', 1230, 3'-TTTAAT-5', 1233, 3'-TATTCT-5', 1365, 3'-TCTCGT-5', 1368, 3'-TTTTGT-5', 1387, 3'-TTTGTT-5', 1388, 3'-TTTGTT-5', 1392, 3'-TGTTTT-5', 1394, 3'-TTTTTT-5', 1396, 3'-TTTTTT-5', 1397, 3'-TTTTTT-5', 1398, 3'-TTTTCT-5', 1400, 3'-TTTCTT-5', 1550, 3'-TTTTAT-5', 1563, 3'-TTTACT-5', 1580, 3'-TTTGTT-5', 1585, 3'-TTTTCT-5', 1628, 3'-TTTCTT-5', 1629, 3'-TTTCCT-5', 1640, 3'-TTTACT-5', 1663, 3'-TATGGT-5', 1668, 3'-TTTACT-5', 1700, 3'-TATCAT-5', 1705, 3'-TTTTAT-5', 1739, 3'-TCTGGT-5', 1835, 
3'-TTTTAT-5', 1875, 3'-TATGTT-5', 1878, 3'-TGTTTT-5', 1880, 3'-TTTTTT-5', 1882, 3'-TTTTTT-5', 1883, 3'-TTTAAT-5', 1886, 3'-TCTAGT-5', 1988, 3'-TCTCGT-5', 2020, 3'-TTTTTT-5', 2038, 3'-TTTTTT-5', 2039, 3'-TTTTTT-5', 2040, 3'-TTTTTT-5', 2041, 3'-TTTTTT-5', 2042, 3'-TTTTTT-5', 2043, 3'-TTTTTT-5', 2044, 3'-TTTTTT-5', 2045, 3'-TTTTTT-5', 2046, 3'-TTTTTT-5', 2047, 3'-TTTTTT-5', 2048, 3'-TTTTTT-5', 2049, 3'-TTTTTT-5', 2050, 3'-TTTTTT-5', 2051, 3'-TTTTCT-5', 2053, 3'-TTTCTT-5', 2054, 3'-TCTTTT-5', 2056, 3'-TTTTTT-5', 2058, 3'-TTTTTT-5', 2059, 3'-TTTTTT-5', 2060, 3'-TCTGGT-5', 2122, 3'-TCTGGT-5', 2146, 3'-TATGTT-5', 2180, 3'-TGTTTT-5', 2182, 3'-TTTTTT-5', 2184, 3'-TTTACT-5', 2187, 3'-TCTGGT-5', 2262, 3'-TGTCGT-5', 2274, 3'-TTTTAT-5', 2302, 3'-TATGTT-5', 2305, 3'-TGTTTT-5', 2307, 3'-TTTTTT-5', 2309, 3'-TTTGAT-5', 2312, 3'-TCTAGT-5', 2414, 3'-TGTGGT-5', 2419, 3'-TTTTTT-5', 2461, 3'-TTTTTT-5', 2462, 3'-TTTTTT-5', 2463, 3'-TTTTTT-5', 2464, 3'-TTTTTT-5', 2465, 3'-TTTTTT-5', 2466, 3'-TTTTTT-5', 2467, 3'-TTTTTT-5', 2468, 3'-TTTTTT-5', 2469, 3'-TTTTTT-5', 2470, 3'-TTTCGT-5', 2473, 3'-TTTCGT-5', 2479, 3'-TTTGTT-5', 2484, 3'-TTTGTT-5', 2488, 3'-TGTTTT-5', 2490, 3'-TATCAT-5', 2500, 3'-TCTTTT-5', 2506, 3'-TTTTGT-5', 2508, 3'-TTTGTT-5', 2509, 3'-TCTGGT-5', 2599, 3'-TATGTT-5', 2642, 3'-TGTTTT-5', 2644, 3'-TTTAGT-5', 2648, 3'-TGTCCT-5', 2690, 3'-TTTAGT-5', 2749, 3'-TCTCGT-5', 2781, 3'-TTTTCT-5', 2798, 3'-TTTCTT-5', 2799, 3'-TTTCTT-5', 2803, 3'-TCTTTT-5', 2805, 3'-TTTTCT-5', 2807, 3'-TCTCTT-5', 2810, 3'-TCTTCT-5', 2812, 3'-TCTTTT-5', 2815, 3'-TTTTTT-5', 2817, 3'-TTTTCT-5', 2819, 3'-TTTCTT-5', 2820, 3'-TCTTTT-5', 2822, 3'-TTTTCT-5', 2824, 3'-TCTCTT-5', 2827, 3'-TCTTCT-5', 2829, 3'-TCTTTT-5', 2832, 3'-TTTTTT-5', 2834, 3'-TTTTCT-5', 2836, 3'-TTTCTT-5', 2837, 3'-TCTTTT-5', 2839, 3'-TTTTGT-5', 2841, 3'-TTTGTT-5', 2842, 3'-TTTTAT-5', 2868, 3'-TATATT-5', 2873, 3'-TTTTTT-5', 2929, 3'-TGTAGT-5', 2941, 3'-TGTAAT-5', 2951, 3'-TTTGGT-5', 2971, 3'-TTTTAT-5', 3012, 3'-TTTATT-5', 3013, 3'-TTTTTT-5', 3026, 
3'-TTTGAT-5', 3029, 3'-TCTGGT-5', 3122, 3'-TTTTGT-5', 3166, 3'-TGTATT-5', 3169, 3'-TATTTT-5', 3171, 3'-TTTAAT-5', 3175, 3'-TCTAGT-5', 3277, 3'-TGTTCT-5', 3307, 3'-TCTCGT-5', 3310, 3'-TTTTGT-5', 3329, 3'-TTTGTT-5', 3330, 3'-TTTATT-5', 3334, 3'-TTTGTT-5', 3338, 3'-TGTTCT-5', 3340, 3'-TCTTTT-5', 3343, 3'-TTTGGT-5', 3365, 3'-TCTCCT-5', 3387, 3'-TGTAGT-5', 3394, 3'-TCTCTT-5', 3406, 3'-TGTAGT-5', 3415, 3'-TGTAAT-5', 3436, 3'-TATAAT-5', 3454, 3'-TATAAT-5', 3468, 3'-TTTGGT-5', 3484, 3'-TCTAGT-5', 3489, 3'-TTTTGT-5', 3511, 3'-TGTGTT-5', 3514, 3'-TATTAT-5', 3538, 3'-TGTTCT-5', 3635, 3'-TCTCCT-5', 3638, 3'-TTTCTT-5', 3666, 3'-TCTTGT-5', 3668, 3'-TCTCCT-5', 3675, 3'-TGTTCT-5', 3759, 3'-TCTGGT-5', 3762, 3'-TGTTTT-5', 3767, 3'-TCTCGT-5', 3913, 3'-TCTACT-5', 3920, 3'-TCTGGT-5', 4031, 3'-TGTTTT-5', 4066, 3'-TTTTTT-5', 4068, 3'-TTTTAT-5', 4070, 3'-TTTATT-5', 4071, 3'-TTTATT-5', 4075, 3'-TATTAT-5', 4077, 3'-TATCTT-5', 4080, 3'-TTTCTT-5', 4084, 3'-TCTTTT-5', 4086, 3'-TCTGTT-5', 4182, 3'-TGTTTT-5', 4216, 3'-TTTTTT-5', 4218, 3'-TTTTAT-5', 4220, 3'-TTTATT-5', 4221, 3'-TATTAT-5', 4223, 3'-TTTTTT-5', 4378, 3'-TTTTCT-5', 4380, 3'-TTTCTT-5', 4381, 3'-TCTTTT-5', 4383, 3'-TTTTTT-5', 4385, 3'-TTTTCT-5', 4387, 3'-TTTCTT-5', 4388, 3'-TCTTTT-5', 4390, 3'-TTTTCT-5', 4392, 3'-TTTCTT-5', 4393, 3'-TCTTTT-5', 4395, 3'-TGTGCT-5', 4471.
 * 6) complement, negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3c-+.bas, looking for 3'-TNTNNT-5', 42, 3'-TCTGGT-5', 103, 3'-TGTTCT-5', 108, 3'-TGTATT-5', 115, 3'-TATTCT-5', 117, 3'-TTTTCT-5', 137, 3'-TCTCCT-5', 142, 3'-TGTTTT-5', 147, 3'-TTTCGT-5', 1182, 3'-TCTGCT-5', 1734, 3'-TTTCTT-5', 1980, 3'-TCTCCT-5', 2081, 3'-TCTAGT-5', 2168, 3'-TCTCAT-5', 2175, 3'-TATTCT-5', 2180, 3'-TCTGTT-5', 2183, 3'-TCTGTT-5', 2261, 3'-TTTCAT-5', 2265, 3'-TTTTCT-5', 2276, 3'-TTTCTT-5', 2277, 3'-TCTTTT-5', 2279, 3'-TTTTTT-5', 2281, 3'-TTTATT-5', 2347, 3'-TTTTTT-5', 2451, 3'-TTTTGT-5', 2453, 3'-TCTGCT-5', 2976, 3'-TCTGGT-5', 3022, 3'-TCTCTT-5', 3056, 3'-TCTTCT-5', 3058, 3'-TCTCCT-5', 3302, 3'-TCTGCT-5', 3307, 3'-TGTCTT-5', 3393, 3'-TCTTCT-5', 3395, 3'-TGTCCT-5', 3620, 3'-TGTGGT-5', 3643, 3'-TTTGGT-5', 3948, 3'-TGTGGT-5', 3967, 3'-TCTCCT-5', 4059, 3'-TTTTAT-5', 4122, 3'-TTTAGT-5', 4137, 3'-TTTATT-5', 4142, 3'-TATAAT-5', 4168, 3'-TCTCTT-5', 4387.
 * 7) complement, positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox3c+-.bas, looking for 3'-TNTNNT-5', 64, 3'-TTTCTT-5', 24, 3'-TCTTTT-5', 26, 3'-TATGTT-5', 43, 3'-TGTTCT-5', 45, 3'-TATGTT-5', 213, 3'-TGTTTT-5', 215, 3'-TTTTAT-5', 218, 3'-TCTATT-5', 235, 3'-TGTAAT-5', 248, 3'-TCTTGT-5', 281, 3'-TGTACT-5', 325, 3'-TATGAT-5', 352, 3'-TGTAAT-5', 397, 3'-TATGGT-5', 606, 3'-TCTACT-5', 625, 3'-TGTAAT-5', 670, 3'-TCTACT-5', 759, 3'-TGTGGT-5', 788, 3'-TGTAAT-5', 804, 3'-TGTAAT-5', 1134, 3'-TGTAAT-5', 1261, 3'-TCTTTT-5', 1419, 3'-TTTTTT-5', 1421, 3'-TTTTTT-5', 1422, 3'-TTTTTT-5', 1423, 3'-TTTTTT-5', 1424, 3'-TTTTTT-5', 1425, 3'-TTTTTT-5', 1426, 3'-TTTTTT-5', 1427, 3'-TTTTTT-5', 1428, 3'-TTTTTT-5', 1429, 3'-TTTTTT-5', 1430, 3'-TTTTTT-5', 1431, 3'-TTTTTT-5', 1432, 3'-TCTGTT-5', 1453, 3'-TGTGAT-5', 1480, 3'-TATATT-5', 1601, 3'-TTTCTT-5', 1605, 3'-TATTTT-5', 1727, 3'-TTTTAT-5', 1729, 3'-TATCTT-5', 1732, 3'-TGTAAT-5', 1779, 3'-TCTACT-5', 1868, 3'-TGTAAT-5', 1914, 3'-TGTAAT-5', 2088, 3'-TCTACT-5', 2170, 3'-TCTACT-5', 2295, 3'-TGTAGT-5', 2340, 3'-TGTAGT-5', 2541, 3'-TGTGGT-5', 2659, 3'-TGTAAT-5', 2675, 3'-TATTTT-5', 2853, 3'-TTTCAT-5', 2886, 3'-TGTAAT-5', 3064, 3'-TCTACT-5', 3159, 3'-TGTGGT-5', 3187, 3'-TCTTCT-5', 3554, 3'-TCTGCT-5', 3707, 3'-TGTGGT-5', 3811, 3'-TGTAAT-5', 3973, 3'-TGTAGT-5', 4124, 3'-TGTGCT-5', 4402, 3'-TCTCTT-5', 4527, 3'-TTTATT-5', 4537.
 * 8) complement, positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3c++.bas, looking for 3'-TNTNNT-5', 32, 3'-TCTTCT-5', 49, 3'-TCTCCT-5', 207, 3'-TCTCCT-5', 471, 3'-TGTCGT-5', 1055, 3'-TTTCGT-5', 2006, 3'-TGTACT-5', 2141, 3'-TCTAGT-5', 2231, 3'-TATGGT-5', 2591, 3'-TGTGGT-5', 2603, 3'-TATCTT-5', 2628, 3'-TTTGGT-5', 2632, 3'-TGTGAT-5', 2637, 3'-TATATT-5', 2662, 3'-TCTCGT-5', 2704, 3'-TCTCCT-5', 2793, 3'-TTTCCT-5', 2829, 3'-TGTCTT-5', 2838, 3'-TTTCTT-5', 3066, 3'-TCTTGT-5', 3094, 3'-TCTCGT-5', 3138, 3'-TGTCGT-5', 3212, 3'-TGTCAT-5', 3414, 3'-TCTACT-5', 3476, 3'-TGTCCT-5', 3572, 3'-TTTCGT-5', 3599, 3'-TGTACT-5', 3708, 3'-TGTGGT-5', 3825, 3'-TTTTCT-5', 3929, 3'-TCTTGT-5', 4068, 3'-TTTACT-5', 4094, 3'-TGTAGT-5', 4116, 3'-TGTACT-5', 4154.
 * 9) inverse complement, negative strand, negative direction is SuccessablesHbox3ci--.bas, looking for 3'-TNNTNT-5', 270, 3'-TTGTCT-5', 13, 3'-TCTTTT-5', 53, 3'-TTTTCT-5', 55, 3'-TTCTAT-5', 57, 3'-TTTTGT-5', 68, 3'-TCCTAT-5', 74, 3'-TCTTTT-5', 103, 3'-TCCTAT-5', 108, 3'-TACTTT-5', 126, 3'-TTCTTT-5', 135, 3'-TTTTGT-5', 166, 3'-TTGTCT-5', 168, 3'-TATTTT-5', 183, 3'-TATTTT-5', 222, 3'-TTTTCT-5', 224, 3'-TTCTTT-5', 226, 3'-TATTAT-5', 271, 3'-TCTTGT-5', 287, 3'-TTGTCT-5', 289, 3'-TAGTGT-5', 295, 3'-TGCTTT-5', 312, 3'-TACTAT-5', 353, 3'-TCTTTT-5', 358, 3'-TTTTGT-5', 360, 3'-TTCTGT-5', 422, 3'-TTGTAT-5', 467, 3'-TGTTTT-5', 485, 3'-TTTTTT-5', 487, 3'-TTTTAT-5', 489, 3'-TGCTTT-5', 494, 3'-TAGTGT-5', 528, 3'-TTCTGT-5', 559, 3'-TCCTCT-5', 581, 3'-TGATTT-5', 628, 3'-TTTTAT-5', 632, 3'-TTATGT-5', 634, 3'-TGTTTT-5', 637, 3'-TTTTTT-5', 639, 3'-TGATTT-5', 762, 3'-TTTTAT-5', 766, 3'-TTATGT-5', 768, 3'-TGTTTT-5', 771, 3'-TTTTTT-5', 773, 3'-TCCTCT-5', 834, 3'-TAGTGT-5', 880, 3'-TTGTCT-5', 907, 3'-TTTTTT-5', 928, 3'-TTTTTT-5', 929, 3'-TTTTTT-5', 930, 3'-TTTTTT-5', 931, 3'-TTTTTT-5', 932, 3'-TTTTTT-5', 933, 3'-TTTTTT-5', 934, 3'-TTTTTT-5', 935, 3'-TTTTTT-5', 936, 3'-TTTTTT-5', 937, 3'-TTTTTT-5', 938, 3'-TTTTTT-5', 939, 3'-TTTTTT-5', 940, 3'-TTTTTT-5', 941, 3'-TTTTTT-5', 942, 3'-TCCTCT-5', 1000, 3'-TGTTGT-5', 1071, 3'-TTGTCT-5', 1073, 3'-TTTTTT-5', 1094, 3'-TTTTTT-5', 1095, 3'-TTTTTT-5', 1096, 3'-TTTTTT-5', 1097, 3'-TTTTTT-5', 1098, 3'-TTTTTT-5', 1099, 3'-TTTTTT-5', 1100, 3'-TTTTTT-5', 1101, 3'-TTTTTT-5', 1102, 3'-TTTTTT-5', 1103, 3'-TTTTTT-5', 1104, 3'-TTTTTT-5', 1105, 3'-TGTTTT-5', 1228, 3'-TTTTTT-5', 1230, 3'-TCCTCT-5', 1291, 3'-TATTCT-5', 1365, 3'-TCGTTT-5', 1370, 3'-TTTTGT-5', 1387, 3'-TTGTTT-5', 1389, 3'-TTGTTT-5', 1393, 3'-TGTTTT-5', 1394, 3'-TTTTTT-5', 1396, 3'-TTTTTT-5', 1397, 3'-TTTTTT-5', 1398, 3'-TTTTCT-5', 1400, 3'-TTGTGT-5', 1541, 3'-TTTTAT-5', 1563, 3'-TTATGT-5', 1565, 3'-TACTTT-5', 1582, 3'-TTGTTT-5', 1586, 3'-TTCTAT-5', 1595, 3'-TCGTCT-5', 1614, 3'-TTTTCT-5', 1628, 
3'-TTCTTT-5', 1630, 3'-TCCTTT-5', 1642, 3'-TACTAT-5', 1665, 3'-TGGTCT-5', 1670, 3'-TAATTT-5', 1697, 3'-TACTAT-5', 1702, 3'-TTATCT-5', 1710, 3'-TTTTAT-5', 1739, 3'-TCCTCT-5', 1826, 3'-TGATTT-5', 1871, 3'-TTTTAT-5', 1875, 3'-TTATGT-5', 1877, 3'-TGTTTT-5', 1880, 3'-TTTTTT-5', 1882, 3'-TTTTTT-5', 1883, 3'-TCCTCT-5', 1944, 3'-TTTTTT-5', 2038, 3'-TTTTTT-5', 2039, 3'-TTTTTT-5', 2040, 3'-TTTTTT-5', 2041, 3'-TTTTTT-5', 2042, 3'-TTTTTT-5', 2043, 3'-TTTTTT-5', 2044, 3'-TTTTTT-5', 2045, 3'-TTTTTT-5', 2046, 3'-TTTTTT-5', 2047, 3'-TTTTTT-5', 2048, 3'-TTTTTT-5', 2049, 3'-TTTTTT-5', 2050, 3'-TTTTTT-5', 2051, 3'-TTTTCT-5', 2053, 3'-TTCTTT-5', 2055, 3'-TCTTTT-5', 2056, 3'-TTTTTT-5', 2058, 3'-TTTTTT-5', 2059, 3'-TTTTTT-5', 2060, 3'-TGATTT-5', 2173, 3'-TTCTAT-5', 2177, 3'-TGTTTT-5', 2182, 3'-TTTTTT-5', 2184, 3'-TAGTGT-5', 2242, 3'-TGATTT-5', 2298, 3'-TTTTAT-5', 2302, 3'-TTATGT-5', 2304, 3'-TGTTTT-5', 2307, 3'-TTTTTT-5', 2309, 3'-TCCTCT-5', 2370, 3'-TAGTGT-5', 2416, 3'-TTGTCT-5', 2443, 3'-TTTTTT-5', 2461, 3'-TTTTTT-5', 2462, 3'-TTTTTT-5', 2463, 3'-TTTTTT-5', 2464, 3'-TTTTTT-5', 2465, 3'-TTTTTT-5', 2466, 3'-TTTTTT-5', 2467, 3'-TTTTTT-5', 2468, 3'-TTTTTT-5', 2469, 3'-TTTTTT-5', 2470, 3'-TCGTTT-5', 2475, 3'-TCGTTT-5', 2481, 3'-TTGTTT-5', 2485, 3'-TTGTTT-5', 2489, 3'-TGTTTT-5', 2490, 3'-TTCTTT-5', 2505, 3'-TCTTTT-5', 2506, 3'-TTTTGT-5', 2508, 3'-TTGTTT-5', 2510, 3'-TGATTT-5', 2635, 3'-TTATAT-5', 2639, 3'-TGTTTT-5', 2644, 3'-TTGTCT-5', 2778, 3'-TTTTCT-5', 2798, 3'-TTCTTT-5', 2800, 3'-TTCTTT-5', 2804, 3'-TCTTTT-5', 2805, 3'-TTTTCT-5', 2807, 3'-TTCTCT-5', 2809, 3'-TCTTCT-5', 2812, 3'-TTCTTT-5', 2814, 3'-TCTTTT-5', 2815, 3'-TTTTTT-5', 2817, 3'-TTTTCT-5', 2819, 3'-TTCTTT-5', 2821, 3'-TCTTTT-5', 2822, 3'-TTTTCT-5', 2824, 3'-TTCTCT-5', 2826, 3'-TCTTCT-5', 2829, 3'-TTCTTT-5', 2831, 3'-TCTTTT-5', 2832, 3'-TTTTTT-5', 2834, 3'-TTTTCT-5', 2836, 3'-TTCTTT-5', 2838, 3'-TCTTTT-5', 2839, 3'-TTTTGT-5', 2841, 3'-TTTTAT-5', 2868, 3'-TTATAT-5', 2870, 3'-TTGTCT-5', 2878, 3'-TAGTTT-5', 2890, 3'-TTTTTT-5', 2929, 
3'-TAGTCT-5', 2946, 3'-TCCTTT-5', 2957, 3'-TCCTTT-5', 2967, 3'-TAATCT-5', 2979, 3'-TAATCT-5', 3000, 3'-TTTTAT-5', 3012, 3'-TTATTT-5', 3014, 3'-TTTTTT-5', 3026, 3'-TGATTT-5', 3162, 3'-TTTTGT-5', 3166, 3'-TTGTAT-5', 3168, 3'-TATTTT-5', 3171, 3'-TGCTCT-5', 3233, 3'-TGTTCT-5', 3307, 3'-TCGTTT-5', 3312, 3'-TTTTGT-5', 3329, 3'-TTGTTT-5', 3331, 3'-TTATTT-5', 3335, 3'-TGTTCT-5', 3340, 3'-TTCTTT-5', 3342, 3'-TCTTTT-5', 3343, 3'-TTCTTT-5', 3376, 3'-TTCTCT-5', 3380, 3'-TCCTGT-5', 3389, 3'-TAGTAT-5', 3420, 3'-TAATTT-5', 3438, 3'-TGATCT-5', 3463, 3'-TCATTT-5', 3481, 3'-TGGTCT-5', 3486, 3'-TCGTTT-5', 3497, 3'-TTTTGT-5', 3511, 3'-TTGTGT-5', 3513, 3'-TATTAT-5', 3538, 3'-TAGTCT-5', 3618, 3'-TGTTCT-5', 3635, 3'-TCTTGT-5', 3668, 3'-TTGTGT-5', 3670, 3'-TCCTGT-5', 3756, 3'-TGTTCT-5', 3759, 3'-TGGTGT-5', 3764, 3'-TGTTTT-5', 3767, 3'-TCGTGT-5', 3915, 3'-TACTTT-5', 3922, 3'-TTGTAT-5', 4045, 3'-TGTTTT-5', 4066, 3'-TTTTTT-5', 4068, 3'-TTTTAT-5', 4070, 3'-TTATTT-5', 4072, 3'-TATTAT-5', 4077, 3'-TTATCT-5', 4079, 3'-TTCTTT-5', 4085, 3'-TCTTTT-5', 4086, 3'-TTCTGT-5', 4181, 3'-TTGTGT-5', 4196, 3'-TGTTTT-5', 4216, 3'-TTTTTT-5', 4218, 3'-TTTTAT-5', 4220, 3'-TATTAT-5', 4223, 3'-TTTTTT-5', 4378, 3'-TTTTCT-5', 4380, 3'-TTCTTT-5', 4382, 3'-TCTTTT-5', 4383, 3'-TTTTTT-5', 4385, 3'-TTTTCT-5', 4387, 3'-TTCTTT-5', 4389, 3'-TCTTTT-5', 4390, 3'-TTTTCT-5', 4392, 3'-TTCTTT-5', 4394, 3'-TCTTTT-5', 4395, 3'-TCCTGT-5', 4468, 3'-TGCTCT-5', 4473, 3'-TTCTGT-5', 4507, 3'-TTGTCT-5', 4518.
 * 10) inverse complement, negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3ci-+.bas, looking for 3'-TNNTNT-5', 55, 3'-TCGTGT-5', 80, 3'-TGGTGT-5', 105, 3'-TGTTCT-5', 108, 3'-TTCTTT-5', 110, 3'-TTGTAT-5', 114, 3'-TATTCT-5', 117, 3'-TTCTCT-5', 119, 3'-TTTTCT-5', 137, 3'-TTCTCT-5', 139, 3'-TCCTGT-5', 144, 3'-TGTTTT-5', 147, 3'-TCCTCT-5', 221, 3'-TCGTCT-5', 1393, 3'-TCGTCT-5', 1493, 3'-TTCTTT-5', 1981, 3'-TTCTCT-5', 1990, 3'-TCATCT-5', 2111, 3'-TGCTAT-5', 2157, 3'-TAGTGT-5', 2170, 3'-TCATAT-5', 2177, 3'-TATTCT-5', 2180, 3'-TTCTGT-5', 2182, 3'-TTTTCT-5', 2276, 3'-TTCTTT-5', 2278, 3'-TCTTTT-5', 2279, 3'-TTTTTT-5', 2281, 3'-TAATTT-5', 2440, 3'-TTTTTT-5', 2451, 3'-TTTTGT-5', 2453, 3'-TTCTTT-5', 2585, 3'-TGGTGT-5', 2813, 3'-TTCTGT-5', 2925, 3'-TGGTCT-5', 2941, 3'-TTCTGT-5', 2957, 3'-TCCTCT-5', 2981, 3'-TTGTCT-5', 3004, 3'-TTGTCT-5', 3053, 3'-TCTTCT-5', 3058, 3'-TGGTCT-5', 3245, 3'-TGGTCT-5', 3299, 3'-TCCTCT-5', 3304, 3'-TCTTCT-5', 3395, 3'-TTCTTT-5', 3397, 3'-TCCTGT-5', 3622, 3'-TGGTGT-5', 3950, 3'-TGGTGT-5', 3969, 3'-TGGTTT-5', 4108, 3'-TCATTT-5', 4119, 3'-TTTTAT-5', 4122, 3'-TGATTT-5', 4134, 3'-TAGTTT-5', 4139, 3'-TCATGT-5', 4365, 3'-TGGTCT-5', 4380, 3'-TTCTCT-5', 4386, 3'-TGCTGT-5', 4392.
 * 11) inverse complement, positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox3ci+-.bas, looking for 3'-TNNTNT-5', 50, 3'-TTCTTT-5', 25, 3'-TCTTTT-5', 26, 3'-TGTTCT-5', 45, 3'-TTCTTT-5', 47, 3'-TGTTTT-5', 215, 3'-TTTTAT-5', 218, 3'-TAATAT-5', 272, 3'-TCTTGT-5', 281, 3'-TTCTTT-5', 347, 3'-TAATAT-5', 603, 3'-TGGTGT-5', 608, 3'-TTCTCT-5', 622, 3'-TGGTGT-5', 793, 3'-TTCTTT-5', 1418, 3'-TCTTTT-5', 1419, 3'-TTTTTT-5', 1421, 3'-TTTTTT-5', 1422, 3'-TTTTTT-5', 1423, 3'-TTTTTT-5', 1424, 3'-TTTTTT-5', 1425, 3'-TTTTTT-5', 1426, 3'-TTTTTT-5', 1427, 3'-TTTTTT-5', 1428, 3'-TTTTTT-5', 1429, 3'-TTTTTT-5', 1430, 3'-TTTTTT-5', 1431, 3'-TTTTTT-5', 1432, 3'-TGGTGT-5', 1477, 3'-TGATCT-5', 1482, 3'-TTCTAT-5', 1525, 3'-TTATTT-5', 1726, 3'-TATTTT-5', 1727, 3'-TTTTAT-5', 1729, 3'-TTATCT-5', 1731, 3'-TCCTGT-5', 1911, 3'-TACTGT-5', 2163, 3'-TACTTT-5', 2216, 3'-TATTTT-5', 2853, 3'-TGCTAT-5', 2898, 3'-TGCTGT-5', 3265, 3'-TTCTGT-5', 3319, 3'-TCTTCT-5', 3554, 3'-TTCTGT-5', 3556, 3'-TGCTGT-5', 3709, 3'-TCCTCT-5', 3790, 3'-TGATGT-5', 3808, 3'-TGCTGT-5', 3957, 3'-TCATCT-5', 4058, 3'-TGCTCT-5', 4404, 3'-TCCTCT-5', 4428.
 * 12) inverse complement, positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3ci++.bas, looking for 3'-TNNTNT-5', 40, 3'-TCCTCT-5', 46, 3'-TCTTCT-5', 49, 3'-TACTGT-5', 62, 3'-TCCTCT-5', 710, 3'-TGGTCT-5', 1631, 3'-TGCTAT-5', 1837, 3'-TGGTGT-5', 2123, 3'-TACTTT-5', 2146, 3'-TGGTCT-5', 2228, 3'-TACTGT-5', 2412, 3'-TGATGT-5', 2428, 3'-TCCTGT-5', 2460, 3'-TAATAT-5', 2548, 3'-TGGTGT-5', 2600, 3'-TCCTTT-5', 2623, 3'-TTATCT-5', 2627, 3'-TGGTGT-5', 2634, 3'-TTGTCT-5', 2652, 3'-TCGTTT-5', 2706, 3'-TCCTTT-5', 2831, 3'-TTGTGT-5', 2835, 3'-TACTGT-5', 2843, 3'-TCTTGT-5', 3094, 3'-TTGTGT-5', 3096, 3'-TCCTGT-5', 3131, 3'-TGGTTT-5', 3175, 3'-TTGTCT-5', 3179, 3'-TCGTCT-5', 3214, 3'-TCATCT-5', 3416, 3'-TTATTT-5', 3427, 3'-TGGTCT-5', 3548, 3'-TACTGT-5', 3569, 3'-TCCTCT-5', 3650, 3'-TCGTGT-5', 3740, 3'-TGGTGT-5', 3859, 3'-TTTTCT-5', 3929, 3'-TCTTGT-5', 4068, 3'-TAGTAT-5', 4149, 3'-TAATAT-5', 4166, 3'-TCCTGT-5', 4252.
 * 13) inverse negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox3i--.bas, looking for 3'-ANNANA-5', 50, 3'-AAGAAA-5', 25, 3'-AGAAAA-5', 26, 3'-ACAAGA-5', 45, 3'-AAGAAA-5', 47, 3'-ACAAAA-5', 215, 3'-AAAATA-5', 218, 3'-ATTATA-5', 272, 3'-AGAACA-5', 281, 3'-AAGAAA-5', 347, 3'-ATTATA-5', 603, 3'-ACCACA-5', 608, 3'-AAGAGA-5', 622, 3'-ACCACA-5', 793, 3'-AAGAAA-5', 1418, 3'-AGAAAA-5', 1419, 3'-AAAAAA-5', 1421, 3'-AAAAAA-5', 1422, 3'-AAAAAA-5', 1423, 3'-AAAAAA-5', 1424, 3'-AAAAAA-5', 1425, 3'-AAAAAA-5', 1426, 3'-AAAAAA-5', 1427, 3'-AAAAAA-5', 1428, 3'-AAAAAA-5', 1429, 3'-AAAAAA-5', 1430, 3'-AAAAAA-5', 1431, 3'-AAAAAA-5', 1432, 3'-ACCACA-5', 1477, 3'-ACTAGA-5', 1482, 3'-AAGATA-5', 1525, 3'-AATAAA-5', 1726, 3'-ATAAAA-5', 1727, 3'-AAAATA-5', 1729, 3'-AATAGA-5', 1731, 3'-AGGACA-5', 1911, 3'-ATGACA-5', 2163, 3'-ATGAAA-5', 2216, 3'-ATAAAA-5', 2853, 3'-ACGATA-5', 2898, 3'-ACGACA-5', 3265, 3'-AAGACA-5', 3319, 3'-AGAAGA-5', 3554, 3'-AAGACA-5', 3556, 3'-ACGACA-5', 3709, 3'-AGGAGA-5', 3790, 3'-ACTACA-5', 3808, 3'-ACGACA-5', 3957, 3'-AGTAGA-5', 4058, 3'-ACGAGA-5', 4404, 3'-AGGAGA-5', 4428.
 * 14) inverse negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3i-+.bas, looking for 3'-ANNANA-5', 40, 3'-AGGAGA-5', 46, 3'-AGAAGA-5', 49, 3'-ATGACA-5', 62, 3'-AGGAGA-5', 710, 3'-ACCAGA-5', 1631, 3'-ACGATA-5', 1837, 3'-ACCACA-5', 2123, 3'-ATGAAA-5', 2146, 3'-ACCAGA-5', 2228, 3'-ATGACA-5', 2412, 3'-ACTACA-5', 2428, 3'-AGGACA-5', 2460, 3'-ATTATA-5', 2548, 3'-ACCACA-5', 2600, 3'-AGGAAA-5', 2623, 3'-AATAGA-5', 2627, 3'-ACCACA-5', 2634, 3'-AACAGA-5', 2652, 3'-AGCAAA-5', 2706, 3'-AGGAAA-5', 2831, 3'-AACACA-5', 2835, 3'-ATGACA-5', 2843, 3'-AGAACA-5', 3094, 3'-AACACA-5', 3096, 3'-AGGACA-5', 3131, 3'-ACCAAA-5', 3175, 3'-AACAGA-5', 3179, 3'-AGCAGA-5', 3214, 3'-AGTAGA-5', 3416, 3'-AATAAA-5', 3427, 3'-ACCAGA-5', 3548, 3'-ATGACA-5', 3569, 3'-AGGAGA-5', 3650, 3'-AGCACA-5', 3740, 3'-ACCACA-5', 3859, 3'-AAAAGA-5', 3929, 3'-AGAACA-5', 4068, 3'-ATCATA-5', 4149, 3'-ATTATA-5', 4166, 3'-AGGACA-5', 4252.
 * 15) inverse positive strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesHbox3i+-.bas, looking for 3'-ANNANA-5', 270, 3'-AACAGA-5', 13, 3'-AGAAAA-5', 53, 3'-AAAAGA-5', 55, 3'-AAGATA-5', 57, 3'-AAAACA-5', 68, 3'-AGGATA-5', 74, 3'-AGAAAA-5', 103, 3'-AGGATA-5', 108, 3'-ATGAAA-5', 126, 3'-AAGAAA-5', 135, 3'-AAAACA-5', 166, 3'-AACAGA-5', 168, 3'-ATAAAA-5', 183, 3'-ATAAAA-5', 222, 3'-AAAAGA-5', 224, 3'-AAGAAA-5', 226, 3'-ATAATA-5', 271, 3'-AGAACA-5', 287, 3'-AACAGA-5', 289, 3'-ATCACA-5', 295, 3'-ACGAAA-5', 312, 3'-ATGATA-5', 353, 3'-AGAAAA-5', 358, 3'-AAAACA-5', 360, 3'-AAGACA-5', 422, 3'-AACATA-5', 467, 3'-ACAAAA-5', 485, 3'-AAAAAA-5', 487, 3'-AAAATA-5', 489, 3'-ACGAAA-5', 494, 3'-ATCACA-5', 528, 3'-AAGACA-5', 559, 3'-AGGAGA-5', 581, 3'-ACTAAA-5', 628, 3'-AAAATA-5', 632, 3'-AATACA-5', 634, 3'-ACAAAA-5', 637, 3'-AAAAAA-5', 639, 3'-ACTAAA-5', 762, 3'-AAAATA-5', 766, 3'-AATACA-5', 768, 3'-ACAAAA-5', 771, 3'-AAAAAA-5', 773, 3'-AGGAGA-5', 834, 3'-ATCACA-5', 880, 3'-AACAGA-5', 907, 3'-AAAAAA-5', 928, 3'-AAAAAA-5', 929, 3'-AAAAAA-5', 930, 3'-AAAAAA-5', 931, 3'-AAAAAA-5', 932, 3'-AAAAAA-5', 933, 3'-AAAAAA-5', 934, 3'-AAAAAA-5', 935, 3'-AAAAAA-5', 936, 3'-AAAAAA-5', 937, 3'-AAAAAA-5', 938, 3'-AAAAAA-5', 939, 3'-AAAAAA-5', 940, 3'-AAAAAA-5', 941, 3'-AAAAAA-5', 942, 3'-AGGAGA-5', 1000, 3'-ACAACA-5', 1071, 3'-AACAGA-5', 1073, 3'-AAAAAA-5', 1094, 3'-AAAAAA-5', 1095, 3'-AAAAAA-5', 1096, 3'-AAAAAA-5', 1097, 3'-AAAAAA-5', 1098, 3'-AAAAAA-5', 1099, 3'-AAAAAA-5', 1100, 3'-AAAAAA-5', 1101, 3'-AAAAAA-5', 1102, 3'-AAAAAA-5', 1103, 3'-AAAAAA-5', 1104, 3'-AAAAAA-5', 1105, 3'-ACAAAA-5', 1228, 3'-AAAAAA-5', 1230, 3'-AGGAGA-5', 1291, 3'-ATAAGA-5', 1365, 3'-AGCAAA-5', 1370, 3'-AAAACA-5', 1387, 3'-AACAAA-5', 1389, 3'-AACAAA-5', 1393, 3'-ACAAAA-5', 1394, 3'-AAAAAA-5', 1396, 3'-AAAAAA-5', 1397, 3'-AAAAAA-5', 1398, 3'-AAAAGA-5', 1400, 3'-AACACA-5', 1541, 3'-AAAATA-5', 1563, 3'-AATACA-5', 1565, 3'-ATGAAA-5', 1582, 3'-AACAAA-5', 1586, 3'-AAGATA-5', 1595, 3'-AGCAGA-5', 1614, 
3'-AAAAGA-5', 1628, 3'-AAGAAA-5', 1630, 3'-AGGAAA-5', 1642, 3'-ATGATA-5', 1665, 3'-ACCAGA-5', 1670, 3'-ATTAAA-5', 1697, 3'-ATGATA-5', 1702, 3'-AATAGA-5', 1710, 3'-AAAATA-5', 1739, 3'-AGGAGA-5', 1826, 3'-ACTAAA-5', 1871, 3'-AAAATA-5', 1875, 3'-AATACA-5', 1877, 3'-ACAAAA-5', 1880, 3'-AAAAAA-5', 1882, 3'-AAAAAA-5', 1883, 3'-AGGAGA-5', 1944, 3'-AAAAAA-5', 2038, 3'-AAAAAA-5', 2039, 3'-AAAAAA-5', 2040, 3'-AAAAAA-5', 2041, 3'-AAAAAA-5', 2042, 3'-AAAAAA-5', 2043, 3'-AAAAAA-5', 2044, 3'-AAAAAA-5', 2045, 3'-AAAAAA-5', 2046, 3'-AAAAAA-5', 2047, 3'-AAAAAA-5', 2048, 3'-AAAAAA-5', 2049, 3'-AAAAAA-5', 2050, 3'-AAAAAA-5', 2051, 3'-AAAAGA-5', 2053, 3'-AAGAAA-5', 2055, 3'-AGAAAA-5', 2056, 3'-AAAAAA-5', 2058, 3'-AAAAAA-5', 2059, 3'-AAAAAA-5', 2060, 3'-ACTAAA-5', 2173, 3'-AAGATA-5', 2177, 3'-ACAAAA-5', 2182, 3'-AAAAAA-5', 2184, 3'-ATCACA-5', 2242, 3'-ACTAAA-5', 2298, 3'-AAAATA-5', 2302, 3'-AATACA-5', 2304, 3'-ACAAAA-5', 2307, 3'-AAAAAA-5', 2309, 3'-AGGAGA-5', 2370, 3'-ATCACA-5', 2416, 3'-AACAGA-5', 2443, 3'-AAAAAA-5', 2461, 3'-AAAAAA-5', 2462, 3'-AAAAAA-5', 2463, 3'-AAAAAA-5', 2464, 3'-AAAAAA-5', 2465, 3'-AAAAAA-5', 2466, 3'-AAAAAA-5', 2467, 3'-AAAAAA-5', 2468, 3'-AAAAAA-5', 2469, 3'-AAAAAA-5', 2470, 3'-AGCAAA-5', 2475, 3'-AGCAAA-5', 2481, 3'-AACAAA-5', 2485, 3'-AACAAA-5', 2489, 3'-ACAAAA-5', 2490, 3'-AAGAAA-5', 2505, 3'-AGAAAA-5', 2506, 3'-AAAACA-5', 2508, 3'-AACAAA-5', 2510, 3'-ACTAAA-5', 2635, 3'-AATATA-5', 2639, 3'-ACAAAA-5', 2644, 3'-AACAGA-5', 2778, 3'-AAAAGA-5', 2798, 3'-AAGAAA-5', 2800, 3'-AAGAAA-5', 2804, 3'-AGAAAA-5', 2805, 3'-AAAAGA-5', 2807, 3'-AAGAGA-5', 2809, 3'-AGAAGA-5', 2812, 3'-AAGAAA-5', 2814, 3'-AGAAAA-5', 2815, 3'-AAAAAA-5', 2817, 3'-AAAAGA-5', 2819, 3'-AAGAAA-5', 2821, 3'-AGAAAA-5', 2822, 3'-AAAAGA-5', 2824, 3'-AAGAGA-5', 2826, 3'-AGAAGA-5', 2829, 3'-AAGAAA-5', 2831, 3'-AGAAAA-5', 2832, 3'-AAAAAA-5', 2834, 3'-AAAAGA-5', 2836, 3'-AAGAAA-5', 2838, 3'-AGAAAA-5', 2839, 3'-AAAACA-5', 2841, 3'-AAAATA-5', 2868, 3'-AATATA-5', 2870, 3'-AACAGA-5', 2878, 3'-ATCAAA-5', 2890, 
3'-AAAAAA-5', 2929, 3'-ATCAGA-5', 2946, 3'-AGGAAA-5', 2957, 3'-AGGAAA-5', 2967, 3'-ATTAGA-5', 2979, 3'-ATTAGA-5', 3000, 3'-AAAATA-5', 3012, 3'-AATAAA-5', 3014, 3'-AAAAAA-5', 3026, 3'-ACTAAA-5', 3162, 3'-AAAACA-5', 3166, 3'-AACATA-5', 3168, 3'-ATAAAA-5', 3171, 3'-ACGAGA-5', 3233, 3'-ACAAGA-5', 3307, 3'-AGCAAA-5', 3312, 3'-AAAACA-5', 3329, 3'-AACAAA-5', 3331, 3'-AATAAA-5', 3335, 3'-ACAAGA-5', 3340, 3'-AAGAAA-5', 3342, 3'-AGAAAA-5', 3343, 3'-AAGAAA-5', 3376, 3'-AAGAGA-5', 3380, 3'-AGGACA-5', 3389, 3'-ATCATA-5', 3420, 3'-ATTAAA-5', 3438, 3'-ACTAGA-5', 3463, 3'-AGTAAA-5', 3481, 3'-ACCAGA-5', 3486, 3'-AGCAAA-5', 3497, 3'-AAAACA-5', 3511, 3'-AACACA-5', 3513, 3'-ATAATA-5', 3538, 3'-ATCAGA-5', 3618, 3'-ACAAGA-5', 3635, 3'-AGAACA-5', 3668, 3'-AACACA-5', 3670, 3'-AGGACA-5', 3756, 3'-ACAAGA-5', 3759, 3'-ACCACA-5', 3764, 3'-ACAAAA-5', 3767, 3'-AGCACA-5', 3915, 3'-ATGAAA-5', 3922, 3'-AACATA-5', 4045, 3'-ACAAAA-5', 4066, 3'-AAAAAA-5', 4068, 3'-AAAATA-5', 4070, 3'-AATAAA-5', 4072, 3'-ATAATA-5', 4077, 3'-AATAGA-5', 4079, 3'-AAGAAA-5', 4085, 3'-AGAAAA-5', 4086, 3'-AAGACA-5', 4181, 3'-AACACA-5', 4196, 3'-ACAAAA-5', 4216, 3'-AAAAAA-5', 4218, 3'-AAAATA-5', 4220, 3'-ATAATA-5', 4223, 3'-AAAAAA-5', 4378, 3'-AAAAGA-5', 4380, 3'-AAGAAA-5', 4382, 3'-AGAAAA-5', 4383, 3'-AAAAAA-5', 4385, 3'-AAAAGA-5', 4387, 3'-AAGAAA-5', 4389, 3'-AGAAAA-5', 4390, 3'-AAAAGA-5', 4392, 3'-AAGAAA-5', 4394, 3'-AGAAAA-5', 4395, 3'-AGGACA-5', 4468, 3'-ACGAGA-5', 4473, 3'-AAGACA-5', 4507, 3'-AACAGA-5', 4518.
 * 16) inverse positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesHbox3i++.bas, looking for 3'-ANNANA-5', 55, 3'-AGCACA-5', 80, 3'-ACCACA-5', 105, 3'-ACAAGA-5', 108, 3'-AAGAAA-5', 110, 3'-AACATA-5', 114, 3'-ATAAGA-5', 117, 3'-AAGAGA-5', 119, 3'-AAAAGA-5', 137, 3'-AAGAGA-5', 139, 3'-AGGACA-5', 144, 3'-ACAAAA-5', 147, 3'-AGGAGA-5', 221, 3'-AGCAGA-5', 1393, 3'-AGCAGA-5', 1493, 3'-AAGAAA-5', 1981, 3'-AAGAGA-5', 1990, 3'-AGTAGA-5', 2111, 3'-ACGATA-5', 2157, 3'-ATCACA-5', 2170, 3'-AGTATA-5', 2177, 3'-ATAAGA-5', 2180, 3'-AAGACA-5', 2182, 3'-AAAAGA-5', 2276, 3'-AAGAAA-5', 2278, 3'-AGAAAA-5', 2279, 3'-AAAAAA-5', 2281, 3'-ATTAAA-5', 2440, 3'-AAAAAA-5', 2451, 3'-AAAACA-5', 2453, 3'-AAGAAA-5', 2585, 3'-ACCACA-5', 2813, 3'-AAGACA-5', 2925, 3'-ACCAGA-5', 2941, 3'-AAGACA-5', 2957, 3'-AGGAGA-5', 2981, 3'-AACAGA-5', 3004, 3'-AACAGA-5', 3053, 3'-AGAAGA-5', 3058, 3'-ACCAGA-5', 3245, 3'-ACCAGA-5', 3299, 3'-AGGAGA-5', 3304, 3'-AGAAGA-5', 3395, 3'-AAGAAA-5', 3397, 3'-AGGACA-5', 3622, 3'-ACCACA-5', 3950, 3'-ACCACA-5', 3969, 3'-ACCAAA-5', 4108, 3'-AGTAAA-5', 4119, 3'-AAAATA-5', 4122, 3'-ACTAAA-5', 4134, 3'-ATCAAA-5', 4139, 3'-AGTACA-5', 4365, 3'-ACCAGA-5', 4380, 3'-AAGAGA-5', 4386, 3'-ACGACA-5', 4392.
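The searches listed above can be approximated with a short script. What follows is a minimal Python sketch, not the SuccessablesHbox .bas programs themselves. It derives the complement, inverse, and inverse complement of the H box consensus ANANNA and reports every overlapping match in a sequence string. The function names, the toy sequence, and the position convention (1-based, first nucleotide of the match) are assumptions made for illustration. The sequence is scanned exactly as supplied, so strand and direction must be chosen before calling it.

```python
import re

# Base-pairing table; N (any nucleotide) pairs with N.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def complement(pattern):
    """Complement each base without changing the reading order."""
    return "".join(COMPLEMENT[base] for base in pattern.upper())


def inverse(pattern):
    """Reverse the reading order without complementing."""
    return pattern.upper()[::-1]


def find_hits(sequence, consensus):
    """Return (hexamer, position) pairs for every overlapping match.

    Positions are 1-based and mark the first nucleotide of the match;
    this convention is an assumption of the sketch.
    """
    # N becomes a character class; the lookahead keeps overlapping matches.
    regex = re.compile("(?=(" + consensus.upper().replace("N", "[ACGT]") + "))")
    return [(m.group(1), m.start() + 1) for m in regex.finditer(sequence.upper())]


H_BOX = "ANANNA"
variants = {
    "direct": H_BOX,
    "complement": complement(H_BOX),
    "inverse": inverse(H_BOX),
    "inverse complement": complement(inverse(H_BOX)),
}

toy = "GGTTTCTTTTGG"  # hypothetical sequence, not taken from the A1BG data
for name, pattern in variants.items():
    print(name, pattern, find_hits(toy, pattern))
```

On the toy sequence, the complement pattern TNTNNT matches TTTCTT and TCTTTT two positions apart, the same overlapping pattern seen in the lists (for example at 24 and 26 in item 7).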

Regarding hypothesis 2
Hypothesis 2: If an H box is present, at least one transcription factor uses the H box to affect A1BG transcription.

α-helical H-box motifs
An "α-helical H-box motif and F/YxxF/Y motifs [are] located in the N-terminal domain of DCAF1 WD that are involved in exclusive binding to DDB1."

The H-box motif has 13 amino acids whose nucleotide sequences are listed below. None of these, combined in either direction, allows ANANNA, ANNANA, TNTNNT, or TNNTNT. So this H-box motif apparently does not bind to an H box for DDB1 binding. Further, in the 13 amino acid sequences of the H-boxes for the genes DCAF1, DCAF4, DCAF5, DCAF6, DCAF8, WDTC1, DCAF12, or ASG2, no nucleotide sequences occur for binding to an H box.
 * 1) (A/C/T)(A/C/T)N
 * 2) N(C/T)N
 * 3) N(C/T)N
 * 4) N(A/C/G)N
 * 5) (A/G/T)NN
 * 6) (C/G/T)(A/T)N
 * 7) NNN
 * 8) N(A/C/G)N
 * 9) (A/C/T)(A/C/G)N
 * 10) N(A/C)N
 * 11) N(C/T)N
 * 12) (A/C/T)NN
 * 13) NNN.

cAMP-response elements and Nup98
"Analysis of genes whose expression is altered upon depletion of Nup98 or DHX9 further revealed that >60% of these genes contained a putative cAMP-response element (CRE) (p = 1.16 x 10-21), and CRE-containing genes represent ~50% of the Nup98 interacting gene loci detected in Nup98-Dam-ID studies (p = 5.5 x 10-64).7,14,44,45 Consistent with these observations, Nup98 has been reported to interact with the CREB-binding protein CBP,31 a transcriptional co-activator that is recruited to CRE-containing genes. DHX9 has also been shown to directly interact with CBP to aid in recruitment of RNA polymerase II to CRE-containing genes,46 with the ATPase activity of DHX9 being important for transcriptional induction of genes regulated by a CRE sequence.47"

"Nup98 could stimulate CBP-dependent transcription in a manner consistent with its ability to stimulate the ATPase activity of DHX9 [...] Nup98 functions as a cofactor to regulate the transcriptional functions of DHX9 at a subset of genes [...]."

"In addition to being present at shared gene loci, Nup98 and DHX9 bind mRNA transcripts from these same genes,7 suggesting that a Nup98-DHX9 complex likely contributes to pre-mRNA processing, to which DHX9 has also previously been linked.48,49 Consistent with this, analysis of transcriptome data from cells depleted of DHX9 or Nup98 show 217 shared transcripts with splicing defects.7 Cells depleted of Nup98 or DHX9 also showed common splicing defects of the E1A reporter gene,7 which together suggests that DHX9 and Nup98 are involved in mRNA splicing. Moreover, analysis of several mRNAs that bind to Nup98 and DHX9 showed increased binding to DHX9 upon depletion of Nup98.7"

H box and A1BG
A Google Scholar search using "H box", "transcription factor", and A1BG produced only 2 results, neither of which refers to any specific H box or transcription factor for A1BG.

A Google Scholar search using "H box" and A1BG produced only 4 results: "Bioinformatics to Tackle the Biological Meaning of Human Cerebrospinal Fluid Proteome", "Detection of biomarkers with a multiplex quantitative proteomic platform in cerebrospinal fluid of patients with neurodegenerative disorders", "Interactions between Neuronal Innate Immune Pattern Recognition Receptor Pathways and Neurotropic Arboviruses: Host Measures, Viral Countermeasures, and Potential Therapeutics", and "The Influence of Preovulatory Estradiol on Uterine Transcriptomics and Proteomics Around Maternal Recognition of Pregnancy in Beef Cattle", with mentions of A1BG only.

H box and transcription factors present in A1BG
A Google Scholar search using "H box" and "AGC box" produced only 3 results: "Regulatory elements governing pathogenesis-related (PR) gene expression": "… 1991). Two 12-bp direct repeat sequences, which include one AGC box motif, are located at -698 and -637 … (1989) in the parsley phenylalanine ammonia-lyase gene, and to the H-box sequence CCTACC(NhCT described by Loake et al …", "Promoters for sugarcane transformation: isolation of specific sequences and evaluation of rolC": no mention, "Physiological function of rhamnogalacturonan lyase genes based in the analysis of cis-acting elements located in the promoter region": "… The boxes described next, were located in all RGL promoters. The P-box is the target site for petal binding activity and it can drive the specific expression in tobacco petals. Further, a fragment of 103 bp and H-box are involved in the same function …"

A Google Scholar search using "H box" and "ATA box" produced only 1 result: "Genome-wide analysis of chicken snoRNAs provides unique implications for the evolution of vertebrate snoRNAs": "… Box H/ACA snoRNAs exhibit a common hairpin-hinge-hairpin-tail secondary structure with the H box (ANANNA, where N stands for any nucleotide) in the hinge region and the ACA motif three nucleotides from the 3' end of the molecule …"

A Google Scholar search using "H box" and "C box" produced about 1,160 results (0.41 sec): about 60 % deal with box C/D and H/ACA snoRNAs and localization of snoRNP proteins, and about 40 % deal with radiation effects in DNA.

A Google Scholar search using "H box" and "D box" produced about 1,570 results (0.33 sec): about 50 % deal with box C/D and H/ACA snoRNAs or (no) RNPs, and about 50 % deal with DExH/D-box proteins.

A Google Scholar search using "H box" and "CAREs" produced about 111 results (0.08 sec): about 20 % deal with DEAD/H-box helicases, about 60 % deal with 'cares', and about 20 % deal with CAREs: "Salt and drought stress and ABA responses related to bZIP genes from V. radiata and V. angularis" - "… proteins always bind using a non-palindromic sequence, such as PB (TGAAAA), GLM (GTGAGCAT) or an H-box (CCTACC) (Izawa … Subcellular localization, cis-acting regulatory elements (CAREs) and gene structures were determined using ProtComp Version 9.0 (http://linux1 …", "In-silico analysis of cis-acting regulatory elements of pathogenesis-related proteins of Arabidopsis thaliana and Oryza sativa" - "… Transcription regulation involves association between transcription factors and particular cis-acting regulatory elements (CAREs) of a specific gene involved in plant defense response [18]. CAREs are short regulatory motifs …"

A Google Scholar search using "H box" and "CArG box" produced about 15 results (0.05 sec): about 30 % don't deal with CArG or H box, but about 70 % do, e.g., "Petal-specific activity of the promoter of an anthocyanidin synthase gene of tobacco (Nicotiana tabacum L.)" - "… Binding site name. Sequences. Position. CArG box. ctaattaatg. −489 to −480. G box. caagtg. −643 to −636. cacgtc. −142 to −137. H box. cctacc. −896 to −891. cctacc. −589 to −584. cctacc. −51 to −46. P box. aacctacc. −898 to −891. cccctacc. −591 to −584. aacctacc. −53 to −46 …"

A Google Scholar search using "H box" and "CRE" produced about 839 results (0.09 sec): about 30 % deal with CRE without H box, about 30 % deal with DExD/H box without CRE, about 30 % don't include either, and about 10 % contain both, e.g., "Nucleoplasmic Nup98 controls gene expression by regulating a DExH/D-box protein" - "… KEYWORDS DDX3; DDX5; DDX21; DExD/ H-box helicases; DHX9; FG-Nup; gene expression regulation; nuclear pore complex; nucleoporins; Nup98 … of Nup98 or DHX9 further revealed that >60% of these genes contained a putative cAMP-response element (CRE) (p = 1.16 …". There is also "Sequence elements required for activity of a murine major histocompatibility complex class II promoter bind common and cell-type-specific nuclear factors": "… These elements correspond to the conserved sequence elements found in other human and mouse class H genes, the X box, the Y box, and the H box … A common binding factor for the H-box element was detected in extracts from WEHI-3 and L cells …".

A Google Scholar search using "H box" and "CRE box" produced 2 results (0.03 sec): "Transcriptional and translational regulation of gene expression in haploid spermatids" - "… Although the first 8 bp in the Tet-1 11-mer shares homology with the CRE box, Tet-1 was demonstrated to be distinct from known … been identified in testicular extracts: 1. A 18 kD phosphoprotein that binds to the Y-box (Prml: 56–65; Prm2: 116–129) and the H-box (Prml: 70 …" and "Haploid spermatids exhibit translationally repressed mRNAs" - "… However, RNA-binding proteins solely bind to the Y-box, but not to the H-box and the Z-box (Fig … genes for transition proteins and protamines, are transcribed through binding of a cAMP-responsive element modulator (CREM) to a cAMP-responsive element (CRE) box in …".

A Google Scholar search using "H box" and "enhancer box" produced 5 results (0.10 sec): "Control of phenylalanine ammonia-lyase gene promoters from pea by UV radiation" - "… box 1 box 2 box 4 box 5 Fig … Combination of H-box [CCTACC(N7)CT] and G-box (CACGTG) cis elements is necessary for feed-forward stimulation of a chalcone synthase promoter by the phenylpropanoid-pathway intermediate p-coumaric acid", "Hybrids of the bHLH and bZIP protein motifs display different DNA-binding activities in vivo vs. in vitro" - "… Proteins containing the bHLH domain, in the presence or absence of additional dimerization elements including leucine zipper (LZ) or PAS domain, can target the Enhancer box (E-box, CACGTG), thereby regulating cellular metabolism, differentiation, and development [11], [12 …", "Transcriptional Regulatory Mechanisms Driving the Human Antiviral Response" - "… factor p300 E1A binding protein p300 dUTP Deoxyuridine-triphosphatase PBS Phosphate buffered saline E-box Enhancer box PCR Polymerase … Both members contain two N-terminal caspase recruitment domains (CARDs) and a C-terminal DExD/H box RNA helicase domain …", and "MicroRNA 及其靶基因的时空特异性与动态变化" [Spatiotemporal specificity and dynamic changes of microRNAs and their target genes] - "… crucial in the regulation of cell growth and apoptosis. Studies show that c-Myc can likewise regulate miRNA transcription. c-Myc can bind the E-boxes (enhancer box sequences) in the promoter region of the miR-17-92 cluster to activate the corresponding … Cell, 2009, 136(1): 75-84 [72] Fuller-Pace F V. DExD/H box RNA helicases: multifunctional proteins with important …", with the fifth not applicable.

A Google Scholar search using "H box" and "BREu" produced 12 results (0.11 sec): about 75 % (9) do not contain either, 17 % (2) have BREu only, 8 % (1) have H-box only.

A Google Scholar search using "H box" and "GC box" produced about 51 results (0.11 sec): about 60 % with neither, 20 % with only GC box, and 20 % with both.

A Google Scholar search using "H box" and HNF produced about 68 results (0.08 sec): about 40 % deal with HNF only, about 30 % deal with H box or H-box, about 20 % deal with neither, and about 10 % deal with both, e.g., "Hormonal regulation of an islet-specific enhancer in the pancreatic homeobox gene STF-1." - "… Constructs containing deletions past the E- and H-box motifs were far less active, demonstrating the importance of these sites in … By contrast, mutagenesis of the HNF-3 binding site (J5917; TAAAT/TCCCT) significantly blocked inhibition by dexamethasone, indicating that this …".

A Google Scholar search using "H box" and "HY box" produced 2 results (0.08 sec): neither appears to apply to either.

A Google Scholar search using "H box" and MRE produced about 152 results (0.09 sec): about 70 % appear to apply to MYB recognition elements (MREs), not metal responsive elements (MREs); the other 30 % do not appear to apply to either.

A Google Scholar search using "H box" and "pyrimidine box" produced 7 results (0.09 sec): three pertain to neither, and the other four appear to connect the H-box, but not the H box, with the pyrimidine box.

A Google Scholar search using "H box" and STAT produced about 2,310 results (0.13 sec): nearly all appear to connect the JAK/STAT pathway with DExD/H-box RNA helicases.

A Google Scholar search using "H box" and "TATA box" outside the core promoter produced about 727 results (0.16 sec): about 50 % appear to connect the TATA box in the core promoter with the H-box (CCTACC) (5), or appear to mention the DExD/H box only (2), and about 30 % mention the TATA box only.

A Google Scholar search using "H box" and "W box" produced about 513 results (0.39 sec): about 50 % deal with both and the other 50 % deal with neither.

Verifications
To verify that your sampling has explored something, you may need a control group. A sample taken at another place or time, or without your entity, source, or object, may serve.

Another verifier is reproducibility. Can you replicate something about your entity in your laboratory more than 3 times? Five times is usually a starting number for providing statistics (data) about it.

For an apparent one-time or perception event, document or record as much coincident information as possible. Was there a butterfly nearby?

Has anyone else perceived the entity and recorded something about it?

Gene ID: 1 includes the nucleotides between neighboring genes and A1BG. These nucleotides can be loaded into files from either gene toward A1BG, and from template and coding strands. These nucleotide sequences can be found in Gene transcriptions/A1BG. Copying the transcription factor sequences discovered above and pasting them into "⌘F" (find) locates these sequences at the same nucleotide positions as found by the computer programs.
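The "⌘F" check described above can also be scripted. The following is a minimal Python sketch, not the computer program used for this laboratory. The function name find_all and the toy sequence are hypothetical, and the sketch assumes that a position means the 1-based number of the first nucleotide of a match, which the text does not state explicitly.

```python
def find_all(sequence: str, motif: str) -> list[int]:
    """Return the 1-based start position of every occurrence of motif.

    Overlapping occurrences are reported too, which a plain
    text search in an editor may or may not do.
    """
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)              # convert 0-based index to 1-based
        start = sequence.find(motif, start + 1)  # step by one to allow overlaps
    return positions


# Toy sequence (hypothetical, not taken from the A1BG files).
toy = "GGACACCATTACACCAGG"
print(find_all(toy, "ACACCA"))
```

Running the same function over the real files from Gene transcriptions/A1BG, once for each strand and direction, would give positions that can be compared with those listed in this laboratory.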

"In humans, telomerase is composed of a reverse transcriptase (hTERT), which uses the RNA component (hTERC) to dock onto the 3′ single-stranded telomere end. hTERT may then processively synthesise telomeric repeats from the template provided by hTERC, before dissociating7–9. All telomerase RNAs possess a 3′ end element necessary for its stability10. In hTERC, this is two stem-loop structures separated by an H-box (ANANNA) and ACA motif (H/ACA). The binding of telomerase factors dyskerin, NOP10, and NHP2 at the H/ACA motif form the so-called ‘pre-ribonucleoprotein complex’, before GAR1 binds in transition to the mature RNP11,12. hTERC then binds to chaperone TCAB1, which assists its trafficking to the Cajal bodies where the functional telomerase complex localises13. Recruitment to the telomeres in S-phase is mediated by the protective complex shelterin14,15. Correct assembly of the telomerase complex, with appropriate co-factors for maturation, stability, and subcellular localisation, is necessary for its function and thus telomere maintenance."

Core promoter H boxes
From the first nucleotide just after ZSCAN22 to the first nucleotide just before A1BG are 4460 nucleotides. The core promoter on this side of A1BG extends from approximately 4425 to the possible transcription start site at nucleotide number 4460.

There are no H boxes (3'-ACACCA-5') in the core promoter between ZSCAN22 and A1BG.

There are no H boxes (3'-AGAGGA-5') in the core promoter between ZSCAN22 and A1BG. But, there is one inverse and its complement 3'-AGGAGA-5' at 4428.

There are no H boxes (3'-ANANNA-5') in the core promoter between ZSCAN22 and A1BG. But, there is one inverse and its complement 3'-AGGAGA-5' at 4428.
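A consensus such as ANANNA can be turned into a regular expression by treating each N as "any nucleotide". The sketch below is one way to do that in Python; the function names are hypothetical, and it assumes the sequence string is written in the same orientation as the consensus. A lookahead is used so that overlapping matches, as occur in runs of A, are all reported.

```python
import re


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Build a regex from a consensus in which N means any of A, C, G, T."""
    body = "".join("[ACGT]" if ch == "N" else ch for ch in consensus)
    # Zero-width lookahead with a capture group: finds overlapping hits.
    return re.compile(f"(?=({body}))")


def scan(sequence: str, consensus: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched hexamer) for every hit."""
    pattern = consensus_to_regex(consensus)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Toy sequences (hypothetical, not taken from the A1BG files).
print(scan("GGACACCAGG", "ANANNA"))
print(scan("AAAAAAA", "ANANNA"))
```

The same function works for the specific H boxes by passing "ACACCA" or "AGAGGA" in place of the consensus.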

From the first nucleotide just after ZNF497 to the first nucleotide just before A1BG are 858 nucleotides. The core promoter on this side of A1BG extends from approximately 824 to the possible transcription start site at nucleotide number 858. Nucleotides (nts) have since been added between ZNF497 and A1BG, so the TSS for A1BG is now at 4300 nts, counted from just on the other side of ZNF497. The core promoter should now extend from 4266 to 4300.

There are no H boxes (3'-ACACCA-5') in the core promoter between ZNF497 and A1BG.

There are no H boxes (3'-AGAGGA-5') in the core promoter between ZNF497 and A1BG.

There are no H boxes (3'-ANANNA-5') in the core promoter between ZNF497 and A1BG. But, there is an inverse and its complement 3'-AGGACA-5' at 4252. And, after the TSS there is one H box, 3'-AGAGAA-5' at 4387, plus the inverses 3'-AGTACA-5' at 4365, 3'-ACCAGA-5' at 4380, 3'-AAGAGA-5' at 4386, and 3'-ACGACA-5' at 4392, and their complements.

Proximal promoter H boxes
The proximal promoter begins about nucleotide number 4210 in the negative direction.

There are no H boxes (3'-ACACCA-5') in the proximal promoter between ZSCAN22 and A1BG.

There are no H boxes (3'-AGAGGA-5') in the proximal promoter between ZSCAN22 and A1BG.

There is one H box (3'-ANANNA-5'): negative direction, negative strand, 3'-ACACGA-5' at 4402 in the proximal promoter between ZSCAN22 and A1BG. But, on the positive strand in the negative direction there are 16: 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395, with their complements on the negative strand, negative direction.

The proximal promoter begins about nucleotide number 4195 in the positive direction.

There are no H boxes (3'-ACACCA-5') in the proximal promoter between ZNF497 and A1BG.

There are no H boxes (3'-AGAGGA-5') in the proximal promoter between ZNF497 and A1BG.

There is one H box (3'-ANANNA-5'): 3'-AGAGAA-5' at 4387 in the proximal promoter, negative strand, positive direction, between ZNF497 and A1BG. But, there are four: 3'-TCATGT-5' at 4365, 3'-TGGTCT-5' at 4380, 3'-TTCTCT-5' at 4386, and 3'-TGCTGT-5' at 4392 and their complements in the positive direction between ZNF497 and A1BG.

In the positive direction on the positive strand there is an inverse: 3'-AGGACA-5' at 4252, and its complement.

Distal promoter H boxes
Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2460 in the negative direction.

There are three H boxes after nucleotide number 2460 in the negative strand and negative direction: 3'-ACACCA-5' at 2659, 3'-ACACCA-5' at 3187, and 3'-ACACCA-5' at 3811.

There is one H box inverse complement, negative strand, negative direction 3'-TGGTGT-5' (3'-ACCACA-5') at 3764.

There are three H boxes in the distal promoter on the positive strand in the negative direction: 3'-AGAGGA-5' at 3387, 3'-AGAGGA-5' at 3638, and 3'-AGAGGA-5' at 3675.

There is one inverse H box and its complement 3'-AGGAGA-5' at 3790.
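The terms complement, inverse, and inverse complement used in these lists can be made concrete with a short Python sketch. The function names are hypothetical. The usage follows the examples in the text: the inverse of ACACCA is ACCACA, and its inverse complement is TGGTGT.

```python
# Watson-Crick pairing for DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(motif: str) -> str:
    """Swap each nucleotide for its pairing partner, keeping the order."""
    return motif.translate(COMPLEMENT)


def inverse(motif: str) -> str:
    """Read the motif in the opposite order."""
    return motif[::-1]


def inverse_complement(motif: str) -> str:
    """Complement of the inverse."""
    return complement(inverse(motif))


for h_box in ("ACACCA", "AGAGGA"):
    print(h_box, complement(h_box), inverse(h_box), inverse_complement(h_box))
```

Searching each strand for all four forms of a motif is one way to make sure that no orientation of an H box is overlooked.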

Regarding 3'-ANANNA-5', on the negative strand, negative direction, there are 13 H boxes: 3'-ACATCA-5' at 2541, 3'-ACACCA-5' at 2659, 3'-ACATTA-5' at 2675, 3'-ATAAAA-5' at 2853, 3'-AAAGTA-5' at 2886, 3'-ACATTA-5' at 3064, 3'-AGATGA-5' at 3159, 3'-ACACCA-5' at 3187, 3'-AGAAGA-5' at 3554, 3'-AGACGA-5' at 3707, 3'-ACACCA-5' at 3811, 3'-ACATTA-5' at 3973, and 3'-ACATCA-5' at 4124.

On the positive strand, negative direction, there are 122 H boxes: 3'-AAAAAA-5' at 2461, 3'-AAAAAA-5' at 2462, 3'-AAAAAA-5' at 2463, 3'-AAAAAA-5' at 2464, 3'-AAAAAA-5' at 2465, 3'-AAAAAA-5' at 2466, 3'-AAAAAA-5' at 2467, 3'-AAAAAA-5' at 2468, 3'-AAAAAA-5' at 2469, 3'-AAAAAA-5' at 2470, 3'-AAAGCA-5' at 2473, 3'-AAAGCA-5' at 2479, 3'-AAACAA-5' at 2484, 3'-AAACAA-5' at 2488, 3'-ACAAAA-5' at 2490, 3'-ATAGTA-5' at 2500, 3'-AGAAAA-5' at 2506, 3'-AAAACA-5' at 2508, 3'-AAACAA-5' at 2509, 3'-AGACCA-5' at 2599, 3'-ATACAA-5' at 2642, 3'-ACAAAA-5' at 2644, 3'-AAATCA-5' at 2648, 3'-ACAGGA-5' at 2690, 3'-AAATCA-5' at 2749, 3'-AGAGCA-5' at 2781, 3'-AAAAGA-5' at 2798, 3'-AAAGAA-5' at 2799, 3'-AAAGAA-5' at 2803, 3'-AGAAAA-5' at 2805, 3'-AAAAGA-5' at 2807, 3'-AGAGAA-5' at 2810, 3'-AGAAGA-5' at 2812, 3'-AGAAAA-5' at 2815, 3'-AAAAAA-5' at 2817, 3'-AAAAGA-5' at 2819, 3'-AAAGAA-5' at 2820, 3'-AGAAAA-5' at 2822, 3'-AAAAGA-5' at 2824, 3'-AGAGAA-5' at 2827, 3'-AGAAGA-5' at 2829, 3'-AGAAAA-5' at 2832, 3'-AAAAAA-5' at 2834, 3'-AAAAGA-5' at 2836, 3'-AAAGAA-5' at 2837, 3'-AGAAAA-5' at 2839, 3'-AAAACA-5' at 2841, 3'-AAACAA-5' at 2842, 3'-AAAATA-5' at 2868, 3'-ATATAA-5' at 2873, 3'-AAAAAA-5' at 2929, 3'-ACATCA-5' at 2941, 3'-ACATTA-5' at 2951, 3'-AAACCA-5' at 2971, 3'-AAAATA-5' at 3012, 3'-AAATAA-5' at 3013, 3'-AAAAAA-5' at 3026, 3'-AAACTA-5' at 3029, 3'-AGACCA-5' at 3122, 3'-AAAACA-5' at 3166, 3'-ACATAA-5' at 3169, 3'-ATAAAA-5' at 3171, 3'-AAATTA-5' at 3175, 3'-AGATCA-5' at 3277, 3'-ACAAGA-5' at 3307, 3'-AGAGCA-5' at 3310, 3'-AAAACA-5' at 3329, 3'-AAACAA-5' at 3330, 3'-AAATAA-5' at 3334, 3'-AAACAA-5' at 3338, 3'-ACAAGA-5' at 3340, 3'-AGAAAA-5' at 3343, 3'-AAACCA-5' at 3365, 3'-AGAGGA-5' at 3387, 3'-ACATCA-5' at 3394, 3'-AGAGAA-5' at 3406, 3'-ACATCA-5' at 3415, 3'-ACATTA-5' at 3436, 3'-ATATTA-5' at 3454, 3'-ATATTA-5' at 3468, 3'-AAACCA-5' at 3484, 3'-AGATCA-5' at 3489, 3'-AAAACA-5' at 3511, 3'-ACACAA-5' at 3514, 3'-ATAATA-5' at 3538, 3'-ACAAGA-5' at 3635, 3'-AGAGGA-5' at 3638, 3'-AAAGAA-5' at 3666, 3'-AGAACA-5' at 3668, 3'-AGAGGA-5' at 3675, 3'-ACAAGA-5' at 3759, 3'-AGACCA-5' at 3762, 3'-ACAAAA-5' at 3767, 3'-AGAGCA-5' at 3913, 3'-AGATGA-5' at 3920, 3'-AGACCA-5' at 4031, 3'-ACAAAA-5' at 4066, 3'-AAAAAA-5' at 4068, 3'-AAAATA-5' at 4070, 3'-AAATAA-5' at 4071, 3'-AAATAA-5' at 4075, 3'-ATAATA-5' at 4077, 3'-ATAGAA-5' at 4080, 3'-AAAGAA-5' at 4084, 3'-AGAAAA-5' at 4086, 3'-AGACAA-5' at 4182, 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395.

Using an estimate of 2 knts, a distal promoter to A1BG would be expected after nucleotide number 2300 in the positive direction.
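The start of each distal promoter window is the TSS position minus the 2 knts estimate. The Python sketch below reproduces that arithmetic for the two TSS positions given in this laboratory (4460 from the ZSCAN22 side, 4300 from the ZNF497 side). The function names and the short list of example positions are hypothetical.

```python
DISTAL_ESTIMATE_NTS = 2000  # the "2 knts" estimate used in the text


def distal_promoter_start(tss: int, estimate: int = DISTAL_ESTIMATE_NTS) -> int:
    """Nucleotide number after which a distal promoter would be expected."""
    return tss - estimate


def hits_after(positions: list[int], start: int) -> list[int]:
    """Keep only the hit positions that lie after the window start."""
    return [p for p in positions if p > start]


negative_direction_start = distal_promoter_start(4460)  # ZSCAN22 side
positive_direction_start = distal_promoter_start(4300)  # ZNF497 side
print(negative_direction_start, positive_direction_start)

# Example filter on a made-up list of hit positions.
print(hits_after([2200, 2603, 3825], positive_direction_start))
```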

There are two H boxes after nucleotide number 2300 in the negative strand and positive direction: 3'-ACACCA-5' at 2603 and 3'-ACACCA-5' at 3825.

There are two H boxes after nucleotide number 2300 in the positive strand and positive direction: 3'-ACACCA-5' at 3643 and 3'-ACACCA-5' at 3967.

Regarding 3'-ANANNA-5', on the negative strand, positive direction, there are 25 H boxes: 3'-ATACCA-5' at 2591, 3'-ACACCA-5' at 2603, 3'-ATAGAA-5' at 2628, 3'-AAACCA-5' at 2632, 3'-ACACTA-5' at 2637, 3'-ATATAA-5' at 2662, 3'-AGAGCA-5' at 2704, 3'-AGAGGA-5' at 2793, 3'-AAAGGA-5' at 2829, 3'-ACAGAA-5' at 2838, 3'-AAAGAA-5' at 3066, 3'-AGAACA-5' at 3094, 3'-AGAGCA-5' at 3138, 3'-ACAGCA-5' at 3212, 3'-ACAGTA-5' at 3414, 3'-AGATGA-5' at 3476, 3'-ACAGGA-5' at 3572, 3'-AAAGCA-5' at 3599, 3'-ACATGA-5' at 3708, 3'-ACACCA-5' at 3825, 3'-AAAAGA-5' at 3929, 3'-AGAACA-5' at 4068, 3'-AAATGA-5' at 4094, 3'-ACATCA-5' at 4116, and 3'-ACATGA-5' at 4154.

On the positive strand, positive direction there are 20 H boxes: 3'-AAATAA-5' at 2347, 3'-AAAAAA-5' at 2451, 3'-AAAACA-5' at 2453, 3'-AGACGA-5' at 2976, 3'-AGACCA-5' at 3022, 3'-AGAGAA-5' at 3056, 3'-AGAAGA-5' at 3058, 3'-AGAGGA-5' at 3302, 3'-AGACGA-5' at 3307, 3'-ACAGAA-5' at 3393, 3'-AGAAGA-5' at 3395, 3'-ACAGGA-5' at 3620, 3'-ACACCA-5' at 3643, 3'-AAACCA-5' at 3948, 3'-ACACCA-5' at 3967, 3'-AGAGGA-5' at 4059, 3'-AAAATA-5' at 4122, 3'-AAATCA-5' at 4137, 3'-AAATAA-5' at 4142, and 3'-ATATTA-5' at 4168.
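A hit list like the one above can be checked and summarized with a few lines of Python. The sketch below copies the 20 hits listed for the positive strand, positive direction, confirms that each hexamer fits the ANANNA consensus, and tallies how often each hexamer occurs. The names HITS and check_and_tally are hypothetical; this is a verification aid, not the program that produced the list.

```python
import re
from collections import Counter

# ANANNA written as a regex: A, any, A, any, any, A.
H_BOX = re.compile(r"A[ACGT]A[ACGT]{2}A")

# (hexamer, position) pairs copied from the list above.
HITS = [
    ("AAATAA", 2347), ("AAAAAA", 2451), ("AAAACA", 2453), ("AGACGA", 2976),
    ("AGACCA", 3022), ("AGAGAA", 3056), ("AGAAGA", 3058), ("AGAGGA", 3302),
    ("AGACGA", 3307), ("ACAGAA", 3393), ("AGAAGA", 3395), ("ACAGGA", 3620),
    ("ACACCA", 3643), ("AAACCA", 3948), ("ACACCA", 3967), ("AGAGGA", 4059),
    ("AAAATA", 4122), ("AAATCA", 4137), ("AAATAA", 4142), ("ATATTA", 4168),
]


def check_and_tally(hits):
    """Return (entries that do not fit ANANNA, count of each hexamer)."""
    misfits = [(motif, pos) for motif, pos in hits if not H_BOX.fullmatch(motif)]
    tally = Counter(motif for motif, _ in hits)
    return misfits, tally


misfits, tally = check_and_tally(HITS)
print(len(HITS), "hits;", len(misfits), "do not fit the consensus")
print(tally.most_common(5))
```

The same check can be applied to the other lists in this section by replacing HITS, and to the lists of inverses by changing the pattern to the inverse consensus ANNANA.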

There are inverses of 31 H boxes on the negative strand in the positive direction: 3'-ATGACA-5' at 2412, 3'-ACTACA-5' at 2428, 3'-AGGACA-5' at 2460, 3'-ATTATA-5' at 2548, 3'-ACCACA-5' at 2600, 3'-AGGAAA-5' at 2623, 3'-AATAGA-5' at 2627, 3'-ACCACA-5' at 2634, 3'-AACAGA-5' at 2652, 3'-AGCAAA-5' at 2706, 3'-AGGAAA-5' at 2831, 3'-AACACA-5' at 2835, 3'-ATGACA-5' at 2843, 3'-AGAACA-5' at 3094, 3'-AACACA-5' at 3096, 3'-AGGACA-5' at 3131, 3'-ACCAAA-5' at 3175, 3'-AACAGA-5' at 3179, 3'-AGCAGA-5' at 3214, 3'-AGTAGA-5' at 3416, 3'-AATAAA-5' at 3427, 3'-ACCAGA-5' at 3548, 3'-ATGACA-5' at 3569, 3'-AGGAGA-5' at 3650, 3'-AGCACA-5' at 3740, 3'-ACCACA-5' at 3859, 3'-AAAAGA-5' at 3929, 3'-AGAACA-5' at 4068, 3'-ATCATA-5' at 4149, and 3'-ATTATA-5' at 4166.

Transcribed H boxes
Gene ID: 1653 DDX1 DEAD-box helicase 1: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein of unknown function. It shows high transcription levels in 2 retinoblastoma cell lines and in tissues of neuroectodermal origin."

"Names
 * DEAD (Asp-Glu-Ala-Asp) box helicase 1
 * DEAD (Asp-Glu-Ala-Asp) box polypeptide 1
 * DEAD box polypeptide 1
 * DEAD box protein 1
 * DEAD box protein retinoblastoma
 * DEAD box-1
 * DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 1
 * DEAD/H-box helicase 1"

Gene ID: 1654 DDX3X DEAD-box helicase 3 X-linked: "The protein encoded by this gene is a member of the large DEAD-box protein family, that is defined by the presence of the conserved Asp-Glu-Ala-Asp (DEAD) motif, and has ATP-dependent RNA helicase activity. This protein has been reported to display a high level of RNA-independent ATPase activity, and unlike most DEAD-box helicases, the ATPase activity is thought to be stimulated by both RNA and DNA. This protein has multiple conserved domains and is thought to play roles in both the nucleus and cytoplasm. Nuclear roles include transcriptional regulation, mRNP assembly, pre-mRNA splicing, and mRNA export. In the cytoplasm, this protein is thought to be involved in translation, cellular signaling, and viral replication. Misregulation of this gene has been implicated in tumorigenesis. This gene has a paralog located in the nonrecombining region of the Y chromosome. Pseudogenes sharing similarity to both this gene and the DDX3Y paralog are found on chromosome 4 and the X chromosome. Alternative splicing results in multiple transcript variants."

"Names
 * DEAD (Asp-Glu-Ala-Asp) box helicase 3, X-linked
 * DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, X-linked
 * DEAD box protein 3, X-chromosomal
 * DEAD box, X isoform
 * DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 3
 * DEAD/H box-3
 * helicase-like protein 2"

Gene ID: 1655 DDX5 DEAD-box helicase 5: "This gene encodes a member of the DEAD box family of RNA helicases that are involved in a variety of cellular processes as a result of its role as an adaptor molecule, promoting interactions with a large number of other factors. This protein is involved in pathways that include the alteration of RNA structures, plays a role as a coregulator of transcription, a regulator of splicing, and in the processing of small noncoding RNAs. Members of this family contain nine conserved motifs, including the conserved Asp-Glu-Ala-Asp (DEAD) motif, important to ATP binding and hydrolysis as well as RNA binding and unwinding activities. Dysregulation of this gene may play a role in cancer development. Alternative splicing results in multiple transcript variants." No H-box is directly associated, but interaction does occur with CREBBP (CREB-binding protein).

Gene ID: 1656 DDX6 DEAD-box helicase 6: "This gene encodes a member of the DEAD box protein family. The protein is an RNA helicase found in P-bodies and stress granules, and functions in translation suppression and mRNA degradation. It is required for microRNA-induced gene silencing. Multiple alternatively spliced variants, encoding the same protein, have been identified."

XP_011540946.1 probable ATP-dependent RNA helicase DDX6 isoform X2: "Helicase superfamily c-terminal domain; associated with DEXDc-, DEAD-, and DEAH-box proteins, yeast initiation factor 4A, Ski2p, and Hepatitis C virus NS3 helicases; this domain is found in a wide variety of helicases and helicase related proteins; may not be an autonomously folding unit, but an integral part of the helicase; 4 helicase superfamilies at present according to the organization of their signature motifs; all helicases share the ability to unwind nucleic acid duplexes with a distinct directional polarity; they utilize the free energy from nucleoside triphosphate hydrolysis to fuel their translocation along DNA, unwinding the duplex in the process".

XP_016872740.1 probable ATP-dependent RNA helicase DDX6 isoform X2: "Helicase superfamily c-terminal domain; associated with DEXDc-, DEAD-, and DEAH-box proteins, yeast initiation factor 4A, Ski2p, and Hepatitis C virus NS3 helicases; this domain is found in a wide variety of helicases and helicase related proteins; may not be an autonomously folding unit, but an integral part of the helicase; 4 helicase superfamilies at present according to the organization of their signature motifs; all helicases share the ability to unwind nucleic acid duplexes with a distinct directional polarity; they utilize the free energy from nucleoside triphosphate hydrolysis to fuel their translocation along DNA, unwinding the duplex in the process".

Gene ID: 1659 DHX8 DEAH-box helicase 8: "This gene is a member of the DEAH box polypeptide family. The encoded protein contains the DEAH (Asp-Glu-Ala-His) motif which is characteristic of all DEAH box proteins, and is thought to function as an ATP-dependent RNA helicase that regulates the release of spliced mRNAs from spliceosomes prior to their export from the nucleus. This protein may be required for the replication of human immunodeficiency virus type 1 (HIV-1). Alternative splicing results in multiple transcript variants."

Gene ID: 1660 DHX9 DExH-box helicase 9: "This gene encodes a member of the DEAH-containing family of RNA helicases. The encoded protein is an enzyme that catalyzes the ATP-dependent unwinding of double-stranded RNA and DNA-RNA complexes. This protein localizes to both the nucleus and the cytoplasm and functions as a transcriptional regulator. This protein may also be involved in the expression and nuclear export of retroviral RNAs. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on chromosomes 11 and 13."

Gene ID: 1662 DDX10 DEAD-box helicase 10: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, and it may be involved in ribosome assembly. Fusion of this gene and the nucleoporin gene, NUP98, by inversion 11 (p15q22) chromosome translocation is found in the patients with de novo or therapy-related myeloid malignancies."

NP_004389.2 probable ATP-dependent RNA helicase DDX10: "Helicase superfamily c-terminal domain; associated with DEXDc-, DEAD-, and DEAH-box proteins, yeast initiation factor 4A, Ski2p, and Hepatitis C virus NS3 helicases; this domain is found in a wide variety of helicases and helicase related proteins; may not be an autonomously folding unit, but an integral part of the helicase; 4 helicase superfamilies at present according to the organization of their signature motifs; all helicases share the ability to unwind nucleic acid duplexes with a distinct directional polarity; they utilize the free energy from nucleoside triphosphate hydrolysis to fuel their translocation along DNA, unwinding the duplex in the process".

Gene ID: 1663 DDX11 DEAD/H-box helicase 11: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an enzyme that possesses both ATPase and DNA helicase activities. This gene is a homolog of the yeast CHL1 gene, and may function to maintain chromosome transmission fidelity and genome stability. Alternative splicing results in multiple transcript variants encoding distinct isoforms."

Gene ID: 1665 DHX15 DEAH-box helicase 15: "The protein encoded by this gene is a putative ATP-dependent RNA helicase implicated in pre-mRNA splicing."

"Names
 * ATP-dependent RNA helicase #46
 * DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 15
 * DEAD/H box-15
 * DEAH (Asp-Glu-Ala-His) box helicase 15
 * DEAH (Asp-Glu-Ala-His) box polypeptide 15
 * DEAH box protein 15
 * RNA helicase 2
 * putative pre-mRNA-splicing factor ATP-dependent RNA helicase DHX15"

Gene ID: 2117 ETV3 ETS variant transcription factor 3: "DEAD/H-box RNA helicase binding"

Gene ID: 4086 SMAD1 SMAD family member 1: "The protein encoded by this gene belongs to the SMAD, a family of proteins similar to the gene products of the Drosophila gene 'mothers against decapentaplegic' (Mad) and the C. elegans gene Sma. SMAD proteins are signal transducers and transcriptional modulators that mediate multiple signaling pathways. This protein mediates the signals of the bone morphogenetic proteins (BMPs), which are involved in a range of biological activities including cell growth, apoptosis, morphogenesis, development and immune responses. In response to BMP ligands, this protein can be phosphorylated and activated by the BMP receptor kinase. The phosphorylated form of this protein forms a complex with SMAD4, which is important for its function in the transcription regulation. This protein is a target for SMAD-specific E3 ubiquitin ligases, such as SMURF1 and SMURF2, and undergoes ubiquitination and proteasome-mediated degradation. Alternatively spliced transcript variants encoding the same protein have been observed."

"DEAD/H-box RNA helicase binding"

Gene ID: 4088 SMAD3 SMAD family member 3: "The protein encoded by this gene belongs to the SMAD, a family of proteins similar to the gene products of the Drosophila gene 'mothers against decapentaplegic' (Mad) and the C. elegans gene Sma. SMAD proteins are signal transducers and transcriptional modulators that mediate multiple signaling pathways. This protein functions as a transcriptional modulator activated by transforming growth factor-beta and is thought to play a role in the regulation of carcinogenesis."

"DEAD/H-box RNA helicase binding"

Gene ID: 4090 SMAD5 SMAD family member 5: "The protein encoded by this gene is involved in the transforming growth factor beta signaling pathway that results in an inhibition of the proliferation of hematopoietic progenitor cells. The encoded protein is activated by bone morphogenetic proteins type 1 receptor kinase, and may be involved in cancer. Alternative splicing results in multiple transcript variants."

"DEAD/H-box RNA helicase binding"

Gene ID: 4343 MOV10 Mov10 RISC complex RNA helicase:
 * 1) NP_001123551.1  helicase MOV-10 isoform 1 cd18038 Location:498 → 733 DEXXQc_Helz-like; DEXXQ/H-box helicase domain of Helz-like helicase.
 * 2) NP_001308253.1  helicase MOV-10 isoform 1 cd18038 Location:498 → 733 DEXXQc_Helz-like; DEXXQ/H-box helicase domain of Helz-like helicase.
 * 3) NP_001356436.1  helicase MOV-10 isoform 1 cd18038 Location:498 → 733 DEXXQc_Helz-like; DEXXQ/H-box helicase domain of Helz-like helicase.
 * 4) NP_066014.1  helicase MOV-10 isoform 1 cd18038 Location:498 → 733 DEXXQc_Helz-like; DEXXQ/H-box helicase domain of Helz-like helicase.

Gene ID: 8554 PIAS1 protein inhibitor of activated STAT 1: "This gene encodes a member of the protein inhibitor of activated STAT (PIAS) family. PIAS proteins function as SUMO E3 ligases and play important roles in many cellular processes by mediating the sumoylation of target proteins. This protein plays a central role as a transcriptional coregulator of numerous cellular pathways including the STAT1 and nuclear factor kappaB pathways. Alternate splicing results in multiple transcript variants."

"Names
 * AR interacting protein
 * DEAD/H (Asp-Glu-Ala-Asp/His) box binding protein 1
 * DEAD/H box-binding protein 1
 * E3 SUMO-protein transferase PIAS1
 * RNA helicase II-binding protein
 * gu-binding protein
 * protein inhibitor of activated STAT protein 1
 * zinc finger, MIZ-type containing 3"

Gene ID: 9188 DDX21 DExD-box helicase 21: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an antigen recognized by autoimmune antibodies from a patient with watermelon stomach disease. This protein unwinds double-stranded RNA, folds single-stranded RNA, and may play important roles in ribosomal RNA biogenesis, RNA editing, RNA transport, and general transcription."

"NP_001243839.1 nucleolar RNA helicase 2 isoform 2" "Helicase superfamily c-terminal domain; associated with DEXDc-, DEAD-, and DEAH-box proteins, yeast initiation factor 4A, Ski2p, and Hepatitis C virus NS3 helicases; this domain is found in a wide variety of helicases and helicase related proteins; may not be an autonomously folding unit, but an integral part of the helicase; 4 helicase superfamilies at present according to the organization of their signature motifs; all helicases share the ability to unwind nucleic acid duplexes with a distinct directional polarity; they utilize the free energy from nucleoside triphosphate hydrolysis to fuel their translocation along DNA, unwinding the duplex in the process".

Gene ID: 9785 DHX38 DEAH-box helicase 38: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. The protein encoded by this gene is a member of the DEAD/H box family of splicing factors. This protein resembles yeast Prp16 more closely than other DEAD/H family members. It is an ATPase and essential for the catalytic step II in pre-mRNA splicing process."

Gene ID: 10521 DDX17 DEAD-box helicase 17: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure, such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an ATPase activated by a variety of RNA species, but not by dsDNA. This protein, and that encoded by DDX5 gene, are more closely related to each other than to any other member of the DEAD box family. This gene can encode multiple isoforms due to both alternative splicing and the use of alternative translation initiation codons, including a non-AUG (CUG) start codon."

"Helicase superfamily c-terminal domain; associated with DEXDc-, DEAD-, and DEAH-box proteins, yeast initiation factor 4A, Ski2p, and Hepatitis C virus NS3 helicases; this domain is found in a wide variety of helicases and helicase related proteins; may not be an autonomously folding unit, but an integral part of the helicase; 4 helicase superfamilies at present according to the organization of their signature motifs; all helicases share the ability to unwind nucleic acid duplexes with a distinct directional polarity; they utilize the free energy from nucleoside triphosphate hydrolysis to fuel their translocation along DNA, unwinding the duplex in the process".

Gene ID: 11269 DDX19B DEAD-box helicase 19B: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which exhibits RNA-dependent ATPase and ATP-dependent RNA-unwinding activities. This protein is recruited to the cytoplasmic fibrils of the nuclear pore complex, where it participates in the export of mRNA from the nucleus. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene."

Gene ID: 23586 DDX58 DExD/H-box helicase 58: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases which are implicated in a number of cellular processes involving RNA binding and alteration of RNA secondary structure. This gene encodes a protein containing RNA helicase-DEAD box protein motifs and a caspase recruitment domain (CARD). It is involved in viral double-stranded (ds) RNA recognition and the regulation of immune response."

"Annotation information: Note: RARRES3 (Gene ID: 5920) and DDX58 (Gene ID: 23586) share the RIG1/RIG-1 alias in common. RIG1 is a widely used alternative name for DExD/H-box helicase 58 (DDX58), which can be confused with the retinoic acid receptor responder 3 (RARRES3) gene, since they share the same alias."

Gene ID: 25913 POT1 protection of telomeres 1: "This gene is a member of the telombin family and encodes a nuclear protein involved in telomere maintenance. Specifically, this protein functions as a member of a multi-protein complex that binds to the TTAGGG repeats of telomeres, regulating telomere length and protecting chromosome ends from illegitimate recombination, catastrophic chromosome instability, and abnormal chromosome segregation. Increased transcriptional expression of this gene is associated with stomach carcinogenesis and its progression. Alternatively spliced transcript variants have been described."

"DEAD/H-box RNA helicase binding"

Gene ID: 26993 AKAP8L A-kinase anchoring protein 8 like: "DEAD/H-box RNA helicase binding"

Gene ID: 29102 DROSHA drosha ribonuclease III: "This gene encodes a ribonuclease (RNase) III double-stranded RNA-specific ribonuclease and subunit of the microprocessor protein complex, which catalyzes the initial processing step of microRNA (miRNA) synthesis. The encoded protein cleaves the stem loop structure from the primary microRNA (pri-miRNA) in the nucleus, yielding the precursor miRNA (pre-miRNA), which is then exported to the cytoplasm for further processing. In a human cell line lacking a functional copy of this gene, canonical miRNA synthesis is reduced. Somatic mutations in this gene have been observed in human patients with kidney cancer."

Gene ID: 55601 DDX60 DExD/H-box helicase 60: "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases which are implicated in a number of cellular processes involving RNA binding and alteration of RNA secondary structure. This gene encodes a DEXD/H box RNA helicase that functions as an antiviral factor and promotes RIG-I-like receptor-mediated signaling."

Gene ID: 55760 DHX32 DEAH-box helicase 32 (putative): "DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this DEAD box protein family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a member of this family. The function of this member has not been determined. Alternative splicing of this gene generates 2 transcript variants, but the full length nature of one of the variants has not been defined."

"Names
 * DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 32
 * DEAD/H box 32
 * DEAD/H helicase-like protein-1
 * DEAH (Asp-Glu-Ala-His) box polypeptide 32
 * DEAH box protein 32
 * huDDX32"

Gene ID: 56916 SMARCAD1 SWI/SNF-related, matrix-associated actin-dependent regulator of chromatin, subfamily a, containing DEAD/H box 1: "This gene encodes a member of the SNF subfamily of helicase proteins. The encoded protein plays a critical role in the restoration of heterochromatin organization and propagation of epigenetic patterns following DNA replication by mediating histone H3/H4 deacetylation. Mutations in this gene are associated with adermatoglyphia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene."

Gene ID: 79132 DHX58 DExH-box helicase 58.

Gene ID: 91351 DDX60L DExD/H-box 60 like: "This gene encodes a member of the DExD/H-box helicase family of proteins, a subset of the super family 2 helicases. Members of the DExD/H-box helicase family share a conserved functional core comprised of two RecA-like globular domains. These domains contain conserved motifs that mediate ATP binding, ATP hydrolysis, nucleic acid binding, and RNA unwinding. In addition to functions in RNA metabolism, members of this family are involved in anti-viral immunity and act as cytosolic sensors of viral nucleic acids. The protein encoded by this gene has been shown to inhibit hepatitis C virus replication in response to interferon stimulation in cell culture. Alternative splicing results in multiple transcript variants."

Gene ID: 121340 SP7 Sp7 transcription factor: "This gene encodes a member of the Sp subfamily of Sp/XKLF transcription factors. Sp family proteins are sequence-specific DNA-binding proteins characterized by an amino-terminal trans-activation domain and three carboxy-terminal zinc finger motifs. This protein is a bone specific transcription factor and is required for osteoblast differentiation and bone formation."

"DEAD/H-box RNA helicase binding"

Gene ID: 170506 DHX36 DEAH-box helicase 36: "This gene is a member of the DEAH-box family of RNA-dependent NTPases which are named after the conserved amino acid sequence Asp-Glu-Ala-His in motif II. The protein encoded by this gene has been shown to enhance the deadenylation and decay of mRNAs with 3'-UTR AU-rich elements (ARE-mRNA). The protein has also been shown to resolve into single strands the highly stable tetramolecular DNA configuration (G4) that can form spontaneously in guanine-rich regions of DNA. Alternative splicing results in multiple transcript variants encoding different isoforms."

"DEx/H-box helicases activate type I IFN and inflammatory cytokines production, organism-specific biosystem"

Gene ID: 339231 ARL16 ADP ribosylation factor like GTPase 16: "The protein encoded by this gene belongs to the ARL (ADP-ribosylation factor-like) family of proteins, which are structurally related to ADP-ribosylation factors (ARFs). This protein has been shown to have an inhibitory role in the cellular antiviral response. This gene product interacts with the C-terminal domain of the DEXD/H-box helicase 58 (DDX58) gene product. This interaction was found to suppress the association between the DDX58 gene product and RNA, thereby negatively regulating the activity of the DDX58 gene product."

Laboratory reports
Below is an outline for sections of a report, paper, manuscript, log book entry, or lab book entry. You may create your own, of course.

H boxes transcription laboratory

by Marshallsumter, 23:13, 18 November 2019 (UTC)

Abstract
The nucleotides on both sides of A1BG, approached from ZSCAN22 in the negative direction on the negative strand or from ZNF497 in the positive direction on the negative strand, may provide interactions that transcribe A1BG via an H box outside the core or proximal promoter.

Introduction
Many transcription factors (TFs) may occur upstream and occasionally downstream of the transcription start site (TSS), in this gene's promoter. The following have been examined so far: (1) AGC boxes (GCC boxes), (2) ATA boxes, (3) CAAT boxes, (4) C and D boxes, (5) CAREs (GA responsive complexes), (6) CArG boxes, (7) CENP-B boxes, (8) CGCG boxes, (9) CRE boxes, (10) DREB boxes, (11) EIF4E basal elements (4EBEs), (12) enhancer boxes (E boxes), (13) E2 boxes, (14) Factor II B recognition elements, (15) GAREs (GA responsive complexes), (16) G boxes, (17) GC boxes, (18) GLM boxes, (19) HNF6s, (20) HY boxes, (21) Metal responsive elements (MREs), (22) Motif ten elements (MTEs), (23) Pyrimidine boxes (GA responsive complexes), (24) STAT5s, (25) TACTAAC boxes, (26) TATA boxes, (27) TAT boxes (GA responsive complexes), (28) TATCCAC boxes, (29) W boxes (GA responsive complexes), (30) X boxes and (31) Y boxes.

But no (3) CAAT box, (7) CENP-B box, (10) DREB box, (11) EIF4E basal element, (16) G box, (18) GLM box, (22) MTE, (25) TACTAAC box, (27) TAT box, (28) TATCCAC box, (30) X box, or (31) Y box occurs, and the (8) CGCG boxes, (13) E2 boxes, and (15) GARE are too close to ZSCAN22.

Interactions may occur with (1) an AGC (GCC) box, (2) an ATA box, (4) C boxes and a D box (the other C box and D box have not been tested), (5) CAREs, (6) CArG boxes, (9) a CRE box, (12) enhancer boxes, (14) a BREu, (17) GC boxes, (19) HNF6s, (20) HY boxes, (21) an MRE, (23) pyrimidine boxes, (24) STAT5s, (26) TATA boxes outside the core promoter, or (29) W boxes.

Experiments
Regarding hypothesis 1: A1BG is not transcribed by an H box, if an H box is not present in the promoter of A1BG.

The Basic programs (starting with SuccessablesHbox.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+), including the extended number of nts from 958 to 4445. They look for H boxes and their possible complements and inverses, to test the hypothesis that neither consensus sequence, the specific 5'-AGAGGA-3' or the more general 5'-ANANNA-3', is present in the promoter of A1BG.
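The search described above can be sketched in a few lines. The original programs are in Basic and are not reproduced here; the following is a minimal Python sketch, assuming that "complement" means base-by-base complementation, "inverse" means the reversed hexamer, and positions are reported as 1-based start positions. The function names and the toy sequence are hypothetical and are not A1BG data.

```python
# Hypothetical helper names; a sketch, not the SuccessablesHbox.bas program.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def variants(motif):
    """Return the motif with its complement, inverse, and inverse complement."""
    inverse = motif[::-1]  # "inverse" is assumed to mean the reversed hexamer
    return {
        "direct": motif,
        "complement": motif.translate(COMPLEMENT),
        "inverse": inverse,
        "inverse complement": inverse.translate(COMPLEMENT),
    }


def find_all(sequence, motif):
    """List the 1-based start position of every occurrence, overlaps included."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = sequence.find(motif, start + 1)
    return hits


# Made-up toy sequence for illustration only.
toy = "TTAGAGGACCAGGAGATT"
for name, hexamer in variants("AGAGGA").items():
    print(name, hexamer, find_all(toy, hexamer))
```

Running the same four variants over each strand and direction would give one hit list per combination, which is the shape of the results reported below.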

Results
There are two consensus sequences to consider: the specific 3'-ACACCA-5' or 5'-AGAGGA-3' and the more general 5'-ANANNA-3'.
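The relationship between the specific and the general consensus sequences can be illustrated with a short Python sketch, assuming that N stands for any of A, C, G, or T. The function names are hypothetical. A lookahead is used in the scan so that overlapping hexamers, such as runs of AAAAAA at consecutive positions, are each reported.

```python
import re


def consensus_body(consensus):
    """Translate a consensus such as ANANNA into a regular expression body."""
    return "".join("[ACGT]" if base == "N" else base for base in consensus)


def is_instance(hexamer, consensus="ANANNA"):
    """True when the hexamer fits the consensus exactly."""
    return re.fullmatch(consensus_body(consensus), hexamer) is not None


def scan(sequence, consensus="ANANNA"):
    """Return (1-based start, hexamer) for every hit, overlaps included."""
    pattern = re.compile("(?=(" + consensus_body(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Both specific sequences are instances of the general consensus,
# while the inverse AGGAGA is not (its third base is G, not A).
for hexamer in ("ACACCA", "AGAGGA", "AGGAGA"):
    print(hexamer, is_instance(hexamer))
```

Because every ACACCA or AGAGGA hit is also an ANANNA hit, the general search necessarily reports at least as many H boxes as the specific ones.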

Hypothesis 1
A1BG is not transcribed by an H box.

ZSCAN22 and A1BG

 * 1) There are no H boxes (3'-ACACCA-5') in the core promoter between ZSCAN22 and A1BG.
 * 2) There are no H boxes (3'-AGAGGA-5') in the core promoter between ZSCAN22 and A1BG. But, there is one inverse and its complement 3'-AGGAGA-5' at 4428.
 * 3) There are no H boxes (3'-ANANNA-5') in the core promoter between ZSCAN22 and A1BG. But, there is one inverse and its complement 3'-AGGAGA-5' at 4428.
 * 4) There are no H boxes (3'-ACACCA-5') in the proximal promoter between ZSCAN22 and A1BG.
 * 5) There are no H boxes (3'-AGAGGA-5') in the proximal promoter between ZSCAN22 and A1BG.
 * 6) There is one H box (3'-ANANNA-5'): negative direction, negative strand, 3'-ACACGA-5' at 4402 in the proximal promoter between ZSCAN22 and A1BG. But, on the positive strand in the negative direction there are 16: 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395, with their complements on the negative strand, negative direction.
 * 7) There are three H boxes after nucleotide number 2460 in the negative strand and negative direction of the distal promoter: 3'-ACACCA-5' at 2659, 3'-ACACCA-5' at 3187, and 3'-ACACCA-5' at 3811.
 * 8) There is one H box inverse complement, negative strand, negative direction of the distal promoter: 3'-TGGTGT-5' (3'-ACCACA-5') at 3764.
 * 9) There are three H boxes in the distal promoter on the positive strand in the negative direction: 3'-AGAGGA-5' at 3387, 3'-AGAGGA-5' at 3638, and 3'-AGAGGA-5' at 3675.
 * 10) There is one inverse H box and its complement 3'-AGGAGA-5' at 3790.
 * 11) Regarding 3'-ANANNA-5', on the negative side, negative direction of the distal promoter, there are 13 H boxes: 3'-ACATCA-5' at 2541, 3'-ACACCA-5' at 2659, 3'-ACATTA-5' at 2675, 3'-ATAAAA-5' at 2853, 3'-AAAGTA-5' at 2886, 3'-ACATTA-5' at 3064, 3'-AGATGA-5' at 3159, 3'-ACACCA-5' at 3187, 3'-AGAAGA-5' at 3554, 3'-AGACGA-5' at 3707, 3'-ACACCA-5' at 3811, 3'-ACATTA-5' at 3973, and 3'-ACATCA-5' at 4124.
 * 12) On the positive strand, negative direction of the distal promoter, there are 122 H boxes: 3'-AAAAAA-5' at 2461, 3'-AAAAAA-5' at 2462, 3'-AAAAAA-5' at 2463, 3'-AAAAAA-5' at 2464, 3'-AAAAAA-5' at 2465, 3'-AAAAAA-5' at 2466, 3'-AAAAAA-5' at 2467, 3'-AAAAAA-5' at 2468, 3'-AAAAAA-5' at 2469, 3'-AAAAAA-5' at 2470, 3'-AAAGCA-5' at 2473, 3'-AAAGCA-5' at 2479, 3'-AAACAA-5' at 2484, 3'-AAACAA-5' at 2488, 3'-ACAAAA-5' at 2490, 3'-ATAGTA-5' at 2500, 3'-AGAAAA-5' at 2506, 3'-AAAACA-5' at 2508, 3'-AAACAA-5' at 2509, 3'-AGACCA-5' at 2599, 3'-ATACAA-5' at 2642, 3'-ACAAAA-5' at 2644, 3'-AAATCA-5' at 2648, 3'-ACAGGA-5' at 2690, 3'-AAATCA-5' at 2749, 3'-AGAGCA-5' at 2781, 3'-AAAAGA-5' at 2798, 3'-AAAGAA-5' at 2799, 3'-AAAGAA-5' at 2803, 3'-AGAAAA-5' at 2805, 3'-AAAAGA-5' at 2807, 3'-AGAGAA-5' at 2810, 3'-AGAAGA-5' at 2812, 3'-AGAAAA-5' at 2815, 3'-AAAAAA-5' at 2817, 3'-AAAAGA-5' at 2819, 3'-AAAGAA-5' at 2820, 3'-AGAAAA-5' at 2822, 3'-AAAAGA-5' at 2824, 3'-AGAGAA-5' at 2827, 3'-AGAAGA-5' at 2829, 3'-AGAAAA-5' at 2832, 3'-AAAAAA-5' at 2834, 3'-AAAAGA-5' at 2836, 3'-AAAGAA-5' at 2837, 3'-AGAAAA-5' at 2839, 3'-AAAACA-5' at 2841, 3'-AAACAA-5' at 2842, 3'-AAAATA-5' at 2868, 3'-ATATAA-5' at 2873, 3'-AAAAAA-5' at 2929, 3'-ACATCA-5' at 2941, 3'-ACATTA-5' at 2951, 3'-AAACCA-5' at 2971, 3'-AAAATA-5' at 3012, 3'-AAATAA-5' at 3013, 3'-AAAAAA-5' at 3026, 3'-AAACTA-5' at 3029, 3'-AGACCA-5' at 3122, 3'-AAAACA-5' at 3166, 3'-ACATAA-5' at 3169, 3'-ATAAAA-5' at 3171, 3'-AAATTA-5' at 3175, 3'-AGATCA-5' at 3277, 3'-ACAAGA-5' at 3307, 3'-AGAGCA-5' at 3310, 3'-AAAACA-5' at 3329, 3'-AAACAA-5' at 3330, 3'-AAATAA-5' at 3334, 3'-AAACAA-5' at 3338, 3'-ACAAGA-5' at 3340, 3'-AGAAAA-5' at 3343, 3'-AAACCA-5' at 3365, 3'-AGAGGA-5' at 3387, 3'-ACATCA-5' at 3394, 3'-AGAGAA-5' at 3406, 3'-ACATCA-5' at 3415, 3'-ACATTA-5' at 3436, 3'-ATATTA-5' at 3454, 3'-ATATTA-5' at 3468, 3'-AAACCA-5' at 3484, 3'-AGATCA-5' at 3489, 3'-AAAACA-5' at 3511, 3'-ACACAA-5' at 3514, 3'-ATAATA-5' at 3538, 3'-ACAAGA-5' at 3635, 3'-AGAGGA-5' at 3638, 3'-AAAGAA-5' at 3666, 3'-AGAACA-5' at 3668, 3'-AGAGGA-5' at 3675, 3'-ACAAGA-5' at 3759, 3'-AGACCA-5' at 3762, 3'-ACAAAA-5' at 3767, 3'-AGAGCA-5' at 3913, 3'-AGATGA-5' at 3920, 3'-AGACCA-5' at 4031, 3'-ACAAAA-5' at 4066, 3'-AAAAAA-5' at 4068, 3'-AAAATA-5' at 4070, 3'-AAATAA-5' at 4071, 3'-AAATAA-5' at 4075, 3'-ATAATA-5' at 4077, 3'-ATAGAA-5' at 4080, 3'-AAAGAA-5' at 4084, 3'-AGAAAA-5' at 4086, 3'-AGACAA-5' at 4182, 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395.

ZNF497 and A1BG

 * 1) There are no H boxes (3'-ACACCA-5') in the core promoter between ZNF497 and A1BG.
 * 2) There are no H boxes (3'-AGAGGA-5') in the core promoter between ZNF497 and A1BG.
 * 3) There are no H boxes (3'-ANANNA-5') in the core promoter between ZNF497 and A1BG. But, there is an inverse and its complement 3'-AGGACA-5' at 4252. And, there is one after the TSS 3'-AGAGAA-5' at 4387, plus 3'-AGTACA-5' at 4365, 3'-ACCAGA-5' at 4380, 3'-AAGAGA-5' at 4386, 3'-ACGACA-5' at 4392 and their complements after the TSS.
 * 4) There are no H boxes (3'-ACACCA-5') in the proximal promoter between ZNF497 and A1BG.
 * 5) There are no H boxes (3'-AGAGGA-5') in the proximal promoter between ZNF497 and A1BG.
 * 6) There is one H box (3'-ANANNA-5'): 3'-AGAGAA-5' at 4387 in the proximal promoter, negative strand, positive direction, between ZNF497 and A1BG. But, there are four: 3'-TCATGT-5' at 4365, 3'-TGGTCT-5' at 4380, 3'-TTCTCT-5' at 4386, and 3'-TGCTGT-5' at 4392 and their complements in the positive direction between ZNF497 and A1BG.
 * 7) In the positive direction on the positive strand there is an inverse: 3'-AGGACA-5' at 4252, and its complement.
 * 8) There are two H boxes (3'-ACACCA-5') after nucleotide number 2300 in the negative strand and positive direction of the distal promoter: 3'-ACACCA-5' at 2603 and 3'-ACACCA-5' at 3825.
 * 9) There are two H boxes (3'-ACACCA-5') after nucleotide number 2300 in the positive strand and positive direction of the distal promoter: 3'-ACACCA-5' at 3643, 3'-ACACCA-5' at 3967.
 * 10) There is one H box (3'-AGAGGA-5') after nucleotide number 2300 in the negative strand and positive direction of the distal promoter: 3'-AGAGGA-5' at 2793.
 * 11) There are two H boxes (3'-AGAGGA-5') after nucleotide number 2300 in the positive strand and positive direction of the distal promoter: 3'-AGAGGA-5' at 3302 and 3'-AGAGGA-5' at 4059.
 * 12) There are 25 H boxes (3'-ANANNA-5') after nucleotide number 2300 in the negative strand and positive direction of the distal promoter: 3'-ATACCA-5' at 2591, 3'-ACACCA-5' at 2603, 3'-ATAGAA-5' at 2628, 3'-AAACCA-5' at 2632, 3'-ACACTA-5' at 2637, 3'-ATATAA-5' at 2662, 3'-AGAGCA-5' at 2704, 3'-AGAGGA-5' at 2793, 3'-AAAGGA-5' at 2829, 3'-ACAGAA-5' at 2838, 3'-AAAGAA-5' at 3066, 3'-AGAACA-5' at 3094, 3'-AGAGCA-5' at 3138, 3'-ACAGCA-5' at 3212, 3'-ACAGTA-5' at 3414, 3'-AGATGA-5' at 3476, 3'-ACAGGA-5' at 3572, 3'-AAAGCA-5' at 3599, 3'-ACATGA-5' at 3708, 3'-ACACCA-5' at 3825, 3'-AAAAGA-5' at 3929, 3'-AGAACA-5' at 4068, 3'-AAATGA-5' at 4094, 3'-ACATCA-5' at 4116, 3'-ACATGA-5' at 4154.
 * 13) There are 32 H boxes (3'-ANANNA-5') after nucleotide number 2300 in the positive strand and positive direction of the distal promoter: 3'-AAATAA-5' at 2347, 3'-AAAAAA-5' at 2451, 3'-AAAACA-5' at 2453, 3'-AGACGA-5' at 2976, 3'-AGACCA-5' at 3022, 3'-AGAGAA-5' at 3056, 3'-AGAAGA-5' at 3058, 3'-AGAGGA-5' at 3302, 3'-AGACGA-5' at 3307, 3'-ACAGAA-5' at 3393, 3'-AGAAGA-5' at 3395, 3'-ACAGGA-5' at 3620, 3'-ACACCA-5' at 3643, 3'-AAACCA-5' at 3948, 3'-ACACCA-5' at 3967, 3'-AGAGGA-5' at 4059, 3'-AAAATA-5' at 4122, 3'-AAATCA-5' at 4137, 3'-AAATAA-5' at 4142, 3'-ATATTA-5' at 4168.
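Long hit lists like those above are easy to check mechanically. The following is a minimal Python sketch, assuming entries are written as 3'-XXXXXX-5' at NNNN; it parses such a listing and flags any hexamer that does not fit the general consensus ANANNA. The function names are hypothetical, and the sample listing reuses the first three entries of item 11 on the ZSCAN22 side.

```python
import re

# One entry looks like: 3'-ACATCA-5' at 2541 (the space before "at" is optional).
ENTRY = re.compile(r"3'-([ACGT]{6})-5'\s*at\s+(\d+)")
# General consensus ANANNA with N read as any nucleotide (an assumption).
H_BOX = re.compile(r"A[ACGT]A[ACGT][ACGT]A")


def parse_hits(text):
    """Extract (hexamer, position) pairs from a results listing."""
    return [(m.group(1), int(m.group(2))) for m in ENTRY.finditer(text)]


def off_consensus(hits):
    """Return the hits whose hexamer does not fit ANANNA."""
    return [(hexamer, pos) for hexamer, pos in hits if not H_BOX.fullmatch(hexamer)]


listing = "3'-ACATCA-5' at 2541, 3'-ACACCA-5' at 2659, 3'-ACATTA-5' at 2675"
hits = parse_hits(listing)
print(len(hits), "entries parsed;", len(off_consensus(hits)), "off consensus")
```

Comparing the number of parsed entries with the count stated at the start of each item is a quick consistency check on a listing.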

Discussions
If H boxes can occur at additional TSS locations, then A1BG can have multiple TSSs.

Hypothesis 1 discussion
There are no H boxes (3'-ACACCA-5') in either core promoter of A1BG. If H boxes must only occur in the core promoter then A1BG is not transcribed by this H box.

There are no H boxes (3'-AGAGGA-5') in either core promoter of A1BG. There is one inverse of this H box, 3'-AGGAGA-5' at 4428, which, if orientation is required, points to ZSCAN22 but is much closer to A1BG. This suggests that this H box can activate/suppress ZSCAN22 as part of a distal promoter or can activate/suppress A1BG as an inverse within the core promoter of A1BG. It could also suggest that it activates an additional isoform, a snoRNA, lying between ZSCAN22 and A1BG.

There are no H boxes (3'-ANANNA-5') in the core promoter between ZSCAN22 and A1BG. But, there is one inverse and its complement 3'-AGGAGA-5' at 4428.

There are no H boxes (3'-ANANNA-5') in the core promoter between ZNF497 and A1BG. But, there is an inverse and its complement 3'-AGGACA-5' at 4252. And, there is one after the TSS 3'-AGAGAA-5' at 4387, plus 3'-AGTACA-5' at 4365, 3'-ACCAGA-5' at 4380, 3'-AAGAGA-5' at 4386, 3'-ACGACA-5' at 4392 and their complements after the TSS.

This more general H box opens up additional possibilities for activation/suppression of A1BG or distal promoter activation/suppression of ZNF497. There is also the possibility that a snoRNA isoform lies nearby.

"The DEAD/H box family of RNA helicases has been demonstrated to be involved in virtually all processes that require manipulation of RNA including transcription, pre-mRNA and pre-rRNA processing, RNA export, ribosome assembly and translation [1]." "Examples of DEAD/H box RNA helicases involved in transcription include RNA helicase II (RHII/Gu) and RNA helicase A (RHA/NDHII)." "RHA has been shown to be required for complex formation between the transcriptional co-activator, CREB binding protein (CBP), and RNA polymerase II [7]. Furthermore, different regions of the RNA helicase protein were found to interact with both CBP and RNA polymerase II. The association of RHA with RNA polymerase II was further investigated, and narrowed down to a 50 amino acid stretch, outwith the conserved helicase motifs [7]; this study also showed that RHA could regulate CREB-dependent transcription either through recruitment of Pol II or by ATP-dependent mechanisms. A later study reported that RHA acts as a bridging molecule between the breast tumour specific transcriptional activator, BRCA1 and the RNA polymerase II holoenzyme complex [8]. These reports thus provide clear evidence of a role for RNA helicases as transcription factors."

The H boxes analyzed here are not the same as the H-box 3'-CCTACC-5' (written in the transcription direction, left to right), which apparently occurs in the proximal promoter (out to -150 nts), although that region is referred to as a core promoter.

By analogy the H box here could occur in the proximal promoter.

There are no H boxes (3'-ACACCA-5') or (3'-AGAGGA-5') in the proximal promoter on either side of A1BG.

There are a great many H boxes on both sides of A1BG in the distal promoters that may be there to enhance/suppress transcription of A1BG or for snoRNAs. There are more on the ZSCAN22 side than on the ZNF497 side.
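The comparison between the two sides can be made explicit by adding up the general (3'-ANANNA-5') distal promoter counts stated in the results. The following is a small Python sketch of that arithmetic; the dictionary layout and function name are hypothetical, and the numbers are the stated counts, not a recount of the listed positions.

```python
# Stated counts of general H boxes (3'-ANANNA-5') in the distal promoters.
counts = {
    ("ZSCAN22", "negative strand"): 13,
    ("ZSCAN22", "positive strand"): 122,
    ("ZNF497", "negative strand"): 25,
    ("ZNF497", "positive strand"): 32,
}


def totals_by_side(counts):
    """Sum the per-strand counts for each neighboring gene."""
    totals = {}
    for (side, _strand), number in counts.items():
        totals[side] = totals.get(side, 0) + number
    return totals


totals = totals_by_side(counts)
print(totals)  # {'ZSCAN22': 135, 'ZNF497': 57}
```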

A search on Google Scholar using A1BG "H box" produced only 4 results with no specific gene that links the two.

Hypothesis 2 discussion
H boxes do occur on both sides of A1BG.

"CRE-containing genes represent ~50% of the Nup98 interacting gene loci". Perhaps about 10 % of genes contain both an H box and a CRE, which suggests that interaction can occur and that DExD/H-box helicases may interact with A1BG near these H boxes.

Most of the possible interactions are not with the correct H box examined here.

A GC box may also be an interactant.

The JAK/STAT pathway may connect with DExD/H-box RNA helicases and interact with or transcribe A1BG.

The presence of a W box may assist transcription of A1BG with whatever transcription factors use W boxes.

Conclusions
Nup98 or DHX9 (and perhaps DHX15 and DHX8) may be transcription factors involved in A1BG gene expression using an H box. This may be true for any of the other human genes that involve a transcribed H box described.

A1BG may be transcribed from multiple TSSs associated with H boxes and/or other transcription factors occurring especially in the distal promoters. If multiple TSSs occur even in the distal promoter or closer to the neighboring genes ZSCAN22 and ZNF497, then A1BG could be transcribed.

The key to assisting the recovery of any astronaut now depends on which means of transcription best moderates the effects of microgravity or irradiation. This may require extensive molecular genetic testing. Previous literature analyses may reduce the amount of molecular genetic testing.

Laboratory evaluations
To assess your example, including your justification, analysis and discussion, I will provide such an assessment of my example for comparison and consideration.

Evaluation

No wet chemistry experiments were performed to confirm that Gene ID: 1 may be transcribed from either side using transcription factors in the core, proximal or distal promoters. The NCBI Gene database is generalized, whereas individual human genome testing could demonstrate that A1BG is transcribed from either side using known transcription factors. Sufficient nucleotides have been added to the data sets for the ZNF497 side to confirm likely transcription of A1BG by these known transcription factors.