Gene transcriptions/Boxes/CArGs

CArG boxes are present in the promoters of smooth muscle cell genes.

Boxes
A "repeating sequence of nucleotides that forms a transcription or a regulatory signal" is a box.

Consensus sequences
"CArG box [CC(A/T)6GG] DNA [consensus] sequences present within the promoters of SMC genes play a pivotal role in controlling their transcription".

A second source gives the same consensus sequence, CC(A/T)6GG:

"MADS-box proteins bind to a consensus sequence, the CArG box, that has the core motif CC(A/T)6GG (15)."

"Of the [Flowering Locus C] FLC binding sites, 69% contained at least one CArG-box motif with the core consensus sequence CCAAAAAT(G/A)G and an AAA extension at the 3′ end [...]."

Three "other MADS-box flowering-time regulators, SOC1, SVP, and AGAMOUS-LIKE 24 (AGL24), bind to two different CArG-box motifs at 502 bp (CTAAATATGG) and 287 bp (CAATAATTGG) upstream of the translation start in the SEP3 gene (24), consistent with different specificities for the different MADS-box proteins." These together with the core motif CC(A/T)6GG (15) suggest a more general CArG-box motif of (C(C/A/T)(A/T)6(A/G)G).
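The relationship between the core consensus and the broader motif can be checked directly. The following is a minimal Python sketch, assuming each motif is written as a regular expression; the names CORE_CARG, GENERAL_CARG, examples, and classify are illustrative and do not come from the cited sources. It tests the sequences quoted above against both patterns.

```python
import re

# Core consensus CC(A/T)6GG and the more general motif C(C/A/T)(A/T)6(A/G)G.
CORE_CARG = re.compile(r"CC[AT]{6}GG")
GENERAL_CARG = re.compile(r"C[CAT][AT]{6}[AG]G")

# Ten-nucleotide sequences quoted in this section (labels are illustrative).
examples = {
    "SEP3 -502": "CTAAATATGG",
    "SEP3 -287": "CAATAATTGG",
    "FLC core (G)": "CCAAAAATGG",
    "FLC core (A)": "CCAAAAATAG",
}

def classify(seq):
    """Report whether the whole sequence matches the core and the general motif."""
    return {
        "core": CORE_CARG.fullmatch(seq) is not None,
        "general": GENERAL_CARG.fullmatch(seq) is not None,
    }

results = {name: classify(seq) for name, seq in examples.items()}
for name, outcome in results.items():
    print(name, examples[name], outcome)
```

In this sketch only the G variant of the FLC core satisfies the stricter CC(A/T)6GG pattern, while all four sequences satisfy the general motif, which is the reason for widening positions 2 and 9.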

Smooth muscle cells
"Serum response factor (SRF) controls [smooth muscle cell] SMC gene transcription via binding to CArG box DNA sequences found within genes that exhibit SMC-restricted expression."

"SMC genes examined in this study display SMC-specific histone modifications at the 5′-CArG boxes."

"The SRF-CArG association is required for transcriptional activation of SMC genes [...] the SMC genes examined in this study display SMC-specific histone modifications at the 5′-CArG boxes. [...] enrichment of H4 and H3 acetylation [...] were relatively low from positions –2,800 to –1,600 in the 5′ region. However, at position –1,600 to –1,200, there was a sharp rise in these modifications, which was increased even further at +400 in the coding region. We observed similar patterns for H3K4dMe and H3 Lys79 di-methylation [...]. SRF, TFIID, and RNA polymerase II displayed enrichments that were consistent with the positions of the CArG boxes, TATA box, and coding region, respectively".

The CArG boxes occur between –400 and –200 nucleotides (nts), between the E boxes and the TC elements.

"Functionally important CArG boxes have been identified in transcriptional regulatory elements controlling expression of sets of myogenic contractile and cytoskeletal proteins (reviewed elsewhere8,25). Of note, in cardiac and skeletal muscle cells, functionally important CArG boxes have been identified in transcriptional regulatory element controlling a relatively limited subset of myofibrillar proteins.26"

"In the nucleus, MRTFs physically associate with SRF, facilitating the binding of SRF to single or dual CArG boxes, activating transcription of genes encoding cytoskeletal and myogenic proteins [...].39,40,53,55,56"

"The binding of SRF to SMC CArG boxes is associated with specific alterations in chromatin structure including the methylation and acetylation of histones.76,79"

"Both PDGF-BB and KLF-4 inhibit SRF binding to CArG boxes downregulating transcription of SMC contractile genes.92"

Gene transcriptions
"SMC-restricted binding of SRF to murine SMC gene CArG box chromatin is associated with patterns of posttranslational histone modifications within this chromatin that are specific to the SMC lineage in culture and in vivo, including methylation and acetylation to histone H3 and H4 residues."

"Ca2+􏰀/calmodulin-dependent protein kinase IV activates cysteine-rich protein 1 through adjacent CRE and CArG elements."

"Smooth muscle-specific transcription is controlled by a multitude of transcriptional regulators that cooperate to drive expression in a temporospatial manner. Previous analysis of the cysteine-rich protein 1 (CRP1/Csrp) gene revealed an intronic enhancer that is sufficient for expression in arterial smooth muscle cells and requires a serum response factor-binding CArG element for activity. The presence of a CArG box in smooth muscle regulatory regions is practically invariant; however, it stands to reason that additional elements contribute to the modulation of transcription in concert with the CArG."

A "conserved cAMP response element (CRE) [...] binds the cAMP response element-binding protein (CREB) and is activated by Ca2+/calmodulin-dependent protein kinase IV (CaMKIV), but not by CaMKII."

"CaMKIV stimulates CRP1 expression not only through the CRE but also through the CArG box."

A "conserved cyclic AMP-response element (CRE) within the CRP1 gene is critical for enhancer activity. The CRE is an 8-bp motif with the consensus sequence TGACGTCA (34). [...] CRE serves as a transcriptional conduit for cyclic AMP-stimulated processes, but it also responds to a variety of other stimuli, including intracellular Ca2+ through the activation of Ca2+/calmodulin-dependent protein kinases (30, 31, 49). The primary factors that bind CRE are the cAMP element-binding protein (CREB) and the related proteins activating transcription factor (ATF)-1 and CRE modulator (CREM). The activity of CREB on the CRE is dependent largely on phosphorylation of a Ser133 residue. This phosphorylation event transforms CREB into a potent transcriptional activator and facilitates interactions with additional regulators, namely, CREB-binding protein (CBP) (31, 49). With respect to function, CREB has been implicated in governing a host of cellular processes and adaptive responses, including differentiation, metabolic changes, cell survival, and proliferation (3, 18, 19, 28, 37, 44, 53)."

The "utilization of two conserved binding sites within the CRP1-5.0 enhancer: a newly identified CRE and the CArG box [...] might serve to amplify a response. Given that these two elements are separated by only 14 bp, CREB and SRF could cooperate by assisting in the recruitment of CBP, both of which bind to CBP’s NH2 terminus."
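The spacing between a CRE and a CArG box can be measured once both motifs are located. The following is a minimal Python sketch under stated assumptions: the sequence toy is made up for illustration and is not the CRP1 enhancer, the function name gaps_between is hypothetical, and the gap is counted as the nucleotides lying strictly between the two motifs, which may differ from how the cited authors measured the 14 bp.

```python
import re

CRE = re.compile(r"TGACGTCA")       # CRE consensus quoted above
CARG = re.compile(r"CC[AT]{6}GG")   # core CArG consensus CC(A/T)6GG

def gaps_between(sequence):
    """Return the number of nucleotides between each CRE and each CArG box."""
    gaps = []
    for cre in CRE.finditer(sequence):
        for carg in CARG.finditer(sequence):
            if carg.start() >= cre.end():      # CArG lies downstream of the CRE
                gaps.append(carg.start() - cre.end())
            elif cre.start() >= carg.end():    # CArG lies upstream of the CRE
                gaps.append(cre.start() - carg.end())
    return gaps

# Made-up sequence (an assumption): a CRE, a 14-nt spacer, then a CArG box.
toy = "GGG" + "TGACGTCA" + "GC" * 7 + "CCATATATGG" + "GGG"
print(gaps_between(toy))
```

Overlapping motifs are skipped in this sketch, since a gap is undefined for them.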

A1BG samplings
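Testing the more general 3'-C(C/A/T)(A/T)6(A/G)G-5'; each entry names the program used, the pattern searched for, and the number of occurrences found, followed by each occurrence and its position: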
 * 1) negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesCArG--.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 0,
 * 2) negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesCArG-+.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 0,
 * 3) positive strand in the negative direction is SuccessablesCArG+-.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 2, 3'-CAAAAAAAAG-5', 1399, 3'-CATTAAAAGG-5', 3441,
 * 4) positive strand in the positive direction is SuccessablesCArG++.bas, looking for 3'-C(C/A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G)G-5', 0,
 * 5) complement, negative strand, negative direction is SuccessablesCArGc--.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 2, 3'-GTTTTTTTTC-5', 1399, 3'-GTAATTTTCC-5', 3441,
 * 6) complement, negative strand, positive direction is SuccessablesCArGc-+.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 0,
 * 7) complement, positive strand, negative direction is SuccessablesCArGc+-.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 0,
 * 8) complement, positive strand, positive direction is SuccessablesCArGc++.bas, looking for 3'-G(A/G/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/T)C-5', 0,
 * 9) inverse complement, negative strand, negative direction is SuccessablesCArGci--.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 10) inverse complement, negative strand, positive direction is SuccessablesCArGci-+.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 11) inverse complement, positive strand, negative direction is SuccessablesCArGci+-.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 12) inverse complement, positive strand, positive direction is SuccessablesCArGci++.bas, looking for 3'-C(C/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/G/T)G-5', 0,
 * 13) inverse, negative strand, negative direction, is SuccessablesCArGi--.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0,
 * 14) inverse, negative strand, positive direction, is SuccessablesCArGi-+.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0,
 * 15) inverse, positive strand, negative direction, is SuccessablesCArGi+-.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0,
 * 16) inverse, positive strand, positive direction, is SuccessablesCArGi++.bas, looking for 3'-G(A/G)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(C/A/T)C-5', 0.
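The sixteen searches above apply four orientations of one motif to each strand and direction. The following is a minimal Python sketch of how the complement, inverse, and inverse complement patterns can be derived from the direct pattern and scanned against a sequence. It is not the .bas programs named in the list, whose contents are not shown here; the names GENERAL, complement, inverse, to_regex, and find_all are hypothetical, and the test sequence is made up.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# General motif C(C/A/T)(A/T)6(A/G)G: one string of allowed nucleotides per
# position, read in the same order in which the motif is written above.
GENERAL = ["C", "CAT"] + ["AT"] * 6 + ["AG", "G"]

def complement(motif):
    """Complement every allowed nucleotide at every position."""
    return ["".join(sorted(COMPLEMENT[n] for n in pos)) for pos in motif]

def inverse(motif):
    """Reverse the order of the positions."""
    return list(reversed(motif))

def to_regex(motif):
    """Write the motif as a regular expression string."""
    return "".join(p if len(p) == 1 else "[" + p + "]" for p in motif)

def find_all(motif, sequence):
    """Return (1-based position, match) pairs, including overlapping matches."""
    pattern = re.compile("(?=(" + to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

variants = {
    "direct": GENERAL,
    "complement": complement(GENERAL),
    "inverse complement": inverse(complement(GENERAL)),
    "inverse": inverse(GENERAL),
}

for name, motif in variants.items():
    print(name, to_regex(motif))

# Made-up test sequence containing one of the occurrences reported above.
print(find_all(GENERAL, "GGCAAAAAAAAGTT"))
```

The four derived patterns agree with the ones written out in the list, for example G(A/G/T)(A/T)6(C/T)C for the complement.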

Actins
"Positively acting, rate-limiting regulatory factors that influence tissue-specific expression of the human cardiac α-actin gene in a mouse muscle cell line are shown by in vivo competition and gel mobility-shift assays to bind to upstream regions of its promoter but to neither vector DNA nor a β-globin promoter. Although the two binding regions are distinctly separated, each corresponds to a cis region required for muscle-specific transcriptional stimulation, and each contains a core CC(A+T-rich)6GG sequence (designated CArG box), which is found in the promoter regions of several muscle-associated genes. Each site has an apparently different binding affinity for trans-acting factors, which may explain the different transcriptional stimulation activities of the two cis regions. [The] two CArG box regions are responsible for muscle-specific transcriptional activity of the cardiac α-actin gene through a mechanism that involves their binding of a positive trans-acting factor in muscle cells."

"SRF binds to an A/T-rich sequence (CCWWWWWWGG) that has been designated as the CArG box.10–12 CArG boxes were originally identified in transcriptional regulatory elements controlling expression of a set of growth- or serum-responsive genes including c-fos and egr-1.13,14 Subsequently, CArG boxes were identified in transcriptional regulatory elements controlling expression of a subset of genes encoding myogenic contractile and cytoskeletal proteins including α-cardiac actin, smooth muscle (SM)-α-actin, α-skeletal actin, and SM22α.15–19"

Early growth responses
"Exposure of human HL-525 cells to x-rays was associated with increases in EGR1 mRNA levels. Nuclear run-on assays showed that this effect is related at least in part to activation of EGR1 gene transcription. Sequences responsive to ionizing radiation-induced signals were determined by deletion analysis of the EGR1 promoter. The results demonstrate that x-ray inducibility of the EGR1 gene is conferred by a region containing six serum response or CC(A+T-rich)6GG (CArG) motifs. Further analysis confirmed that the region encompassing the three distal or upstream CArG elements is functional in the x-ray response. Moreover, this region conferred x-ray inducibility to a minimal thymidine kinase gene promoter. Taken together, these results indicate that ionizing radiation induces EGR1 transcription through CArG elements."

Myocardins
The "promyogenic SRF [SRF GeneID: 6722] coactivator myocardin [MYOCD GeneID: 93649] increased SRF association with methylated histones and CArG box chromatin during activation of SMC gene expression. [...] myocardin/SRF complexes physically interact with H3K4dMe and that the interaction of SRF with CArG box chromatin and H3K4dMe is sensitive to expression levels of myocardin."

Kruppel-like factor 4
The "myogenic repressor Kruppel-like factor 4 recruited histone H4 deacetylase activity to SMC genes and blocked SRF association with methylated histones and CArG box chromatin during repression of SMC gene expression. [...] deacetylation of histone H4 coupled with loss of SRF binding during suppression of SMC differentiation in response to vascular injury. [...] KLF4 can bind to evolutionarily conserved TGF-β [control element] (TCE) DNA sequences adjacent to CArG boxes of SM gene promoters".

Epigenomes
"SMC-selective epigenetic control of SRF binding to chromatin plays a key role in regulation of SMC gene expression in response to pathophysiological stimuli in vivo."

Epigenetic marks in SMCs include the histone modifications H3K4dMe, H3 Lys79 di-methylation, H3 Lys9 acetylation, and H4Ac, accompanied by SRF binding.

MADS boxes
"RIN [Ripening Inhibitor] binds to DNA sequences known as the CA/T-rich-G (CArG) box, which is the general target of MADS box proteins (Ito et al., 2008)."

Human genes
An "interaction between serum response factor (SRF) and the CArG box has been identified as a core machinery in the transcription of several muscle-specific genes, including the skeletal α-actin (8), caldesmon (9), cardiac α-actin (10), α1 integrin (11), SM22α (12), smooth muscle myosin heavy chain (13), smooth muscle α-actin (14), calponin (15), atrial natriuretic factor (16), and β-tropomyosin (17) genes."

Actin genes
Gene ID: 58 is ACTA1 actin alpha 1, skeletal muscle. "The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause a variety of myopathies, including nemaline myopathy, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects with manifestations such as hypotonia."

Gene ID: 59 is ACTA2 actin alpha 2, smooth muscle. "This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, integrity, and intercellular signaling. The encoded protein is a smooth muscle actin that is involved in vascular contractility and blood pressure homeostasis. Mutations in this gene cause a variety of vascular diseases, such as thoracic aortic disease, coronary artery disease, stroke, and Moyamoya disease, as well as multisystemic smooth muscle dysfunction syndrome."
 * 1) NP_001135417.1 actin, aortic smooth muscle. Transcript Variant: This variant (1) represents the longest transcript. Variants 1-3 encode the same protein.
 * 2) NP_001307784.1 actin, aortic smooth muscle. Transcript Variant: This variant (3) differs in the 5' UTR compared to variant 1. Variants 1-3 encode the same protein.
 * 3) NP_001604.1 actin, aortic smooth muscle. Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1. Variants 1-3 encode the same protein.

Gene ID: 70 is ACTC1 actin alpha cardiac muscle 1. "Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC)."

Gene ID: 800 is CALD1 caldesmon 1. "This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms."
 * 1) NP_004333.1 caldesmon isoform 2: Transcript Variant: This variant (2) uses an alternate in-frame splice site and lacks an alternate in-frame exon in the central coding region, compared to variant 1. It is mainly expressed in non-muscle tissues or cells. The resulting isoform (2, also known as WI-38 l-CaD II) lacks an internal region, compared to isoform 1. pfam02029 Location:267 → 525, Caldesmon; Caldesmon.
 * 2) NP_149129.2 caldesmon isoform 1: Transcript Variant: This variant (1) encodes the longest isoform (1, also known as aorta h-CaD). It is predominantly expressed in smooth muscle tissues. pfam02029, Location:517 → 780, Caldesmon; Caldesmon.
 * 3) NP_149130.1 caldesmon isoform 4: Transcript Variant: This variant (4) differs in the 5' UTR and 5' coding region, and uses an alternate in-frame splice site in the central coding region, compared to variant 1. It is mainly expressed in non-muscle tissues or cells. The resulting isoform (4, also known as HeLa l-CaD I) contains a distinct N-terminus and is shorter than isoform 1. pfam02029 Location:282 → 545, Caldesmon; Caldesmon.
 * 4) NP_149131.1 caldesmon isoform 5: Transcript Variant: This variant (5) differs in the 5' UTR and 5' coding region, and uses an alternate in-frame splice site and lacks an alternate in-frame exon in the central coding region, compared to variant 1. It is mainly expressed in non-muscle tissues or cells. The resulting isoform (5, also known as HeLa l-CaD II) has a distinct N-terminus and is shorter than isoform 1. pfam02029 Location:256 → 519, Caldesmon; Caldesmon.
 * 5) NP_149347.2 caldesmon isoform 3: Transcript Variant: This variant (3) uses two alternate in-frame splice sites in the central coding region, compared to variant 1. It is mainly expressed in non-muscle tissues or cells. The resulting isoform (3, also known as WI-38 l-CaD I) is shorter than isoform 1. pfam02029 Location:288 → 550, Caldesmon; Caldesmon.
 * 6) XP_016868139.1 caldesmon isoform X1: pfam02029 Location:517 → 779, Caldesmon; Caldesmon.
 * 7) XP_024302710.1 caldesmon isoform X2: pfam02029 Location:293 → 551, Caldesmon; Caldesmon.
 * 8) XP_024302711.1 caldesmon isoform X4: pfam02029 Location:267 → 524, Caldesmon; Caldesmon.
 * 9) XP_016868141.1 caldesmon isoform X3: pfam02029 Location:267 → 525, Caldesmon; Caldesmon.
 * 10) XP_016868143.1 caldesmon isoform X5.
 * 11) XR_002956488.1 RNA Sequence.
 * 12) XR_001744877.2 RNA Sequence.
 * 13) XR_002956489.1 RNA Sequence.
 * 14) XR_002956490.1 RNA Sequence.
 * 15) XR_001744880.2 RNA Sequence.
 * 16) XR_927535.2 RNA Sequence.
 * 17) XR_927541.2 RNA Sequence.
 * 18) XR_927537.3 RNA Sequence.
 * 19) XR_001744879.2 RNA Sequence.
 * 20) XR_927542.3 RNA Sequence.
 * 21) XR_001744881.2 RNA Sequence.

Gene ID: 6876 is TAGLN transgelin aka SM22; SMCC; TAGLN1; WS3-10; SM22-alpha. "This gene encodes a shape change and transformation sensitive actin-binding protein which belongs to the calponin family. It is ubiquitously expressed in vascular and visceral smooth muscle, and is an early marker of smooth muscle differentiation. The encoded protein is thought to be involved in calcium-independent smooth muscle contraction. It acts as a tumor suppressor, and the loss of its expression is an early event in cell transformation and the development of some tumors, coinciding with cellular plasticity. The encoded protein has a domain architecture consisting of an N-terminal calponin homology (CH) domain and a C-terminal calponin-like (CLIK) domain. Mice with a knockout of the orthologous gene are viable and fertile but their vascular smooth muscle cells exhibit alterations in the distribution of the actin filament and changes in cytoskeletal organization."
 * 1) NP_001001522.1 transgelin. Transcript Variant: This variant (1) represents the longer transcript. Variants 1 and 2 both encode the same protein. cd00014 Location:25 → 137, CH; Calponin homology domain; actin-binding domain which may be present as a single copy or in tandem repeats (which increases binding affinity). The CH domain is found in cytoskeletal and signal transduction proteins, including actin-binding proteins like spectrin, alpha-actinin, dystrophin, utrophin, and fimbrin, proteins essential for regulation of cell shape (cortexillins), and signaling proteins (Vav). pfam00402 Location:175 → 198 Calponin; Calponin family repeat.
 * 2) NP_003177.2 transgelin. Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1. Variants 1 and 2 both encode the same protein. cd00014 Location:25 → 137, CH; Calponin homology domain; actin-binding domain which may be present as a single copy or in tandem repeats (which increases binding affinity). The CH domain is found in cytoskeletal and signal transduction proteins, including actin-binding proteins like spectrin, alpha-actinin, dystrophin, utrophin, and fimbrin, proteins essential for regulation of cell shape (cortexillins), and signaling proteins (Vav). pfam00402 Location:175 → 198 Calponin; Calponin family repeat.

Atrial natriuretic factor genes
Gene ID: 4878 is NPPA natriuretic peptide A. "The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1."

Gene ID: 4879 is NPPB natriuretic peptide B. "This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein's biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis."

Gene ID: 4880 is NPPC natriuretic peptide C. "This gene encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cardiac natriuretic peptides CNP-53, CNP-29 and CNP-22, which belong to the natriuretic family of peptides. The encoded peptides exhibit vasorelaxation activity in laboratory animals and elevated levels of CNP-22 have been observed in the plasma of chronic heart failure patients."

Calponin genes
Gene ID: 1264 is CNN1 calponin 1.
 * 1) NP_001290.2 calponin-1 isoform 1. Transcript Variant: This variant (1) represents the longer isoform (1).
 * 2) NP_001295270.1 calponin-1 isoform 2. Transcript Variant: This variant (2) differs in the 5' UTR and uses an alternate exon in the 5' coding region, which results in use of a downstream start codon compared to variant 1. It encodes isoform 2, which has a shorter N-terminus than isoform 1. Variants 2 and 3 encode the same isoform.
 * 3) NP_001295271.1 calponin-1 isoform 2. Transcript Variant: This variant (3) differs in the 5' UTR and uses an alternate exon in the 5' coding region, which results in use of a downstream start codon compared to variant 1. It encodes isoform 2, which has a shorter N-terminus than isoform 1. Variants 2 and 3 encode the same isoform.

Gene ID: 1265 is CNN2 calponin 2. "The protein encoded by this gene, which can bind actin, calmodulin, troponin C, and tropomyosin, may function in the structural organization of actin filaments. The encoded protein could play a role in smooth muscle contraction and cell adhesion. Several pseudogenes of this gene have been identified, and are present on chromosomes 1, 2, 3, 6, 9, 11, 13, 15, 16, 21 and 22. Alternative splicing results in multiple transcript variants encoding different isoforms."
 * 1) NP_001290428.1 calponin-2 isoform c. Transcript Variant: This variant (3) uses two alternate in-frame splice sites in the central coding region, compared to variant 4. The encoded isoform (c) is shorter than isoform d.
 * 2) NP_001290430.1 calponin-2 isoform d. Transcript Variant: This variant (4) represents the longest transcript and encodes the longest isoform (d).
 * 3) NP_004359.1 calponin-2 isoform a. Transcript Variant: This variant (1) uses an alternate in-frame splice site in the central coding region, compared to variant 4. The encoded isoform (a) is shorter than isoform d.
 * 4) NP_958434.1 calponin-2 isoform b. Transcript Variant: This variant (2) lacks an alternate in-frame exon in the 3' coding region, compared to variant 4. The encoded isoform (b) is shorter than isoform d.

Gene ID: 1266 is CNN3 calponin 3, acidic. "This gene encodes a protein with a markedly acidic C terminus; the basic N-terminus is highly homologous to the N-terminus of a related gene, CNN1. Members of the CNN gene family all contain similar tandemly repeated motifs. This encoded protein is associated with the cytoskeleton but is not involved in contraction."
 * 1) NP_001272984.1 calponin-3 isoform 2. Transcript Variant: This variant (2) lacks an in-frame exon in the central coding region compared to variant 1. The encoded isoform (2) is shorter than isoform 1.
 * 2) NP_001272985.1 calponin-3 isoform 3. Transcript Variant: This variant (3) differs in the 5' UTR and lacks a portion of the 5' coding region compared to variant 1. These differences cause translation initiation at a downstream start codon compared to variant 1. The encoded isoform (3) has a shorter N-terminus compared to isoform 1.
 * 3) NP_001830.1 calponin-3 isoform 1. Transcript Variant: This variant (1) represents the longest transcript and encodes the longest isoform (1).

Fos genes
Gene ID: 2353 is FOS Fos proto-oncogene, AP-1 transcription factor subunit. "The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. In some cases, expression of the FOS gene has also been associated with apoptotic cell death." "Serum response factor and the (CC(A/T)6GG) (CArG) box interact to promote the transcription of c-fos and muscle genes".
 * 1) NP_005243.1 proto-oncogene c-Fos, cd14721 Location:147 → 200, bZIP_Fos; Basic leucine zipper (bZIP) domain of the oncogene Fos (Fos): a DNA-binding and dimerization domain.

Integrin genes
Gene ID: 3672 is ITGA1 integrin subunit alpha 1, aka VLA1; CD49a. "This gene encodes the alpha 1 subunit of integrin receptors. This protein heterodimerizes with the beta 1 subunit to form a cell-surface receptor for collagen and laminin. The heterodimeric receptor is involved in cell-cell adhesion and may play a role in inflammation and fibrosis. The alpha 1 subunit contains an inserted (I) von Willebrand factor type I domain which is thought to be involved in collagen binding."
 * 1) NP_852478.1 integrin alpha-1 precursor, smart00191 Location:568 → 621, Int_alpha; Integrin alpha (beta-propellor repeats), cd01469 Location:171 → 351, vWA_integrins_alpha_subunit; Integrins are a class of adhesion receptors that link the extracellular matrix to the cytoskeleton and cooperate with growth factor receptors to promote cell survival, cell cycle progression and cell migration. Integrins consist of an alpha and a beta sub-unit. Each sub-unit has a large extracellular portion, a single transmembrane segment and a short cytoplasmic domain. The N-terminal domains of the alpha and beta subunits associate to form the integrin headpiece, which contains the ligand binding site, whereas the C-terminal segments traverse the plasma membrane and mediate interaction with the cytoskeleton and with signalling proteins. The VWA domains present in the alpha subunits of integrins seem to be a chordate specific radiation of the gene family being found only in vertebrates. They mediate protein-protein interactions. pfam08441 Location:664 → 1056, Integrin_alpha2; Integrin alpha.

MADS box genes
Gene ID: 4205 is MEF2A myocyte enhancer factor 2A. "The protein encoded by this gene is a DNA-binding transcription factor that activates many muscle-specific, growth factor-induced, and stress-induced genes. The encoded protein can act as a homodimer or as a heterodimer and is involved in several cellular processes, including muscle development, neuronal differentiation, cell growth control, and apoptosis. Defects in this gene could be a cause of autosomal dominant coronary artery disease 1 with myocardial infarction (ADCAD1). Several transcript variants encoding different isoforms have been found for this gene."
 * 1) NP_001124398.1 myocyte-specific enhancer factor 2A isoform 2. Transcript Variant: This variant (2) lacks an in-frame coding exon and a 5' non-coding exon, compared to transcript variant 6. These differences result in a shorter isoform (2), compared to isoform 5. Variants 2 and 5 both encode isoform 2. MEF2 (myocyte enhancer factor 2)-like/Type II subfamily of MADS (MCM1, Agamous, Deficiens, and SRF (serum response factor)) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Differs from SRF-like/Type I subgroup mainly in position of the alpha helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 2) NP_001124399.1 myocyte-specific enhancer factor 2A isoform 3. Transcript Variant: This variant (3) lacks an in-frame 5' coding exon and a 5' non-coding exon, compared to transcript variant 6. These differences result in a shorter isoform (3), compared to isoform 5. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 3) NP_001124400.1 myocyte-specific enhancer factor 2A isoform 4. Transcript Variant: This variant (4) lacks multiple, in-frame coding exons, compared to transcript variant 6. These differences result in a shorter isoform (4), compared to isoform 5. The 5' UTR of this transcript variant is undefined. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 4) NP_001165365.1 myocyte-specific enhancer factor 2A isoform 2. Transcript Variant: This variant (5) lacks an in-frame coding exon and contains an additional 5' non-coding exon, compared to transcript variant 6. These differences result in a shorter isoform (2), compared to isoform 5. Variants 2 and 5 both encode isoform 2. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 5) NP_001306135.1 myocyte-specific enhancer factor 2A isoform 5. Transcript Variant: This variant (6) encodes the longest isoform (5). MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 6) NP_001339543.1 myocyte-specific enhancer factor 2A isoform 2. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 7) NP_001339544.1 myocyte-specific enhancer factor 2A isoform 2. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 8) NP_001339545.1 myocyte-specific enhancer factor 2A isoform 5. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 9) NP_001339546.1 myocyte-specific enhancer factor 2A isoform 1. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 10) NP_001339547.1 myocyte-specific enhancer factor 2A isoform 6. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 11) NP_001352130.1  myocyte-specific enhancer factor 2A isoform 7. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 12) NP_001352131.1  myocyte-specific enhancer factor 2A isoform 7. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 13) NP_001352132.1  myocyte-specific enhancer factor 2A isoform 8. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 14) NP_001352133.1  myocyte-specific enhancer factor 2A isoform 8. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 15) NP_001352134.1  myocyte-specific enhancer factor 2A isoform 5. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 16) NP_001352135.1  myocyte-specific enhancer factor 2A isoform 6. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 17) NP_001352136.1  myocyte-specific enhancer factor 2A isoform 1. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 18) NP_001352137.1  myocyte-specific enhancer factor 2A isoform 2. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 19) NP_001352138.1  myocyte-specific enhancer factor 2A isoform 3. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 20) NP_001352139.1  myocyte-specific enhancer factor 2A isoform 10. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 21) NP_001352140.1  myocyte-specific enhancer factor 2A isoform 11. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 22) NP_005578.2  myocyte-specific enhancer factor 2A isoform 1. Transcript Variant: This variant (1) lacks multiple, in-frame coding exons and uses an alternate coding exon, compared to transcript variant 6. These differences result in a shorter isoform (1), compared to isoform 5. MADS: MCM1, Agamous, Deficiens, and SRF (serum response factor) box family of eukaryotic transcriptonal regulators. Binds DNA and exists as hetero and homo-dimers. Composed of 2 main subgroups: SRF-like/Type I and MEF2-like (myocyte enhancer factor 2)/ Type II. These subgroups differ mainly in position of the alpha 2 helix responsible for the dimerization interface; Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.

Gene ID: 4207 is BORCS8-MEF2B BORCS8-MEF2B readthrough, aka MEF2B myocyte enhancer factor 2B. "This gene represents numerous read-through transcripts that span GeneID:729991 and 100271849. Many read-through transcripts are predicted to be nonsense-mediated decay (NMD) candidates, and are thought to be non-coding. Some transcripts are predicted to be capable of translation reinitiation at a downstream AUG, resulting in expression of at least one isoform of myocyte enhancer factor 2B (MEF2B) from this read-through locus. At least one additional MEF2B variant and isoform can be expressed from a downstream promoter, and is annotated on GeneID:100271849."
 * 1) NP_005910.1 myocyte-specific enhancer factor 2B isoform b. Transcript Variant: This variant (1) lacks two alternate exons in the 5' region and one alternate exon in the 3' region, compared to variant 2. This variant is thought to be protein coding because translation can reinitiate at the downstream AUG, resulting in expression of an isoform of MEF2B (GeneID:100271849). Isoform b has a shorter and distinct C-terminus, compared to MEF2A isoform a (NP_001139257.1). MEF2 (myocyte enhancer factor 2)-like/Type II subfamily of MADS (MCM1, Agamous, Deficiens, and SRF (serum response factor)) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Differs from SRF-like/Type I subgroup mainly in position of the alpha helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 2) NR_027307.2 RNA Sequence. Transcript Variant: This variant (2) represents the longest transcript. This variant is represented as non-coding because the use of the 5'-most translational start codon, as used in NM_001145784.1, renders the transcript a candidate for nonsense-mediated mRNA decay (NMD).
 * 3) NR_027308.2 RNA Sequence. Transcript Variant: This variant (3) lacks an alternate exon in the 3' region, compared to variant 2. This variant is represented as non-coding because the use of the 5'-most translational start codon, as used in NM_001145784.1, renders the transcript a candidate for nonsense-mediated mRNA decay (NMD).

Gene ID: 4208 is MEF2C myocyte enhancer factor 2C. "This locus encodes a member of the MADS box transcription enhancer factor 2 (MEF2) family of proteins, which play a role in myogenesis. The encoded protein, MEF2 polypeptide C, has both trans-activating and DNA binding activities. This protein may play a role in maintaining the differentiated state of muscle cells. Mutations and deletions at this locus have been associated with severe cognitive disability, stereotypic movements, epilepsy, and cerebral malformation. Alternatively spliced transcript variants have been described."

Gene ID: 4209 is MEF2D myocyte enhancer factor 2D. "This gene is a member of the myocyte-specific enhancer factor 2 (MEF2) family of transcription factors. Members of this family are involved in control of muscle and neuronal cell differentiation and development, and are regulated by class II histone deacetylases. Fusions of the encoded protein with Deleted in Azoospermia-Associated Protein 1 (DAZAP1) due to a translocation have been found in an acute lymphoblastic leukemia cell line, suggesting a role in leukemogenesis. The encoded protein may also be involved in Parkinson disease and myotonic dystrophy. Alternative splicing results in multiple transcript variants."
 * 1) NP_001258558.1 myocyte-specific enhancer factor 2D isoform 2. Transcript Variant: This variant (2) contains an alternate exon and splice site in the 5' UTR, and lacks an internal in-frame exon in the coding region, compared to variant 1. The resulting isoform (2, also known as hMEF2Da0), is shorter than isoform 1. MEF2 (myocyte enhancer factor 2)-like/Type II subfamily of MADS (MCM1, Agamous, Deficiens, and SRF (serum response factor)) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Differs from SRF-like/Type I subgroup mainly in position of the alpha helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 2) NP_005911.1 myocyte-specific enhancer factor 2D isoform 1. Transcript Variant: This variant (1) represents the longer transcript and encodes the longer isoform (1, also known as hMEF2Dab). MEF2 (myocyte enhancer factor 2)-like/Type II subfamily of MADS (MCM1, Agamous, Deficiens, and SRF (serum response factor)) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Differs from SRF-like/Type I subgroup mainly in position of the alpha helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.

Gene ID: 23523 is CABIN1 calcineurin binding protein 1. "Calcineurin plays an important role in the T-cell receptor-mediated signal transduction pathway. The protein encoded by this gene binds specifically to the activated form of calcineurin and inhibits calcineurin-mediated signal transduction. The encoded protein is found in the nucleus and contains a leucine zipper domain as well as several PEST motifs, sequences which confer targeted degradation to those proteins which contain them. Alternative splicing results in multiple transcript variants encoding two different isoforms."
 * 1) NP_001186210.1 calcineurin-binding protein cabin-1 isoform a. Transcript Variant: This variant (1) represents the longest transcript and encodes the longer isoform (a). Both variants 1 and 2 encode the same isoform (a). MEF2 binding.
 * 2) NP_001188358.1 calcineurin-binding protein cabin-1 isoform b. Transcript Variant: This variant (3) differs in the 5' UTR and lacks an alternate in-frame exon compared to variant 1. The resulting isoform (b) has the same N- and C-termini but is shorter compared to isoform a. Myocyte enhancer factor-2 (MEF2) binding domain of the calcineurin-binding protein cabin-1.
 * 3) NP_036427.1 calcineurin-binding protein cabin-1 isoform a. Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1. Both variants 1 and 2 encode the same isoform (a). MEF2 binding.

Gene ID: 100271849 is MEF2B myocyte enhancer factor 2B. "The product of this gene is a member of the MADS/MEF2 family of DNA binding proteins. The protein is thought to regulate gene expression, including expression of the smooth muscle myosin heavy chain gene. This region undergoes considerable alternative splicing, with transcripts supporting two non-overlapping loci (GeneID 729991 and 100271849) as well as numerous read-through transcripts that span both loci (annotated as GeneID 4207). Several isoforms of this protein are expressed from either this locus or from some of the read-through transcripts annotated on GeneID 4207."
 * 1) NP_001139257.1 myocyte-specific enhancer factor 2B isoform 1. MEF2 (myocyte enhancer factor 2)-like/Type II subfamily of MADS (MCM1, Agamous, Deficiens, and SRF (serum response factor)) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Differs from SRF-like/Type I subgroup mainly in position of the alpha helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.
 * 2) NP_001354211.1 myocyte-specific enhancer factor 2B isoform 2. MEF2 (myocyte enhancer factor 2)-like/Type II subfamily of MADS (MCM1, Agamous, Deficiens, and SRF (serum response factor)) box family of eukaryotic transcriptional regulators. Binds DNA and exists as hetero and homo-dimers. Differs from SRF-like/Type I subgroup mainly in position of the alpha helix responsible for the dimerization interface. Important in homeotic regulation in plants and in immediate-early development in animals. Also found in fungi.

Myosin heavy chains
Gene ID: 4619 is MYH1 myosin heavy chain 1. "Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development."

Gene ID: 4620 is MYH2 myosin heavy chain 2. "Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified."
 * 1) NP_001093582.1 myosin-2. Transcript Variant: This variant (2) differs in the 5' UTR compared to variant 1. Both variants encode the same protein.
 * 2) NP_060004.3 myosin-2. Transcript Variant: This variant (1) differs in the 5' UTR compared to variant 2. Both variants encode the same protein.

Gene ID: 4621 is MYH3 myosin heavy chain 3. "Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. This gene is a member of the MYH family and encodes a protein with an IQ domain and a myosin head-like domain. Mutations in this gene have been associated with two congenital contracture (arthrogryposis) syndromes, Freeman-Sheldon syndrome and Sheldon-Hall syndrome."

Gene ID: 4622 is MYH4 myosin heavy chain 4.

Gene ID: 4624 is MYH6 myosin heavy chain 6. "Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located approximately 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3."

Gene ID: 4625 is MYH7 myosin heavy chain 7. "Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy."

Gene ID: 4626 is MYH8 myosin heavy chain 8. "Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is predominantly expressed in fetal skeletal muscle. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in trismus-pseudocamptodactyly syndrome."

Gene ID: 4627 is MYH9 myosin heavy chain 9. "This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness."

Gene ID: 4628 is MYH10 myosin heavy chain 10. "This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene."
 * 1) NP_001242941.1 myosin-10 isoform 1. Transcript Variant: This variant (1) represents the longest transcript and encodes the longest isoform (1).
 * 2) NP_001243024.1 myosin-10 isoform 3. Transcript Variant: This variant (3) uses an alternate in-frame splice site in the 5' coding region, and lacks an alternate in-frame exon in the central coding region, compared to variant 1. The encoded isoform (3) is shorter than isoform 1.
 * 3) NP_001362195.1 myosin-10 isoform 4.
 * 4) NP_005955.3 myosin-10 isoform 2. Transcript Variant: This variant (2) lacks an alternate in-frame exon in both the 5' and central coding regions, compared to variant 1. The encoded isoform (2) is shorter than isoform 1.

Gene ID: 4629 is MYH11 myosin heavy chain 11. "The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3' end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified."
 * 1) NP_001035202.1 myosin-11 isoform SM2B. Transcript Variant: This variant (SM2B) represents the longer transcript. It encodes the isoform SM2B.
 * 2) NP_001035203.1 myosin-11 isoform SM1B. Transcript Variant: This variant (SM1B) lacks a segment in the coding region, which leads to a frameshift, compared to variant SM2B. The encoded isoform (SM1B) is longer and varies in the carboxyl terminus, compared to isoform SM2B.
 * 3) NP_002465.1 myosin-11 isoform SM1A. Transcript Variant: This variant (SM1A) lacks two segments in the coding region, compared to variant SM2B. The encoded isoform (SM1A) is shorter and varies in the carboxyl terminus, compared to isoform SM2B.
 * 4) NP_074035.1 myosin-11 isoform SM2A. Transcript Variant: This variant (SM2A) lacks an in-frame segment of the coding region, compared to variant SM2B. It encodes a shorter isoform (SM2A), that is missing an internal segment compared to isoform SM2B.
 * 5) XP_016878739.1 myosin-11 isoform X1.
 * 6) XP_011520804.1 myosin-11 isoform X2.

Gene ID: 4644 is MYO5A myosin VA aka MYH12. "This gene is one of three myosin V heavy-chain genes, belonging to the myosin gene superfamily. Myosin V is a class of actin-based motor proteins involved in cytoplasmic vesicle transport and anchorage, spindle-pole alignment and mRNA translocation. The protein encoded by this gene is abundant in melanocytes and nerve cells. Mutations in this gene cause Griscelli syndrome type-1 (GS1), Griscelli syndrome type-3 (GS3) and neuroectodermal melanolysosomal disease, or Elejalde disease. Multiple alternatively spliced transcript variants encoding different isoforms have been reported, but the full-length nature of some variants has not been determined."
 * 1) NP_000250.3 unconventional myosin-Va isoform 1. Transcript Variant: This variant (1) encodes the longer isoform (1).
 * 2) NP_001135967.2 unconventional myosin-Va isoform 2. Transcript Variant: This variant (2) lacks an in-frame exon in the CDS, resulting in a shorter isoform (2), as compared to variant 1.

Gene ID: 8735 is MYH13 myosin heavy chain 13.

Gene ID: 22989 is MYH15 myosin heavy chain 15.
 * 1) NP_055796.1  myosin-15 precursor.

Gene ID: 57644 is MYH7B myosin heavy chain 7B. "The myosin II molecule is a multi-subunit complex consisting of two heavy chains and four light chains. This gene encodes a heavy chain of myosin II, which is a member of the motor-domain superfamily. The heavy chain includes a globular motor domain, which catalyzes ATP hydrolysis and interacts with actin, and a tail domain in which heptad repeat sequences promote dimerization by interacting to form a rod-like alpha-helical coiled coil. This heavy chain subunit is a slow-twitch myosin. Alternatively spliced transcript variants have been found, but the full-length nature of these variants is not determined."

Gene ID: 79784 is MYH14 myosin heavy chain 14. "This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-14 (MYO14). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene result in one form of autosomal dominant hearing impairment. Multiple transcript variants encoding different isoforms have been found for this gene."
 * 1) NP_001070654.1 myosin-14 isoform 1. Transcript Variant: This variant (1) lacks an alternate in-frame exon in the 5' coding region, compared to variant 3. The resulting isoform (1) lacks an internal segment in the motor domain, compared to isoform 3.
 * 2) NP_001139281.1 myosin-14 isoform 3. Transcript Variant: This variant (3) represents the longest transcript and encodes the longest isoform (3).
 * 3) NP_079005.3 myosin-14 isoform 2. Transcript Variant: This variant (2) lacks two alternate in-frame exons in the 5' coding region, compared to variant 3. The resulting isoform (2) lacks two separate segments in the motor domain, compared to isoform 3.
 * 4) XP_011525622.1 myosin-14 isoform X1.
 * 5) XP_011525623.1 myosin-14 isoform X2.
 * 6) XP_006723449.1 myosin-14 isoform X3.
 * 7) XP_024307489.1 myosin-14 isoform X4.
 * 8) XP_011525625.1 myosin-14 isoform X3.

Smooth muscle kinase genes
Analysis "of the smMLCK promoter revealed that a single CArG box is required for basal promoter activity in smooth muscle and nonmuscle cell types. The smooth and cardiac muscle restricted serum response factor (SRF) coactivator myocardin robustly induced smMLCK expression in 10T1/2 cells, although it increased the activity of the proximal smMLCK promoter only twofold in reporter gene assays. In contrast to SRF and myocardin, GATA-6 repressed the activity of the smMLCK promoter and inhibited smMLCK protein expression in vascular smooth muscle cells. Altogether, these studies indicate that expression of the 130-kDa smMLCK is regulated by a CArG-dependent promoter located within an intron of the mouse mylk gene." The transcription factor binding motifs in the smooth muscle MLCK proximal promoter are -166 CCTTATAAGG (CArG), -141 CCGATATA (GATA), -101 CAAT, -87 ATAAAC (Fox), -74 GGCCGGCCCC (Sp1), +6 ACCCAGCCCC (Sp1), and +60 GGGGGCGGGA (Sp1), with transcription start sites at +1 G, +47 A, +54 A, and +118 A.
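
The motif list above can be checked against the core CArG consensus CC(A/T)6GG given earlier. The following is a minimal sketch in Python, not the method used in the cited study: the names CORE_CARG, motifs, and is_core_carg are hypothetical, and the label "CAAT" for the -101 motif is only a placeholder because no factor is named for it.

```python
import re

# Core CArG consensus CC(A/T)6GG, written as a regular expression.
CORE_CARG = re.compile(r"CC[AT]{6}GG")

# (position, sequence, label) as listed for the smMLCK proximal promoter.
# The "CAAT" label is a placeholder; the list names no factor for that motif.
motifs = [
    (-166, "CCTTATAAGG", "CArG"),
    (-141, "CCGATATA", "GATA"),
    (-101, "CAAT", "CAAT"),
    (-87, "ATAAAC", "Fox"),
    (-74, "GGCCGGCCCC", "Sp1"),
    (6, "ACCCAGCCCC", "Sp1"),
    (60, "GGGGGCGGGA", "Sp1"),
]


def is_core_carg(seq):
    """Return True when the whole sequence fits CC(A/T)6GG."""
    return CORE_CARG.fullmatch(seq) is not None


# Keep only the listed motifs that fit the core consensus.
carg_hits = [(pos, seq) for pos, seq, _label in motifs if is_core_carg(seq)]
print(carg_hits)
```

Under these assumptions only the -166 motif fits the core consensus, which agrees with the single CArG box described in the quoted analysis.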

Gene ID: 4638 is MYLK myosin light chain kinase aka KRP; AAT7; MLCK; MLCK1; MMIHS; MYLK1; smMLCK; MLCK108; MLCK210; MSTP083. "This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3' region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts."
 * 1) NP_001308238.1 myosin light chain kinase, smooth muscle isoform 9.
 * 2) NP_444253.3 myosin light chain kinase, smooth muscle isoform 1. Transcript Variant: This variant (1) is the full-length transcript and encodes the full-length nonmuscle isoform.
 * 3) NP_444254.3  myosin light chain kinase, smooth muscle isoform 2. Transcript Variant: This variant (2) does not utilize exon 11, compared to variant 1, resulting in a shorter protein (isoform 2), compared to isoform 1.
 * 4) NP_444255.3  myosin light chain kinase, smooth muscle isoform 3A. Transcript Variant: This variant (3A) does not utilize exon 30, compared to variant 1, resulting in a shorter protein (isoform 3A), compared to isoform 1.
 * 5) NP_444256.3  myosin light chain kinase, smooth muscle isoform 3B. Transcript Variant: This variant (3B) does not utilize exons 11 and 30, compared to variant 1, resulting in a shorter protein (isoform 3B), compared to isoform 1.
 * 6) NP_444259.1  myosin light chain kinase, smooth muscle isoform 7. Transcript Variant: This variant (7) encodes the shorter isoform of kinase related protein, telokin. The first exon corresponds to intron 30 and the remainder of the transcript corresponds to the last two exons of the gene. It is shorter than variant 8 by one codon at the splicing junction between the first two exons.
 * 7) NP_444260.1 myosin light chain kinase, smooth muscle isoform 8. Transcript Variant: This variant (8) encodes the longer isoform of kinase related protein, telokin. It is longer than variant 7 by one codon at the splicing junction between the first two exons.

Hypotheses

 * 1) A1BG is not transcribed using a CArG box.
 * 2) A CArG box on either side of A1BG may show that it is actively used to transcribe A1BG.

Results
A sequence matching the more general CArG-box motif, 3'-CATTAAAAGG-5', occurs 3441 nts from ZSCAN22, or -1019 nts from the A1BG TSS, in the distal promoter.

A second match to the more general CArG-box motif, 3'-CAAAAAAAAG-5', occurs 1399 nts from ZSCAN22, or -3061 nts from the A1BG TSS, in the distal promoter; it may be a CArG box for ZSCAN22 in the negative direction on the positive strand.
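
Both reported sequences can be tested against the core motif CC(A/T)6GG and the more general motif C(C/A/T)(A/T)6(A/G)G, and the two pairs of coordinates can be cross-checked. This is a minimal sketch in Python under stated assumptions: the strings are compared exactly as written above, with no attempt to reverse or complement them, and the names candidates, classify, and implied_span are hypothetical.

```python
import re

# Core and more general CArG-box motifs from the consensus sequences section.
CORE_CARG = re.compile(r"CC[AT]{6}GG")
GENERAL_CARG = re.compile(r"C[CAT][AT]{6}[AG]G")

# Sequences and coordinates as reported in the results (nts).
candidates = {
    "CATTAAAAGG": {"from_zscan22": 3441, "from_a1bg_tss": -1019},
    "CAAAAAAAAG": {"from_zscan22": 1399, "from_a1bg_tss": -3061},
}


def classify(seq):
    """Label a 10-mer as 'core', 'general', or 'none'."""
    if CORE_CARG.fullmatch(seq):
        return "core"
    if GENERAL_CARG.fullmatch(seq):
        return "general"
    return "none"


def implied_span(coords):
    """Distance between ZSCAN22 and the A1BG TSS implied by one coordinate pair."""
    return coords["from_zscan22"] - coords["from_a1bg_tss"]


for seq, coords in candidates.items():
    print(seq, classify(seq), implied_span(coords))
```

Each sequence fits only the more general motif, and both coordinate pairs imply the same 4460 nts between ZSCAN22 and the A1BG TSS, so the two sets of positions are mutually consistent.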