CpG Island overview | CpG Islands and The Regulation of Transcription

A CpG island is a stretch of DNA with a high frequency of the nucleotides C and G next to one another. The formal definition we use here is a stretch of DNA at least 200 bp long with at least 50% GC content.

What are CpG Islands?

CpG sites (or CG sites) are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5′ → 3′ direction. CpG sites occur with high frequency in genomic regions called CpG islands (or CG islands). Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines; the enzymes that add the methyl group are known as DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosines within a gene can change its expression, a mechanism that belongs to epigenetics, the larger field of science that studies gene regulation.

CpG is shorthand for 5′-C-phosphate-G-3′, that is, cytosine (C) and guanine (G) separated by only one phosphate group; a phosphate links any two nucleosides together in DNA. The CpG notation is used to distinguish this single-stranded linear sequence from the CG base-pairing of cytosine and guanine in double-stranded sequences. The notation is therefore to be read as the cytosine being 5′ to the guanine base. CpG should not be confused with GpC, which means that a guanine is followed by a cytosine in the 5′ → 3′ direction of a single-stranded sequence.
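
To make the CpG versus GpC distinction concrete, here is a minimal Python sketch that counts each dinucleotide on a single strand read 5′ → 3′. The function name and the example sequence are hypothetical and are used only for illustration.

```python
def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count occurrences of a dinucleotide in a strand written 5'->3'."""
    seq = seq.upper()
    dinuc = dinuc.upper()
    # Slide over every adjacent pair of bases and compare it to the target.
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


strand = "ACGTCGCGAGC"  # hypothetical single strand, 5'->3'
print(count_dinucleotide(strand, "CG"))  # CpG sites: C followed by G -> 3
print(count_dinucleotide(strand, "GC"))  # GpC sites: G followed by C -> 2
```

The two counts differ because the direction of reading matters: the same pair of bases is a CpG only when the cytosine comes first.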

Abstract: CpG Island Methylation

The view of CpG islands (CGIs) has evolved from a peculiar sequence overrepresentation of CpGs to functionally important parts of the genome that define and regulate promoter regions in vertebrates. CGIs have resisted a unified mathematical definition, but recent approaches are becoming more precise in distinguishing true CGIs from artifacts such as repetitive sequences. CGIs are usually associated with a lack of DNA methylation and may be regarded as the best predictors for defining active or potentially active promoter regions.

Methylated CGIs play a role in X-inactivation, aberrant methylation patterns in cancer, genomic imprinting, and gene silencing during cell differentiation. Most importantly, it is becoming increasingly evident that they are essential for fine-tuned regulatory processes, directing gene expression patterns and cell fate and thereby acting as important landmarks of the epigenome.

CpG island and The Regulation of Transcription

CpG islands (or CG islands) are regions with a high frequency of CpG sites. Although objective definitions for CpG islands are limited, the usual formal definition is a region of at least 200 bp with a GC percentage greater than 50% and an observed-to-expected CpG ratio greater than 60%. In the observed-to-expected CpG ratio, the observed value is the number of CpGs in the region, and the expected value is commonly calculated as (number of C × number of G) / length of sequence; a variant uses ((number of C + number of G) / 2)² / length of sequence.

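
The observed-to-expected CpG ratio and the basic island criteria can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, the two expected-value formulas are the commonly used ones (product of C and G counts, or the squared mean of the two counts, each divided by sequence length), and the thresholds are the ones quoted in the text.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str, method: str = "product") -> float:
    """Observed-to-expected CpG ratio for one sequence.

    method="product": expected = (#C * #G) / length
    method="mean":    expected = ((#C + #G) / 2) ** 2 / length
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n == 0:
        return 0.0
    if method == "product":
        expected = c * g / n
    elif method == "mean":
        expected = ((c + g) / 2) ** 2 / n
    else:
        raise ValueError("method must be 'product' or 'mean'")
    return observed / expected if expected else 0.0


def meets_basic_criteria(seq: str) -> bool:
    """At least 200 bp, GC > 50%, observed/expected CpG > 0.6 (values from the text)."""
    return len(seq) >= 200 and gc_fraction(seq) > 0.5 and cpg_obs_exp(seq) > 0.6


print(cpg_obs_exp("CG" * 100))           # 100 observed / 50 expected = 2.0
print(meets_basic_criteria("CG" * 100))  # True
print(meets_basic_criteria("AT" * 100))  # False
```
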
Many genes in mammalian genomes have CpG islands associated with the start of the gene (promoter regions). Because of this, the presence of a CpG island is used to help in the prediction and annotation of genes.

A 2002 study revised the rules of CpG island prediction to exclude other GC-rich genomic sequences such as Alu repeats. Based on an extensive search of the complete sequences of human chromosomes 21 and 22, DNA regions greater than 500 bp were found more likely to be the "true" CpG islands associated with the 5′ regions of genes if they had a GC content greater than 55% and an observed-to-expected CpG ratio of 65%.
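
As a rough illustration of how such criteria can be applied along a longer sequence, the sketch below slides a fixed-size window over the sequence and merges overlapping windows that pass the thresholds. It is a simplified sketch, not the algorithm used in the 2002 study: the function names, the step size, and the exact comparison operators are assumptions, and only the 500 bp, 55%, and 0.65 values come from the text.

```python
def window_passes(window_seq: str, min_gc: float, min_oe: float) -> bool:
    """Check GC fraction and observed/expected CpG ratio for one window."""
    n = len(window_seq)
    c, g = window_seq.count("C"), window_seq.count("G")
    if c == 0 or g == 0:
        return False
    gc = (c + g) / n
    # observed / expected, with expected = (c * g) / n
    obs_exp = window_seq.count("CG") * n / (c * g)
    return gc > min_gc and obs_exp >= min_oe


def find_candidate_islands(seq, window=500, step=1, min_gc=0.55, min_oe=0.65):
    """Return merged (start, end) intervals, 0-based and end-exclusive."""
    seq = seq.upper()
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        if not window_passes(seq[start:start + window], min_gc, min_oe):
            continue
        end = start + window
        if islands and start <= islands[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            islands[-1] = (islands[-1][0], max(islands[-1][1], end))
        else:
            islands.append((start, end))
    return islands


# Toy sequence: a CpG-rich block flanked by AT-rich sequence (hypothetical data).
toy = "AT" * 300 + "CG" * 300 + "AT" * 300
print(find_candidate_islands(toy, step=50))  # [(400, 1400)]
```

Note that the reported interval is wider than the CpG-rich block itself, because windows that only partly overlap the block can still pass the thresholds; practical tools usually trim or re-evaluate the merged region.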

In mammalian genomes, CpG islands are typically 300-3,000 base pairs in length and have been found in or near approximately 40% of promoters of mammalian genes. Over 60% of human genes and almost all housekeeping genes have their promoters embedded in CpG islands. Given the frequency of GC two-nucleotide sequences, the number of CpG dinucleotides is much lower than would be expected.

CpG Dinucleotides

CpG islands are characterized by a CpG dinucleotide content of at least 60% of that which would be statistically expected (~4–6%), whereas the rest of the genome has a much lower CpG frequency (~1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, in most instances the CpG sites in the CpG islands of promoters are unmethylated if the genes are expressed. This observation led to the speculation that methylation of CpG sites in the promoter of a gene may inhibit gene expression. Methylation, along with histone modification, is central to imprinting. Most of the methylation differences between tissues, or between normal and cancer samples, occur a short distance from the CpG islands rather than in the islands themselves.

CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates. A C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the cytosines in such an arrangement tend to be methylated.
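
The statistically expected CpG frequency mentioned above can be reproduced with a short calculation. The sketch below assumes that bases occur independently and that C and G are equally frequent, so the expected CpG frequency is simply f(C) × f(G); the function names and the example GC fractions are hypothetical and chosen only to illustrate the ~4–6% range.

```python
def expected_cpg_frequency(gc_fraction: float) -> float:
    """Expected CpG dinucleotide frequency under independence, with f(C) = f(G)."""
    f_c = gc_fraction / 2
    f_g = gc_fraction / 2
    return f_c * f_g


def observed_cpg_frequency(seq: str) -> float:
    """Observed fraction of adjacent base pairs that are CpG."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    return seq.count("CG") / (len(seq) - 1)


for gc in (0.40, 0.50):  # illustrative GC fractions
    print(f"GC fraction {gc:.2f}: expected CpG frequency {expected_cpg_frequency(gc):.4f}")
```

With GC fractions of 40% and 50%, the expected CpG frequency is 4% and 6.25% respectively, which is why a genome-wide observed frequency near 1% indicates strong CpG depletion.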

This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. Over time, however, methylated cytosines tend to turn into thymines because of spontaneous deamination. There is a special enzyme in humans, thymine-DNA glycosylase (TDG), that specifically replaces T's from T/G mismatches.

However, because CpGs are so rare, this enzyme is theorized to be insufficiently effective in preventing a possibly rapid mutation of the dinucleotides. The existence of CpG islands is usually explained by selective forces favoring relatively high CpG content, or by low levels of methylation in that genomic area, perhaps related to the regulation of gene expression. A 2011 study showed that most CpG islands are a result of non-selective forces.
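
The effect of deamination on the sequence can be illustrated with a toy example. The sketch below only rewrites characters in a string; it is not a model of the biochemistry or of repair by TDG, and the function names and the example sequence are hypothetical. It shows that a deaminated CpG reads as TpG on the affected strand and, if the event happened on the opposite strand, as CpA on the strand being read.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def deaminate_cpg(seq: str, position: int) -> str:
    """Replace the C of the CpG starting at `position` with T (CpG -> TpG)."""
    if seq[position:position + 2] != "CG":
        raise ValueError("no CpG at this position")
    return seq[:position] + "T" + seq[position + 1:]


top = "GACGTA"  # hypothetical strand with one CpG at index 2
print(deaminate_cpg(top, 2))  # event on this strand: GATGTA (CpG -> TpG)

# Event on the opposite strand, then read back on the original strand.
bottom = reverse_complement(top)                   # TACGTC
mutated_bottom = deaminate_cpg(bottom, 2)          # TATGTC
print(reverse_complement(mutated_bottom))          # GACATA (CpG -> CpA)
```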

DNA methylation: Silencing, Methylation, Aging and Cancer

CpG islands in promoters: In humans, about 70% of promoters located near the transcription start site of a gene contain a CpG island.

Distal promoter elements also frequently contain CpG islands. An example is the DNA repair gene ERCC1, where the CpG island-containing element is located about 5,400 nucleotides upstream of the transcription start site of the ERCC1 gene. CpG islands also occur frequently in promoters for functional noncoding RNAs such as microRNAs.

Methylation of CpG Islands Stably Silences Genes

In humans, DNA methylation occurs at the 5 position of the pyrimidine ring of the cytosine residues within CpG sites to form 5-methylcytosines. The presence of multiple methylated CpG sites in the CpG islands of promoters causes stable silencing of genes.

Silencing of a gene may be initiated by other mechanisms, but this is often followed by methylation of CpG sites in the promoter CpG island, which causes the stable silencing of the gene.

Why are CpG Islands Important?

CpG islands are important because they represent areas of the genome that have, for some reason, been protected from the mutating properties of methylation through evolutionary time (which tends to change the C in CpG pairs to a T or, read on the opposite strand, the G to an A). Often, they point to the presence of an important piece of intergenic DNA, like that found in the promoter regions of genes where transcription factors bind.

How do you determine the genomic locations of clones?

We run a local instance of BLAT with the default switches against the genome, filtered for repetitive sequences using RepeatMasker. We also determined the repeat content within each clone using RepeatMasker. When possible, for the alignments we use a contig derived from our 5′ and 3′ sequencing (this sequencing was done at Canada's Michael Smith Genome Sciences Centre).

In the absence of a contig, we use the longer of the 5′ and 3′ reads, provided it passes our quality threshold, that is, PHRED scores greater than 20 in at least 33% of the sequence read. If there are no quality sequences, we default to the longest Sanger sequence; unfortunately, the Sanger sequences had no corresponding quality scores.

You must therefore be careful when using these sequences. If there is no BCGSC sequence data, we provide the Sanger sequence reads (if present). While the Sanger reads are better than nothing, one should use this information very carefully, as we noticed some problems (cross-contaminations) that occurred during the transfer of the library from Sanger.
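
The read-selection rule described above (PHRED scores greater than 20 in at least 33% of the read, then the longest passing read) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline actually used: the function names, the representation of a read as a (sequence, scores) pair, and the behavior when no read passes are all hypothetical.

```python
def passes_quality(phred_scores, min_score=20, min_fraction=0.33):
    """True if at least `min_fraction` of bases have a PHRED score above `min_score`."""
    if not phred_scores:
        return False
    good = sum(1 for q in phred_scores if q > min_score)
    return good / len(phred_scores) >= min_fraction


def pick_read(read_5prime, read_3prime):
    """Return the longest read sequence that passes the threshold, or None.

    Each read is a (sequence, phred_scores) pair; None signals that a
    fallback (for example a Sanger read without scores) would be needed.
    """
    candidates = [r for r in (read_5prime, read_3prime) if passes_quality(r[1])]
    if not candidates:
        return None
    return max(candidates, key=lambda r: len(r[0]))[0]


# Hypothetical reads: the longer 3' read fails the quality threshold.
five = ("ACGT", [30, 30, 30, 30])
three = ("ACGTAC", [10, 10, 10, 10, 10, 10])
print(pick_read(five, three))  # ACGT
```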

DISCLAIMER: These materials are for academic, professional, and educational purposes only and are not a source of medical decision-making advice. Consult a qualified medical professional before making any medical decision.