Cancer Genetics and Epigenetics 2018, Vol.6, No.5, 33-39
Research Report Open Access
Complexity of WGBS Data Caused by Cellular Heterogeneity and Multiple
Cytosine Modifications
Zixu Wang, Weiming Zhao
College of Basic Medical, Harbin Medical University, Harbin, 150081, China
Corresponding author email:
Cancer Genetics and Epigenetics, 2018, Vol.6, No.5 doi:
Received: 09 Oct., 2018
Accepted: 15 Nov., 2018
Published: 30 Nov., 2018
Copyright © 2018 Wang and Zhao. This is an open access article published under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Wang Z.X., and Zhao W.M., 2018, Complexity of WGBS data caused by cellular heterogeneity and multiple cytosine modifications, Cancer Genetics and
Epigenetics, 6(5): 33-39 (doi: )
Abstract
DNA methylation is an important epigenetic modification that plays a key role in many biological processes, such
as transcriptional regulation, gene imprinting, X chromosome inactivation, transposon silencing, and embryonic development. With
the development of next-generation sequencing technology, large volumes of high-throughput methylation data are constantly
emerging, and processing and analyzing these data is an urgent problem. This review discusses the difficulties and
challenges encountered in the analysis of WGBS methylation data at four levels: (i) Cytosines to reads: technology based on
bisulfite conversion; (ii) Reads to methylation level: BS-seq sequence alignment; (iii) Methylation level to region: characteristics of
the methylome; (iv) Multiple methylomes: differential methylation. In particular, we discuss the effects of cellular
heterogeneity and other cytosine modifications on WGBS methylation data at the site, region, and multiple-methylome levels.
Keywords
DNA methylation; Epigenetic modification; WGBS; Differential methylation; Cell heterogeneity
Background
In eukaryotes, DNA methylation refers to the addition of a methyl group to the fifth carbon atom of cytosine, yielding
5-methylcytosine. In mammals, DNA methylation usually occurs at the dinucleotide sequence CpG (a cytosine
linked to the following guanine by a phosphodiester bond) and is therefore also referred to as CpG methylation. As an important
epigenetic modification, DNA methylation has been shown to be involved in a variety of biological processes,
such as silencing of transposable elements (Robinson and Smyth, 2008), regulation of gene expression (Feng et al.,
2014), gene imprinting (Zhang et al., 2011), X-chromosome inactivation (Varshney et al., 2016), and embryonic
development and cell differentiation (Chen et al., 2015). Abnormal DNA methylation changes are found in many
diseases, including cancer. For example, hypomethylation of proto-oncogenes and hypermethylation of
tumor suppressor genes promote tumorigenesis (Pennisi, 2013).
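As a minimal illustration of the CpG context described above, the following Python sketch lists the positions of CpG dinucleotides in a sequence. The function name `find_cpg_sites` and the example sequence are hypothetical and chosen only for illustration; real analyses work on reference genomes and must also consider the reverse strand.

```python
def find_cpg_sites(seq):
    """Return 0-based positions of the C in every CpG dinucleotide.

    Illustrative helper (not from the article): scans the forward
    strand only and treats the input as a plain DNA string.
    """
    seq = seq.upper()
    # A CpG site is a cytosine immediately followed by a guanine (5'->3').
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


if __name__ == "__main__":
    example = "ACGTCGAC"  # made-up sequence for demonstration
    print(find_cpg_sites(example))
```

In this toy sequence the final cytosine is not followed by a guanine, so it is not reported as a CpG site.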
Conventional molecular techniques do not distinguish between methylated and unmethylated cytosine, so
DNA needs to be pre-treated before DNA methylation can be detected. DNA methylation detection techniques are
classified into three categories according to the pretreatment method: restriction enzyme digestion, affinity
enrichment, and bisulfite conversion. Restriction enzyme digestion and affinity enrichment can generate only
region-level methylation data, whereas techniques based on bisulfite conversion can accurately
detect the methylation status at single-base resolution.
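The principle behind bisulfite-based detection can be sketched in a few lines of Python. The sketch relies on a standard fact not spelled out in this passage: bisulfite treatment converts unmethylated cytosine to uracil, which is read as thymine after amplification, while methylated cytosine is protected and still reads as cytosine. The function names, the toy sequence, and the assumption that the read aligns to the reference at offset 0 on the same strand are all illustrative simplifications, not the method of any particular tool.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion plus amplification on one strand.

    Unmethylated C is read as T; methylated C stays C.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")  # unmethylated C -> U -> read as T
        else:
            converted.append(base)  # methylated C (and A/G/T) unchanged
    return "".join(converted)


def call_methylation(reference, read):
    """Call methylation at each reference cytosine covered by the read.

    Assumes (for illustration) the read aligns to the reference at
    offset 0 with no indels. Returns {position: True/False}.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True   # C retained: methylated
        elif read_base == "T":
            calls[i] = False  # C converted to T: unmethylated
        # any other base is a mismatch and yields no call
    return calls


if __name__ == "__main__":
    reference = "ACGTCGAC"  # made-up sequence
    read = bisulfite_convert(reference, methylated_positions=[1])
    print(read)
    print(call_methylation(reference, read))
```

Because each cytosine is called individually from the read sequence, this approach yields single-base information, in contrast to the region-level signal of digestion- or enrichment-based methods.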
Now that next-generation sequencing is widely used, combining bisulfite conversion with
high-throughput sequencing makes it possible to detect the methylation status of almost all cytosines across the whole
genome. In recent years, as sequencing costs have fallen, Whole Genome Bisulfite Sequencing (WGBS)
(Wong et al., 2016) has been applied to profile methylation in embryonic development, adult tissues, disease, and other
physiological and pathological conditions. Examples include the Roadmap program, which profiles stem cell
lines and in vitro tissues; the BLUEPRINT program, which profiles blood cell types and related diseases; and the
TCGA program, which profiles various cancer tissues. WGBS data from a large number of samples can be
obtained from the data centers of these programs and from the GEO database. Compared with gene expression
data, genome-wide methylation data are more computationally intensive to analyze and contain more complex information.