Computational Molecular Biology 2014, Vol. 4, No. 10, 1-17
http://cmb.biopublisher.ca
H3K4me3 and largely unchanged H3K4me2. In contrast,
the dysfunction of the H3K4 methyltransferase ATX2 can
lead to decreased H3K4me2 and largely unchanged
H3K4me3. Among the three methylation states of H3K4,
trimethylation seems to be more stable, while mono- and
dimethylation are less stable. The JARID1 family
includes histone demethylases for H3K4 trimethylation,
and the conversion from H3K4 trimethylation to
dimethylation or monomethylation is possible.
Mono-, di-, and tri-methyl marks on lysine 4 of histone
H3 are key epigenetic modifications for regulating
gene expression, especially the H3K4me3 mark. In
addition, CpG islands (CGIs) enriched with H3K4
methylation are unmethylated to facilitate transcription.
However, different effects may be associated with
mono-, di-, or trimethylation of lysine residues.
Mono- and di-methyl marks of H3K4 are enriched in
intergenic regions such as enhancers, which have
indirect regulatory roles in gene expression.
Taken together, different combinations of H3K4
methylation marks differ significantly in function
depending on the number of methyl groups, but this
issue has received little study.
Significant technological progress has provided
unprecedented resolution for genome-wide histone
modification mapping. Several large-scale studies have
provided high-resolution histone modification profiles;
the most comprehensive ones are from Barski and Wang
et al. in CD4+ T cells. Based on this
dataset, we aim to study the specific genomic and
other attributes of four types of co-localized
methylation marks on lysine residue 4 of histone H3
(H3K4), that is, mono- and di-methylation (me1me2),
mono- and tri-methylation (me1me3), mono-, di- and
tri-methylation (me1me2me3), and di- and
tri-methylation (me2me3), with single-localized marks
for me1, me2 and me3 as controls. In principle, the
number of mapped tags detected at a particular position
is proportional to the modification level of the
corresponding nucleosome. The enriched genomic
fragments are considered ‘true’ peaks at genomic scale,
spanning either a single nucleosome or multiple
nucleosomes.
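The grouping of peaks into co-localized and single-localized types can be illustrated with a minimal Python sketch. This is not the authors' procedure: the overlap criterion (at least one shared base between half-open intervals), the per-peak labeling scheme, and the names `overlaps` and `classify_peaks` are all assumptions made for illustration, and the coordinates are invented.

```python
from collections import defaultdict

MARKS = ("me1", "me2", "me3")

def overlaps(a, b):
    """True if half-open intervals (start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def classify_peaks(peaks_by_mark):
    """Label every peak with the set of H3K4me marks that overlap it.

    peaks_by_mark maps a mark name to a list of (chrom, start, end) tuples.
    Returns a dict: label (e.g. "me1me2") -> list of (mark, chrom, start, end).
    Assumption: a peak is co-localized with another mark if any peak of
    that mark overlaps it by at least one base.
    """
    # Index the intervals of each mark by chromosome.
    by_chrom = {mark: defaultdict(list) for mark in MARKS}
    for mark in MARKS:
        for chrom, start, end in peaks_by_mark.get(mark, []):
            by_chrom[mark][chrom].append((start, end))

    result = defaultdict(list)
    for mark in MARKS:
        for chrom, start, end in peaks_by_mark.get(mark, []):
            present = [
                other for other in MARKS
                if other == mark
                or any(overlaps((start, end), iv)
                       for iv in by_chrom[other][chrom])
            ]
            # Labels follow the me1 < me2 < me3 order, e.g. "me2me3".
            result["".join(present)].append((mark, chrom, start, end))
    return dict(result)

# Invented toy coordinates, for illustration only.
example = {
    "me1": [("chr1", 100, 300), ("chr1", 1000, 1200)],
    "me2": [("chr1", 250, 400)],
    "me3": [("chr1", 350, 500), ("chr2", 10, 50)],
}
labels = classify_peaks(example)
print(sorted(labels))
```

A linear scan over all peaks of each mark is adequate for a small example; a genome-scale analysis would normally sort the intervals or use an interval tree instead.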
It is unknown which underlying genomic features
distinguish co-localized from single-localized
histone methylation modifications. In this study, we
characterize the genomic and functional genomic
features of four H3K4me co-localization types. Some
but not all co-localization combinations are more
conserved than single-localization controls at a
higher-than-expected frequency, both inside and outside
transcription start site (TSS) proximal regions
(TPRs). The proteins encoded by genes overlapping
co-localized peaks in TPRs have more protein partners
in the protein-protein interaction network than those
with single-localized peaks. Moreover, co-localization
types are distinct with respect to functional categories
revealed by Gene Ontology enrichment analysis,
suggesting that genes with similar functions may share
similar H3K4me co-localization patterns. CpG
depletion is more prominent in co-localization-related
genes than in controls. In addition, AT-richness is
a general feature of co-localized H3K4 methylation
regions. Me1me2me3, the triplet version of H3K4me,
is found to be prominently associated with tissue
specificity. Overall, this study represents an important
contribution to the understanding of histone codes
and the role of H3K4me co-localization
in functional genomic regulation.
1 Methods
1.1 Datasets
The histone modification profile of lysine 4 in histone
H3 was from Barski et al. It was
the most comprehensive genome-scale profiling of
histone methylation in human. The histone
modification dataset was from human G0/G1 CD4+ T
cells. In their studies, ChIP-sequencing (ChIP-seq)
was used to sequence tags from the two ends of genomic
fragments generated by micrococcal nuclease (MNase)
digestion. The technology is quantitative and
cost-effective for genome-wide histone modification
studies. The Phylogenetic Conserved Elements
(PhastCons) annotation file (hg18), RefSeq gene
annotation and reference genomic sequences were
downloaded from the UCSC Table Browser. The phastCons
(pC) score was linearly transformed from [0, 1000] to
[0, 1]. If a ChIP-seq peak had no overlap with
phastCons data, the conservation value for that peak
was set to zero.
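The score transformation and the zero default can be sketched in Python as follows. The text specifies only the linear rescaling from [0, 1000] to [0, 1] and the value of zero for peaks without overlap; how scores are combined when a peak overlaps several conserved elements is not stated, so taking their mean here is an assumption, as are the function names and the tuple layout.

```python
def scale_phastcons(raw_score):
    """Linearly map a phastCons element score from [0, 1000] to [0, 1]."""
    if not 0 <= raw_score <= 1000:
        raise ValueError("raw phastCons score must lie in [0, 1000]")
    return raw_score / 1000.0

def peak_conservation(peak, elements):
    """Return a conservation value in [0, 1] for one ChIP-seq peak.

    peak:     (chrom, start, end), half-open coordinates
    elements: iterable of (chrom, start, end, raw_score) conserved elements

    Assumption: when several elements overlap the peak, their scaled
    scores are averaged. A peak with no overlapping element gets 0.0,
    as described in the text.
    """
    chrom, start, end = peak
    scores = [
        scale_phastcons(raw)
        for e_chrom, e_start, e_end, raw in elements
        if e_chrom == chrom and e_start < end and start < e_end
    ]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)

# Invented toy elements, for illustration only.
elements = [
    ("chr1", 150, 160, 500),
    ("chr1", 190, 250, 250),
    ("chr2", 100, 200, 900),
]
print(peak_conservation(("chr1", 100, 200), elements))
print(peak_conservation(("chr3", 100, 200), elements))
```

Other summaries, such as the maximum score or a length-weighted mean, would fit the same interface if the averaging assumption does not match the intended analysis.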