Computational Molecular Biology 2014, Vol. 4, No. 10, 1-17
http://cmb.biopublisher.ca
The definition of a CGI is based on several cutoffs,
including the CpG observed/expected (o/e) ratio, G+C
content and length. The overlapping rates and peak overlapping
percentages for co-localization and single-localization
peaks were both calculated. Consistent with our
expectation, the CpG content and CpG o/e ratio for
single-localized peaks were both larger than those for
co-localized peaks. In Table 4, the CpG o/e ratio was
largest for single-localized peaks. As GC-rich sequence is
common in housekeeping genes and GC-poor sequence is
common in TPRs of tissue-specific genes, it was
straightforward to infer that single-localized and certain
co-localized peaks were prevalent in housekeeping and
tissue-specific TSS-proximal regions, respectively. We
therefore considered CGIs to be enriched in single-localized
peaks compared with co-localized peaks (Table 5).
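The CGI quantities discussed above (G+C content, CpG o/e ratio, and length) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the default cutoffs (length of at least 200 bp, G+C content of at least 0.5, o/e ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer values, assumed here because the text does not state which cutoffs were used.

```python
def cpg_stats(seq):
    """Return (gc_content, cpg_oe) for a DNA sequence.

    CpG o/e = (number of CpG * N) / (number of C * number of G),
    where N is the sequence length.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    # Avoid division by zero when the sequence lacks C or G
    cpg_oe = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, cpg_oe


def is_cgi(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Apply assumed CGI cutoffs (not necessarily those used in this study)."""
    gc_content, cpg_oe = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and cpg_oe > min_oe
```

Applied to each peak sequence, `cpg_stats` gives the per-peak values that could then be averaged within each localization type.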
We did not observe such a trend for single- and
co-localized peaks when considering their overlapping
genes. However, we found that the CGI coverage rate
increased with the number of methyl groups
in both TPRs and non-TPRs, with TPRs
overlapping even more. This observation indicated that
H3K4me3-occupied regions overlapped significantly
with CGIs. Furthermore, the transcriptional patterns of
histone modification combinations were also explored.
As PolII is a good proxy for transcription, we used the
PolII profile from Barski et al. to study the relationship
between transcription and H3K4me localization peaks.
In Figure 5, we found that genes
associated with different combinations of H3K4me
localization were expressed at different levels. Consistent
with expectation, me1 peaks were associated with the
lowest PolII level, me1me3 and me3 peaks with the highest
(the difference between these two was not significant),
and me2 peaks with a moderate level.
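The length-normalized PolII signal compared across peak types can be illustrated with a short sketch. The exact normalization used for Figure 5 is not given in the text, so the version below assumes tags per kilobase averaged over the peaks of each type; the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def mean_tags_per_kb(peaks):
    """Average length-normalized tag count for each peak type.

    peaks: iterable of (peak_type, start, end, tag_count) with half-open
    coordinates. Normalizing to tags per kb is an assumption.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for peak_type, start, end, tags in peaks:
        length = end - start
        if length <= 0:
            continue  # skip malformed intervals
        totals[peak_type] += tags * 1000 / length
        counts[peak_type] += 1
    return {t: totals[t] / counts[t] for t in totals}
```

For example, two me1 peaks of 1 kb and 2 kb carrying 2 and 8 tags have densities of 2 and 4 tags per kb, so the me1 average is 3.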
Figure 5 The average PolII tag number, normalized by peak length, for peaks from four types of co-localization and three types of single-localization
Besides genomic composition, the sequence patterns
of co-localized peaks in the TPR context were also
explored, as shown in Figure 3B. We observed
that the CA-repeat pattern was overrepresented in the
me1me2 and me1me3 types. The GT-repeat pattern, the
complementary-strand counterpart of the CA-repeat, was
found in the me2me3 type. In previous studies, the CA-repeat
(GT-repeat) was documented to have a regulatory role,
in that CA RNA elements could function either as
splicing enhancers or silencers. In a
recent study, intronic CA sequences were
demonstrated to aid alternative splicing. Thus, there was
a potential association between
alternative splicing and histone modifications,
especially for specific co-localization peaks. For the
H3K4me triplet type, the A-rich (T-rich) pattern was found
to be associated with tissue-specific genes. From Table 6,
more conserved TFBSs were found for the H3K4me triplet
than for other combinations. This observation was
consistent with the finding that the H3K4me triplet
seemed most conserved (Table 1). Therefore, we were
convinced that the H3K4me triplet type was
associated with tissue specificity.
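As a rough illustration of how CA-repeat and GT-repeat tracts like those discussed above might be located in a peak sequence, the sketch below uses a regular expression. It is not the motif discovery procedure used for Figure 3B; the function name and the minimum of four tandem copies are assumptions chosen only for the example.

```python
import re


def find_dinucleotide_repeats(seq, unit="CA", min_repeats=4):
    """Return (start, end) spans of tandem repeats of `unit`.

    min_repeats is an assumed threshold, not a value from the study.
    Coordinates are 0-based and half-open.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]
```

Calling the function once with `unit="CA"` and once with `unit="GT"` covers both strands of the same repeat.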
Table 6 Conserved TFBS coverage rates for different types of localization
Coverage rate   TSS-proximal peaks [-1k,2k]   Non-TSS-proximal peaks   All peaks
me1             0.23                          0.13                     0.14
me1me2          0.27                          0.18                     0.22
me1me2me3       0.51                          0.29                     0.37
me1me3          0.40                          0.24                     0.28
me2             0.23                          0.18                     0.21
me2me3          0.42                          0.23                     0.31
me3             0.43                          0.27                     0.36
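The coverage rates in Table 6 are not defined in this section. One plausible reading, assumed here purely for illustration, is the fraction of peaks that overlap at least one conserved TFBS. The sketch below computes that fraction; the function names and the (chrom, start, end) input layout are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def coverage_rate(peaks, sites):
    """Fraction of peaks overlapping at least one site.

    peaks, sites: lists of (chrom, start, end) with half-open coordinates.
    This definition of coverage rate is an assumption.
    """
    if not peaks:
        return 0.0
    # Group sites by chromosome so each peak is compared only to local sites
    by_chrom = {}
    for chrom, start, end in sites:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = sum(
        any(overlaps(p_start, p_end, s_start, s_end)
            for s_start, s_end in by_chrom.get(chrom, []))
        for chrom, p_start, p_end in peaks
    )
    return hits / len(peaks)
```

The pairwise comparison is quadratic per chromosome, which is fine for a small example; for genome-scale peak sets a sorted sweep or an interval tree would be the usual choice.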