Computational Molecular Biology 2017, Vol.7, No.1, 1-11
Table 2 Alternative splicing events in maize

Event type                     Previous (%)*     Current (%)
exon skipping                  1 568 (5.7)       14 531 (7.5)
alternative donor sites        2 080 (7.6)       17 871 (9.3)
alternative acceptor sites     3 314 (11.4)      24 748 (12.8)
intron retention               11 048 (40.4)     46 416 (24.1)
others (complex events)        5 576 (20.4)      89 058 (46.2)
Total                          23 386            192 464

Note: *Previous data from Min et al. (2015).
The percentage of AS genes was estimated from the proportion of predicted gene models having AS PUT
isoforms. A total of 37 751 gene models had at least one PUT mapped, and 20 860 of them had AS; thus, the
rate of AS genes was estimated to be 55.3% in maize. Compared with our previous analysis (Min et al., 2015),
the number of genes transcribed into alternatively spliced transcripts (AS genes) identified in this study
increased substantially, from 10 687 to 20 860, and the rate of AS genes increased from 33.8% to 55.3%. The
number of AS genes identified in this study is also higher than the 15 771 reported by Thatcher et al. (2014)
using RNA-seq data. A recent report using RNA-seq technology revealed that ~61% of multi-exonic genes in
A. thaliana are alternatively spliced under normal growth conditions (Marquez et al., 2012). The maize AS rate
(55.3%) mentioned above was based on all maize gene models with transcript mapping evidence. If we count only
gene models having PUTs that map to at least two exons, there were a total of 31 049 such gene models, giving
an AS rate of 67.2% in maize. We would like to point out that the numbers of AS events and isoforms in our
analysis were higher than those obtained by Mei et al. (2017), as different datasets and assembly approaches
were used. However, Mei et al. (2017) also reported an AS rate near 60% of expressed multi-exon genes in B73.
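The two maize AS rates quoted above follow directly from the counts given in the text. The sketch below reproduces the arithmetic; the function name `as_rate` is hypothetical, and it assumes that the same 20 860 AS genes form the numerator for both denominators, which is consistent with the reported figures.

```python
def as_rate(as_genes: int, gene_models: int) -> float:
    """Return the percentage of gene models that have AS, rounded to one decimal."""
    if gene_models <= 0:
        raise ValueError("gene_models must be positive")
    return round(100.0 * as_genes / gene_models, 1)

# Counts taken from the text above.
AS_GENES = 20860               # gene models with AS PUT isoforms
MODELS_WITH_PUT = 37751        # gene models with at least one PUT mapped
MODELS_MULTI_EXON_PUT = 31049  # gene models with PUTs mapping to at least two exons

# Assumption: the same AS gene count is the numerator in both cases.
rate_all = as_rate(AS_GENES, MODELS_WITH_PUT)
rate_multi = as_rate(AS_GENES, MODELS_MULTI_EXON_PUT)
print(rate_all, rate_multi)  # 55.3 67.2
```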
Recently, Yan et al. (2014) developed a database of intron-less genes of Poaceae (PIGD), which collected
14 623 intron-less maize genes. We compared the list of maize intron-less genes with our mapping data and
found that 7 152 of them actually had one or more introns directly supported by PUT mapping (Supplementary
Table 1 – file: false_intronless.ids). Thus, the intron-less gene lists collected by Yan et al. (2014) need to
be examined thoroughly against gene expression data before being used in other types of analysis. The
transcript-to-genome mapping information generated in this work can be further used to improve the predicted
gene structures in maize.
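The comparison described above amounts to checking each listed intron-less gene against the introns supported by PUT mapping. A minimal sketch follows, assuming the mapping results are available as a dictionary from gene ID to the number of PUT-supported introns. The gene IDs, the dictionary layout, and the function name are hypothetical and do not represent the pipeline used in this work.

```python
from typing import Dict, Iterable, List

def find_false_intronless(intronless_ids: Iterable[str],
                          supported_introns: Dict[str, int]) -> List[str]:
    """Return listed intron-less genes that have at least one PUT-supported intron."""
    return sorted(
        gene_id for gene_id in set(intronless_ids)
        # Genes absent from the mapping data have no evidence either way,
        # so they are not flagged.
        if supported_introns.get(gene_id, 0) > 0
    )

# Toy data with made-up gene IDs (illustration only).
intronless = ["geneA", "geneB", "geneC", "geneD"]
put_introns = {"geneA": 0, "geneB": 2, "geneD": 1, "geneE": 5}

false_intronless = find_false_intronless(intronless, put_introns)
print(false_intronless)  # ['geneB', 'geneD']

# Share of listed genes contradicted by PUT mapping, using the counts in the text.
share = round(100.0 * 7152 / 14623, 1)
print(share)  # 48.9
```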
2.3 Functional classification of AS genes
For simplicity of description below, gene models whose pre-mRNAs generate AS transcript isoforms are
referred to as AS genes, and gene models whose pre-mRNAs had no AS transcripts identified in the current
analysis are referred to as non-AS genes. To obtain a general picture of AS genes and non-AS genes, Gene
Ontology (GO) analysis was performed using the predicted protein sequences of the gene models that had at
least one PUT mapped, i.e., they were transcribed and may represent real genes. Thus, a total of 37 751
protein sequences were subjected to GO analysis.
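The AS / non-AS split and the selection of gene models for GO analysis can be sketched as follows. This is an illustrative sketch only: the record layout (`put_count`, `as_isoform_count`) and the gene IDs are assumptions made for the example, not the data structures used in this study.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class GeneModel:
    gene_id: str
    put_count: int          # number of PUTs mapped to this gene model
    as_isoform_count: int   # number of AS PUT isoforms identified

def split_expressed(models: List[GeneModel]) -> Tuple[List[str], List[str]]:
    """Keep gene models with at least one mapped PUT and split them into AS / non-AS."""
    as_genes: List[str] = []
    non_as_genes: List[str] = []
    for m in models:
        if m.put_count < 1:
            continue  # no transcript evidence: excluded from GO analysis
        if m.as_isoform_count > 0:
            as_genes.append(m.gene_id)
        else:
            non_as_genes.append(m.gene_id)
    return as_genes, non_as_genes

# Toy records with made-up gene IDs (illustration only).
models = [
    GeneModel("g1", put_count=3, as_isoform_count=2),
    GeneModel("g2", put_count=1, as_isoform_count=0),
    GeneModel("g3", put_count=0, as_isoform_count=0),
]
as_genes, non_as_genes = split_expressed(models)
print(as_genes, non_as_genes)  # ['g1'] ['g2']
```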
Of the 37 751 protein sequences, 24 061 had GO mapping, and of the 20 860 protein sequences of AS genes,
15 344 had GO mapping. These mapped GO IDs were further clustered using the GOSlimViewer server. Based on our
experience in analyzing cellular components and protein subcellular location (Lum et al., 2014), GO cellular
component analysis based on the BLASTP method is not accurate; thus, it was not included. We compared the GO
classification of biological process and molecular function in the AS gene set with that of the whole set of
expressed genes supported by transcript evidence (Table 3; Table 4). AS gene products were involved in all the
biological processes, with various molecular functions. On average, 78.6% and 78.9% of expressed genes with
protein products involved in known GO biological processes and molecular functions, respectively, had AS
(Table 3; Table 4). As the data were
collected from pooled ESTs, mRNAs, as well as assembled transcripts from RNA-seq data, it is difficult to make