Computational Molecular Biology 2017, Vol.7, No.1, 1-11
enzymatic, and signaling activities (Stamm et al., 2005). In plants, IR has been shown to be the most dominant
form, with reports suggesting that the proportion of intron-containing genes undergoing AS in plants ranges from
~30% to >60% depending on the depth of available transcriptome data (Reddy et al., 2013; Sablok et al., 2011). In
contrast, recent reports suggest down-regulation of IR events and up-regulation of alternative
donor/acceptor site (AltDA) and ES events under heat stress in the model moss Physcomitrella patens (Chang et al., 2014). With
the advent of Next Generation Sequencing (NGS) based approaches, fine-scale studies have
revealed that AS increases the complexity of alternative mRNA processing, which is involved in
microRNA-mediated gene regulation in Arabidopsis (Yang et al., 2012). Complex networks of gene expression
regulation and variation in AS have played a major role in the adaptation of plants to their environments (Syed
et al., 2012) and in coping with environmental stresses.
Rice (ssp. japonica and indica), maize, and sorghum are important cereal crops as major sources of food in many
countries. Several previous studies have identified quantitative trait loci,
genes, and proteins linked to functional grain content in these species (Mao et al., 2010). However, a major
portion of gene functional diversity is controlled by spliceosome-regulated AS. AS has been shown to be a
critical regulator in the grass clade, with several genes involved in flowering and abiotic stress responses
undergoing alternative splicing (Reddy et al., 2013; Walters et al., 2013; Staiger et al., 2013). Identifying genes with
pre-mRNAs undergoing alternative splicing in these cereal plants is critical to understanding the functions and
regulation of these genes in plant development and abiotic or biotic stress resistance. Previously, using the
homology-based mapping approach and expressed sequence tags (ESTs) representing the functional transcripts,
we identified a total of 941 AS genes in Brachypodium distachyon, a model temperate grass (Sablok et al., 2011;
Walters et al., 2013). Previous reports on the identification and prevalence of alternative splicing events in rice
(Campbell et al., 2006; Wang and Brendel, 2006), sorghum (Panahi et al., 2014), and maize (Thatcher et al., 2014)
have shown changes in functional diversity through EST/RNA-seq approaches. Recently we also reported our
efforts in identifying AS genes in rice (both japonica and indica), maize, and sorghum (Min et al., 2015). We
compared the AS event landscape and the functional diversity of AS genes in cereal plants, and also compared
these AS genes with AS genes identified from B. distachyon to reveal conserved patterns of AS
across the grass species. In this work, we incorporated more transcript data generated using RNA-seq
technologies and significantly expanded the list of genes with mRNAs undergoing AS in maize.
1 Materials and Methods
1.1 Sequence datasets and sequence assembly
To comprehensively identify all possible AS events in maize, multiple sources of maize expressed
transcripts were integrated, including expressed sequence tags (ESTs), mRNA sequences, and transcripts
assembled from RNA-seq data. The data consisted of a total of 778 172 pre-assembled transcripts
obtained from four sources: (1) 488 243 putative unique transcripts (PUTs) assembled from over 2 million
expressed sequence tags and mRNA sequences collected from the NCBI dbEST and nucleotide databases
(as of Oct 2013) (Min et al., 2015); (2) 181 779 transcripts assembled from over 200 RNA-seq libraries (Thatcher
et al., 2014); (3) 48 432 novel transcript isoforms identified from 147 RNA-seq libraries generated at different
developmental stages with and without drought stress (Thatcher et al., 2016), with sequences extracted
from the version 2 maize genome based on the mapping information provided in Sup. Table 1 (Thatcher et al.,
2015); and (4) 59 263 recently deposited mRNA sequences and 465 ESTs (from Oct 2013 to Dec 2015) with their
polyA/T ends trimmed using the trimest tool in the EMBOSS package (Rice et al., 2000). The combined data,
consisting of a total of 767 717 transcripts, were re-assembled using CAP3 with the following parameters:
-p 95 -o 40 -g 3 -y 50 -t 1000 (Huang and Madan, 1999). A total of 614 201 putative unique transcripts (PUTs, named as Mz#)
were obtained, including 73 089 contigs and 541 112 singlets, for downstream mapping to the maize genome
sequences.
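
The re-assembly step can be sketched in Python as follows. This is a minimal illustration, not the authors' actual script: the function names, the executable name "cap3", and the input file name are assumptions. Only the parameter values come from the text. The sketch relies on CAP3 writing contigs and singlets to files named after the input with the suffixes ".cap.contigs" and ".cap.singlets", so that the PUT total is the sum of the two record counts.

```python
# CAP3 options as stated in the text: -p 95 -o 40 -g 3 -y 50 -t 1000.
CAP3_OPTIONS = {"-p": 95, "-o": 40, "-g": 3, "-y": 50, "-t": 1000}


def cap3_command(fasta_path, executable="cap3"):
    """Build the argument list for a CAP3 run.

    The executable name and the input path are assumptions; the list can be
    passed to subprocess.run() on a machine where CAP3 is installed.
    """
    args = [executable, str(fasta_path)]
    for flag, value in CAP3_OPTIONS.items():
        args += [flag, str(value)]
    return args


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_puts(fasta_path):
    """Return (contigs, singlets, total PUTs) from the CAP3 output files."""
    contigs = count_fasta_records(f"{fasta_path}.cap.contigs")
    singlets = count_fasta_records(f"{fasta_path}.cap.singlets")
    return contigs, singlets, contigs + singlets
```

With the counts reported above, the same arithmetic gives 73 089 contigs + 541 112 singlets = 614 201 PUTs.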
1.2 Mapping PUTs to the genome and identification of AS events
The maize genome assembly and gene models (B73 RefGen_v3.22) were downloaded from MaizeGDB