Computational Molecular Biology, 2018, Vol.8, No.1, 1-13
non-functional due to harboring a premature termination codon in protein coding regions. The nonfunctional
isoforms are degraded by a process known as nonsense-mediated decay (NMD) (Lewis et al., 2003).
Arabidopsis thaliana, a model plant species, has been intensively investigated, and ~60-70% of its
multi-exon genes have been reported to undergo AS (Filichkin et al., 2010; Zhang et al., 2010; Marquez et al., 2012;
Syed et al., 2012; Carvalho et al., 2013; Yu et al., 2016; Zhang et al., 2017). AS has also been examined in other
plant species, including Oryza sativa (rice) (Wang and Brendel, 2006; Min et al., 2015; Wei et al., 2017; Kater et al., 2018),
Nelumbo nucifera (sacred lotus) (VanBuren et al., 2013), Vitis vinifera (grape) (Vitulo et al., 2014; Sablok et al., 2017),
Brachypodium distachyon (Sablok et al., 2011; Walters et al., 2013), Zea mays (maize) (Thatcher et al., 2014;
Min et al., 2015; Thatcher et al., 2016; Mei et al., 2017; Min, 2017), and Sorghum bicolor (sorghum)
(Panahi et al., 2014; Min et al., 2015; Abdel-Ghany et al., 2016). Approximately 60-75% of AS events occur
within the protein coding regions of mRNAs, resulting in changes in binding properties, intracellular localization,
protein stability, and enzymatic and signaling activities (Stamm et al., 2005). IR has been shown to be the most
frequent AS event in plants, with AS rates in intron-containing genes ranging from ~30% to >60% depending
on available transcriptome data (Sablok et al., 2011; Reddy et al., 2013; Sablok et al., 2017). Genome-wide
conserved alternatively spliced genes among different plant species have been identified in cereal plants and fruit
plants (Min et al., 2015; Sablok et al., 2017). Further, genome-wide conserved AS events across a wide range of
plant species, including flowering plant species and monocot species, have also been analyzed (Chamala
et al., 2015; Mei et al., 2017). These works lay the foundation for identifying and studying conserved AS genes as
well as conserved AS events across evolutionarily related plant species (Min et al., 2015; Mei et al., 2017).
To date, only three reports have addressed genome-wide AS analysis in cotton. Using RNA-sequencing
(RNA-seq) data from G. raimondii, 16,437 AS events in 10,197 genes were identified (Li et al., 2014b). A similar
RNA-seq analysis identified 14,172 AS events in 6,797 genes in G. davidsonii grown under salt stress conditions
(Zhu et al., 2018). Most recently, Wang et al. (2018) used Pacific Biosciences single-molecule
long-read isoform sequencing (Iso-Seq) to identify 176,849 full-length transcript isoforms and detected a total of
133,229 AS events from 27,229 gene loci, with 15,102 fiber-specific AS events, in G. barbadense, an
allotetraploid cotton species. In all three reports, the prevalent type of AS event was intron retention. In this work,
we report a survey of AS events using currently available expressed sequence tags (ESTs) and mRNA sequences,
with the aim of generating a preliminary catalog of alternatively spliced genes in the cultivated upland cotton species,
G. hirsutum.
1 Materials and Methods
1.1 Sequence datasets and sequence assembly
Two draft genome sequences of allotetraploid cotton (G. hirsutum L. acc. TM-1) have been generated
independently (Li et al., 2015; Zhang et al., 2015). In this work, we used the genome sequences (assembly
ASM98774v1) generated by Li et al. (2015), as they were available for download from the National Center for
Biotechnology Information (NCBI) genome database. We
also downloaded a total of 432,161 nucleotide sequences of G. hirsutum, including 94,350 mRNA sequences and
337,811 EST sequences. For simplicity, the term “cotton” refers only to G. hirsutum in this context;
otherwise, full species names are specified.
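As a rough illustration of how the dataset sizes above could be tallied, the following Python sketch counts FASTA records by their header lines and checks that the reported mRNA and EST counts sum to the reported total. The function name and the small in-memory sequences are hypothetical stand-ins, not the files or scripts used in this work.

```python
import io
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Tiny in-memory stand-ins for the downloaded mRNA and EST FASTA files
# (hypothetical content; the real downloads are not reproduced here).
mrna_fasta = io.StringIO(">mRNA1\nATGGCC\n>mRNA2\nATGTTT\nGGA\n")
est_fasta = io.StringIO(">EST1\nACGT\n>EST2\nTTGA\n>EST3\nCCAT\n")

n_mrna = count_fasta_records(mrna_fasta)
n_est = count_fasta_records(est_fasta)
n_total = n_mrna + n_est

# Counts reported in the text: the two categories should sum to the total.
reported_mrna, reported_est, reported_total = 94_350, 337_811, 432_161
reported_consistent = (reported_mrna + reported_est == reported_total)

print(n_mrna, n_est, n_total, reported_consistent)
```

With real data, the `io.StringIO` objects would be replaced by open file handles for the downloaded FASTA files.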
1.2 Transcript assembly, mapping to genome, and identification of AS events
The EST and mRNA sequences were processed to remove contaminants, vector sequences, and repetitive sequences using a
procedure we implemented previously (Min et al., 2015). The procedure is briefly outlined below. First, the EMBOSS
trimest tool was used to trim the polyA or polyT end (Rice et al., 2000); then, the trimmed EST and mRNA
sequences were searched against the UniVec and E. coli databases using BLASTN to remove vector and
E. coli contaminants; finally, repetitive sequences were removed by BLASTN searches against a plant repeat database
built with TIGR gramineae repeat data and sorghum, maize, and rice repeat data (available from
ftp://ftp.plantbiology.msu.edu/pub/data/TIGR_Plant_Repeats/). A total of 430,541 cleaned EST and mRNA sequences
were assembled using CAP3 with the following parameters: -p 95 -o 50 -y 20 (Huang and Madan, 1999).
A total of 279,050 putative unique