Computational Molecular Biology 2017, Vol.7, No.1, 1-11
. The assembled PUTs were mapped to their corresponding chromosomes
using ASFinder (
) (Min, 2013). ASFinder uses the SIM4 program
(Florea et al., 1998) to align PUTs to the genome and then identifies PUTs that map to
the same genomic location but have different exon-intron boundaries as AS isoforms. To avoid spurious
mapping, we required a minimum of 95% identity between each aligned PUT and its genomic segment (exon),
a minimum aligned length of 80 bp, and >75% of the PUT sequence aligned to the genome (Walters et al., 2013).
To avoid chimeric assemblies, mapped PUTs with an intron size >100 kb were excluded from AS identification.
The output file (AS.gtf) of ASFinder was then submitted to the AStalavista server
(
/) for AS event classification (Foissac and Sammeth, 2007). The percentage of
alternatively spliced genes was estimated as the number of genome-predicted gene models with alternatively spliced PUT
isoforms divided by the total number of gene models with at least one mapped PUT. There are a total of 63 241 cDNA sequences
generated from 39 475 genes in the recent release of maize gene models (version 3.22,
. Among them, 12 627 genes have two
or more cDNA sequences, i.e., isoforms generated by pre-mRNA AS.
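
The mapping filters described above (at least 95% identity per aligned exon, at least 80 bp aligned, more than 75% of the PUT aligned, and no intron longer than 100 kb) can be illustrated with a minimal Python sketch. This is not ASFinder's code: the `Exon` and `Alignment` records, their field names, and the way aligned length is summed over exons are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the text above.
MIN_IDENTITY = 95.0       # percent identity required for every aligned exon
MIN_ALIGNED_BP = 80       # minimum total aligned length in bp
MIN_PUT_FRACTION = 0.75   # more than 75% of the PUT must be aligned
MAX_INTRON_BP = 100_000   # alignments with a longer intron are discarded


@dataclass
class Exon:
    """Hypothetical record for one aligned segment (1-based, inclusive)."""
    genome_start: int
    genome_end: int
    put_start: int
    put_end: int
    identity: float  # percent identity of this segment


@dataclass
class Alignment:
    """Hypothetical record for one PUT-to-genome alignment."""
    put_id: str
    put_length: int
    exons: List[Exon]  # assumed sorted by genome_start


def passes_filters(aln: Alignment) -> bool:
    """Return True if the alignment satisfies all four thresholds."""
    if not aln.exons:
        return False
    if any(e.identity < MIN_IDENTITY for e in aln.exons):
        return False
    aligned_bp = sum(e.put_end - e.put_start + 1 for e in aln.exons)
    if aligned_bp < MIN_ALIGNED_BP:
        return False
    if aligned_bp / aln.put_length <= MIN_PUT_FRACTION:
        return False
    # Intron length is the gap between consecutive exons on the genome.
    introns = [b.genome_start - a.genome_end - 1
               for a, b in zip(aln.exons, aln.exons[1:])]
    return all(length <= MAX_INTRON_BP for length in introns)


# Example: a 200-bp PUT aligned as two exons separated by a 900-bp intron.
good = Alignment("PUT_example", 200, [
    Exon(1000, 1099, 1, 100, 98.0),
    Exon(2000, 2079, 101, 180, 96.0),
])
# Same PUT, but the second exon lies about 199 kb downstream (likely chimeric).
chimeric = Alignment("PUT_chimeric", 200, [
    Exon(1000, 1099, 1, 100, 98.0),
    Exon(200000, 200079, 101, 180, 96.0),
])
```

With these records, `passes_filters(good)` returns `True` and `passes_filters(chimeric)` returns `False`.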
1.3 Functional annotation of PUTs
The coding region of each PUT was predicted using ORFPredictor (Min et al., 2005a), and full-length
transcript coverage was assessed using TargetIdentifier (Min et al., 2005b), as previously described. Functional
classification was assigned to the PUTs by a BLASTX search with an E-value threshold of 1e-5 against
UniProtKB/Swiss-Prot. Additionally, predicted protein sequences from ORFPredictor were further annotated
using rpsBLAST against the Pfam database (
/). To assess the coverage of the assembled PUTs,
we further compared PUTs against the predicted gene primary transcripts using BLASTN with a cutoff E-value of
1e-10, ≥95% identity and a minimum aligned length of 80 bp; the results are summarized in Table 1. Gene
Ontology (GO) information was extracted from the UniProt ID mapping table based on the BLASTP of gene
model protein sequences against the UniProtKB/Swiss-Prot database
(ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/). The GO categories were further analyzed using GO
SlimViewer with plant-specific GO terms (McCarthy et al., 2006).
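
The BLASTN cutoffs used for the PUT-to-transcript comparison (E-value of 1e-10, ≥95% identity, aligned length of at least 80 bp) can be applied to BLAST tabular output as sketched below. The sketch assumes the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the text does not state which output format was actually used, and the function name and sequence identifiers are hypothetical.

```python
import csv
import io


def filter_blastn_hits(handle, max_evalue=1e-10, min_identity=95.0, min_length=80):
    """Yield (query, subject) pairs for hits that pass all three cutoffs.

    Assumes default BLAST tabular columns (-outfmt 6).
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        pident = float(row[2])    # percent identity
        length = int(row[3])      # alignment length
        evalue = float(row[10])   # expectation value
        if evalue <= max_evalue and pident >= min_identity and length >= min_length:
            yield qseqid, sseqid


# Made-up example rows; only the first passes every cutoff.
example = io.StringIO(
    "PUT1\tGene1_T01\t99.10\t450\t4\t0\t1\t450\t10\t459\t0.0\t810\n"
    "PUT2\tGene2_T01\t93.00\t300\t21\t0\t1\t300\t1\t300\t1e-120\t400\n"
    "PUT3\tGene3_T01\t100.00\t60\t0\t0\t1\t60\t1\t60\t2e-25\t111\n"
)
hits = list(filter_blastn_hits(example))
```

Here `hits` contains only `("PUT1", "Gene1_T01")`: the second row fails the identity cutoff and the third fails the length cutoff.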
Table 1 Basic features of the assembled putative unique transcripts (PUTs) of maize
Feature                                                  Value
Total PUTs                                               614 201
Average length of PUTs (bp)                              815
BLASTX matches against UniProt/Swiss-Prot database       247 798
Total ORFs                                               601 196
Full-length PUTs                                         128 505
Pfam matches                                             166 174
PUTs mapped to genome (%)                                320 447 (52.2)
PUTs matched to cDNAs of gene models (%)                 298 248 (48.6)
PUTs mapped to genome with gene models (%)               206 593 (33.6)
Unique genes supported with matching PUTs (%)            37 751 (95.6)
AS rate of gene models (%)                               20 860 (55.3)
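
As a quick arithmetic check, the percentages in Table 1 can be recomputed from the reported counts. The denominators below are inferred from the arithmetic and from the definitions in the text rather than stated in the table itself: total PUTs for the three PUT rows, the 39 475 gene models for the supported-gene row, and the 37 751 PUT-supported genes for the AS rate. The helper name `pct` is hypothetical.

```python
TOTAL_PUTS = 614_201        # Table 1
TOTAL_GENE_MODELS = 39_475  # maize gene models, version 3.22
SUPPORTED_GENES = 37_751    # genes with at least one matching PUT
AS_GENES = 20_860           # gene models with AS PUT isoforms


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


mapped_to_genome = pct(320_447, TOTAL_PUTS)              # 52.2
matched_to_cdna = pct(298_248, TOTAL_PUTS)               # 48.6
mapped_with_gene_models = pct(206_593, TOTAL_PUTS)       # 33.6
genes_supported = pct(SUPPORTED_GENES, TOTAL_GENE_MODELS)  # 95.6
as_rate = pct(AS_GENES, SUPPORTED_GENES)                 # 55.3
```

Under these assumed denominators, every recomputed value agrees with the table.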
1.4 Conserved alternatively spliced genes in cereal plants and visualization of AS
In our previous report, we identified conserved AS genes among rice, maize, sorghum and Brachypodium
(Min et al., 2015). In the current work, only conserved AS genes between maize and rice (ssp. japonica) were identified.
A reciprocal BLASTP search (cutoff E-value 1E-10) was performed between the longest ORFs of the rice AS isoforms and the
maize predicted gene model protein sequences to identify conserved AS gene pairs between the two species. AS
events identified in this study along with the integrated genomic tracks of predicted gene models, as well as data
reported previously, are available from the Plant Alternative Splicing Database (
/)
(Walters et al., 2013; VanBuren et al., 2013; Min et al., 2015). BLAST search is also available for searching the