Computational Molecular Biology, 2018, Vol.8, No.1, 1-13
genes encoded proteins had Pfam matches to a total of 2,454 protein families. The top protein families in the
whole cotton proteome and in proteins encoded by genes undergoing AS are listed in Table 3. Many of these protein
families were found to have AS isoforms in other plant species, including cereal plants and fruit
plants (Min et al., 2015; Wai et al., 2016; Sablok et al., 2017). These protein families include Pkinase (protein
kinase domain), RRM_1 (RNA recognition motif), Pkinase_Tyr (protein tyrosine kinase), P450 (cytochrome
P450), Ras family, UQ_con (ubiquitin-conjugating enzyme), etc., suggesting that AS is evolutionarily
conserved in plant species (Min et al., 2015; Sablok et al., 2017). We noticed that among 100 genomic loci
encoding cellulose synthase (Pfam03552), 39 underwent alternative splicing. Considering the important role
played by this enzyme in fiber formation, the functional significance of AS in these genes warrants further
examination.
Genes undergoing AS during post-transcriptional processing produce functional or non-functional isoforms.
We evaluated the impact of AS on the functionality of the gene products by comparing their Pfam annotations.
Among a total of 50,680 isoform pairs generating AS events, 14,214 (25.3%) pairs had no Pfam hit, 30,202
(53.9%) pairs had identical Pfam hits, 9,046 (16.1%) pairs had one isoform with a Pfam hit and the other
without, indicating a functional loss in the gene product, and 2,708 (4.8%) pairs had different Pfam hits
(Supplementary Table 2). Thus, about 20.9% of AS events generated isoforms with functional loss or change.
Similar results were obtained in our previous analysis with pineapple and maize data (Wai et al., 2016; Min et al.,
2017). The Pfam loss or change in the gene products is most likely caused by changes in the translation frame of AS
isoforms. The MADS-box genes were alternatively spliced in cotton, and some of the alternatively spliced
isoforms potentially encoded proteins with altered K-domain and/or C-terminal regions (Lightfoot et al., 2008).
These genes were expressed in developing fiber cells, suggesting a role in cotton fiber biosynthesis. The biological
significance of the changes in protein family functional domains in these genes merits further
investigation.
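The pairwise Pfam comparison described above can be illustrated with a minimal sketch. It assumes each isoform has already been annotated with a set of Pfam family names; the function names, the category labels, and the toy pairs are hypothetical and are not taken from the authors' pipeline. The second part recomputes the category percentages from the four reported counts. These counts sum to 56,170, and using that sum as the denominator reproduces the reported percentages to within 0.1 percentage point.

```python
from collections import Counter


def classify_pair(pfam_a, pfam_b):
    """Classify one isoform pair by comparing the Pfam families of its two isoforms.

    pfam_a, pfam_b: sets of Pfam family names (an empty set means no Pfam hit).
    The category labels are hypothetical names chosen for this sketch.
    """
    if not pfam_a and not pfam_b:
        return "no_hit"      # neither isoform has a Pfam hit
    if not pfam_a or not pfam_b:
        return "loss"        # only one isoform retains a Pfam hit
    if pfam_a == pfam_b:
        return "identical"   # both isoforms carry the same families
    return "different"       # both have hits, but the families differ


def summarize(pairs):
    """Return {category: (count, percent of all pairs)} for a list of pairs."""
    counts = Counter(classify_pair(set(a), set(b)) for a, b in pairs)
    total = sum(counts.values())
    return {k: (n, round(100.0 * n / total, 1)) for k, n in counts.items()}


# Toy isoform pairs (illustrative only, not real cotton annotations).
toy_pairs = [
    (["Pkinase"], ["Pkinase"]),
    (["RRM_1"], []),
    ([], []),
    (["p450"], ["Pkinase_Tyr"]),
]
toy_summary = summarize(toy_pairs)

# Counts reported in the text for the four categories.
reported = {"no_hit": 14214, "identical": 30202, "loss": 9046, "different": 2708}
reported_total = sum(reported.values())
reported_pct = {k: round(100.0 * n / reported_total, 1) for k, n in reported.items()}

# Share of pairs with functional loss or change (loss + different).
loss_or_change_pct = round(
    100.0 * (reported["loss"] + reported["different"]) / reported_total, 1
)

print(toy_summary)
print(reported_total, reported_pct, loss_or_change_pct)
```

Treating the Pfam annotation of each isoform as a set makes the comparison independent of domain order and copy number; a stricter comparison of domain architecture would need ordered lists instead.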
2.5 Gene Ontology (GO) analysis of gene products
GO categories provide an overview of the gene products involved in the biological processes, molecular functions,
and cellular components. Because GO annotation is fairly complex, with variable information available for different
gene products, the analysis is not intended as an accurate quantification but rather as a broad picture
of the functionalities of the gene products. Among the whole set of 279,031 cotton PUT sequences, a total of
201,924 (72.3%) PUTs had a BLASTX hit (E-value < 1e-5) against the Swiss-Prot database. Using the
Swiss-Prot identifiers, we then retrieved a total of 1,324,154 GO identifiers. These GO identifiers were further grouped
into top categories using GO Slim Viewer (McCarthy et al., 2006). The isoforms from AS genes were also
analyzed using the same procedure, and a total of 234,362 GO identifiers were obtained. Our previous analysis
showed that GO cellular component analysis based on the BLASTX method was not accurate; thus, we
summarized only the GO classification of biological process and molecular function in the whole set of PUTs and
isoforms generated by AS genes (Table 4; Table 5). The top categories of molecular functions include binding,
catalytic activity, nucleotide binding, transferase activity, hydrolase activity, protein binding, etc. (Table 4). These
top categories of molecular functions of gene products are more or less similar in all plant species we have
examined (Min et al., 2015). The top categories of biological processes include cellular process, metabolic process,
biosynthetic process, nucleobase-containing compound metabolic process, response to stress, etc. (Table 5). As
expected, the distribution patterns of these processes were also similar in all the plant datasets we analyzed (Min
et al., 2015).
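The BLASTX filtering and GO retrieval steps described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the BLASTX results are in the standard 12-column tabular format (BLAST+ `-outfmt 6`, with the E-value in column 11), which the text does not specify, and it uses a hypothetical in-memory dictionary in place of the Swiss-Prot to GO lookup. The query names, subject accessions, and the accession-to-GO assignments are toy values.

```python
import csv
from collections import Counter

EVALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(lines, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue)} keeping the lowest-E-value hit below cutoff.

    Assumes 12-column BLAST tabular rows: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue < cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best


def count_go_terms(hits, accession_to_go):
    """Tally GO identifiers across the best hits using a lookup dictionary."""
    counts = Counter()
    for subject, _evalue in hits.values():
        counts.update(accession_to_go.get(subject, []))
    return counts


def make_row(query, subject, evalue):
    """Build one toy tabular row; all columns except ids and E-value are filler."""
    return "\t".join(
        [query, subject, "90.0", "100", "10", "0", "1", "300", "1", "100", evalue, "200"]
    )


# Toy BLASTX output for three hypothetical PUTs.
toy_blast = [
    make_row("PUT_1", "ACC_A", "1e-30"),
    make_row("PUT_1", "ACC_B", "1e-50"),  # better hit for PUT_1
    make_row("PUT_2", "ACC_C", "1e-6"),
    make_row("PUT_3", "ACC_A", "2e-3"),   # fails the E-value cutoff
]

# Hypothetical accession-to-GO lookup (toy assignments).
toy_go = {
    "ACC_B": ["GO:0005515", "GO:0016740"],
    "ACC_C": ["GO:0005515"],
}

hits = best_hits(toy_blast)
go_counts = count_go_terms(hits, toy_go)
fraction_with_hit = len(hits) / 3  # three toy PUTs in total

print(hits)
print(go_counts)
print(round(100.0 * fraction_with_hit, 1))
```

Keeping only the best hit per query is one possible design choice; the text does not state whether GO identifiers were collected from the top hit only or from all hits passing the cutoff.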
GO analysis showed that AS gene products were involved in all the biological processes, with various molecular
functions. On average, 46.3% in GO molecular functions and 47.1% in GO biological processes were obtained from
the gene products of the AS genes. There are well-characterized genes undergoing AS with demonstrated
functional significance in the regulation of plant growth, development, and stress responses (Reddy et al., 2013;
Staiger and Brown, 2013). Therefore, the biological roles of AS genes in cotton growth and development need to
be examined further.