Computational Molecular Biology 2025, Vol.15, No.1, 1-12
http://bioscipublisher.com/index.php/cmb

20 amino acids encoded. The average length of the predicted proteins was 285 amino acids, whereas the current reference protein dataset in NCBI consists of 14,086 sequences with an average length of 440 amino acids. In addition, using an ungapped BLASTN search with a cut-off of ≥97% identity and a minimum alignment length of 60 bp, 27,566 (41.8%) transcripts matched transcripts of gene models in A. niger.

Table 4 Basic features of assembled RNA transcripts and functional annotation in A. niger

Feature                                        Value
Total genomic loci                             9939
    Loci having one transcript                 2913 (29.3%)
    Loci having more than one transcript       7026 (70.7%)
    Mapped to gene model loci                  8347
    New genomic loci                           1592
Total unique transcripts                       66007
Average transcript length (bp)                 3735
Transcripts match to gene model transcripts    27566 (41.8%)
BLASTX match against Swiss-Prot dataset        22162 (33.6%)
Total predicted ORFs                           61153 (92.6%)
Average ORF length (amino acids)               285
Total ORFs with a Pfam match                   19177 (31.4%)

The proteins predicted from the retrieved mRNA transcripts were annotated against Pfam, which makes it possible to examine whether the functional domains of proteins encoded by different transcript isoforms are maintained. This matters because isoforms of alternatively spliced genes may encode a truncated protein that has lost a domain owing to a premature stop codon, or may not be translated into a protein at all owing to a translational frame shift. A total of 19,177 predicted ORFs had a Pfam match (Table 4). Among the 9,939 genomic loci identified in this work, 2,941 loci encoded proteins with a Pfam match, and 2,000 of these loci were alternatively spliced. For the loci subject to AS that had at least one isoform with a Pfam match, we compared the Pfam annotations of the protein sequences encoded by the isoforms.
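The isoform-level Pfam comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work. The function name `classify_isoforms`, the input layout, and the choice of the most frequent non-empty domain set as the per-locus reference are all assumptions made for illustration, and the Pfam accessions in the example are placeholders rather than data from this study.

```python
from collections import Counter, defaultdict


def classify_isoforms(records):
    """Label each isoform of an alternatively spliced locus by its Pfam content.

    records: iterable of (locus_id, transcript_id, pfam_ids) tuples, where
    pfam_ids is a set of Pfam accessions found in the predicted ORF
    (hypothetical input layout, assumed for this sketch).
    """
    by_locus = defaultdict(list)
    for locus_id, transcript_id, pfam_ids in records:
        by_locus[locus_id].append((transcript_id, frozenset(pfam_ids)))

    labels = {}
    for locus_id, isoforms in by_locus.items():
        annotated = [domains for _, domains in isoforms if domains]
        # Keep only loci with more than one transcript and at least one
        # isoform carrying a Pfam match.
        if len(isoforms) < 2 or not annotated:
            continue
        # Assumption: the most frequent non-empty domain set is the reference.
        reference = Counter(annotated).most_common(1)[0][0]
        for transcript_id, domains in isoforms:
            if not domains:
                labels[transcript_id] = "lost"
            elif domains != reference:
                labels[transcript_id] = "different"
            else:
                labels[transcript_id] = "same"
    return labels


# Illustrative input only; accessions are placeholders, not study data.
example = [
    ("locus1", "t1", {"PF00150"}),
    ("locus1", "t2", {"PF00150"}),
    ("locus1", "t3", set()),
    ("locus1", "t4", {"PF00150", "PF00734"}),
    ("locus2", "t5", {"PF00083"}),
]
labels = classify_isoforms(example)
summary = Counter(labels.values())
print(dict(summary))
```

In this toy input, the single-transcript locus is skipped, and the remaining isoforms are tallied as maintaining, changing, or losing the reference domain set. A different definition of the reference (for example, the longest isoform) would change the counts, so the rule should be stated alongside any reported numbers.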
Within these loci, 13,914 transcripts encoded ORFs with a Pfam match; however, 4,867 of these transcripts encoded ORFs with a different Pfam within the same locus, and 4,485 transcripts encoded ORFs that had lost the Pfam, i.e. a functional domain (Supplementary Table 3). The impact of these AS events on the functionality of the different protein isoforms needs to be validated experimentally.

To compare the AS rates of genes encoding different Pfam families, we extracted only one Pfam annotation for genes with multiple isoforms. Of the 9,939 genes (genomic loci), 2,941 encoded at least one ORF matching a protein family. Of these, 2,000 (68.0%) were alternatively spliced, although AS rates varied among gene families (Table 5; Supplementary Table 4). The much higher AS rates observed in these protein-coding genes with Pfam matches, particularly carbohydrate-active enzymes (CAZymes), indicate that AS plays important roles in regulating various cellular processes. For example, based on the CAZy classification (http://www.cazy.org/Home.html) (Drula et al., 2022), we identified 84 genes encoding different families of CAZymes and found that 52 of them were alternatively spliced (Table 6). The functions of these isoforms in carbohydrate metabolism need to be investigated further, for example in fermentation processes for biofuel production (Borin et al., 2017; Daly et al., 2017).

3.4 Dynamic changes of AS events in response to different growth conditions

Gene expression is dynamically regulated by compositional changes in the growth medium. Daly et al. (2017), Borin et al. (2017) and van Munster et al. (2020) reported changes in the expression of genes encoding CAZymes, sugar transporters, transcription factors and other proteins related to lignocellulose degradation in response to different growth substrates, including wheat straw, feedstock Miscanthus, and sugarcane bagasse, respectively. Here we use data collected by Daly et al.
(2017) and van Munster et al. (2020) to demonstrate the dynamic changes of AS events in gene expression. The growth substrate treatments included glucose-rich conditions (GLU, control), hydrothermally pretreated Miscanthus (HTM), hydrothermally pretreated wheat straw (HTS), ionic liquid pretreated Miscanthus (ILM), ionic liquid pretreated straw (ILS), knife-milled Miscanthus (KMM), and