CMB_2025v15n1

Computational Molecular Biology 2025, Vol.15, No.1, 1-12 http://bioscipublisher.com/index.php/cmb

Table 3 Classification of alternative splicing events in different datasets in A. niger

| Data sources | AltA | AltD | IntronR | ExonS | Complex | Total |
|---|---|---|---|---|---|---|
| I | 5679 | 3244 | 5194 | 1126 | 12322 | 27565 |
| II | 1188 | 625 | 814 | 159 | 952 | 3738 |
| III | 577 | 254 | 318 | 91 | 301 | 1541 |
| IV | 554 | 273 | 413 | 128 | 420 | 1788 |
| V | 3763 | 2024 | 2994 | 596 | 5402 | 14779 |
| VI | 2321 | 1154 | 1556 | 353 | 2385 | 7769 |
| VII | 568 | 229 | 519 | 139 | 756 | 2211 |
| RNA-seq merged | 9506 | 5749 | 11706 | 1781 | 30943 | 59685 |
| mRNA data | 397 | 213 | 723 | 74 | 691 | 2098 |
| RNA-seq and mRNA merged | 10097 | 6063 | 12469 | 1945 | 33141 | 63715 |
| (%) | 15.8 | 9.5 | 19.6 | 3.1 | 52.0 | 100.0 |

To identify AS events in the RNA-seq data, we first identified AS events in each project by merging the mapping information of all samples within that project, and then merged the mapping information of all seven projects. Finally, we merged the mRNA mapping information with the RNA-seq mapping information to generate the final list of AS events (Table 3). Because the projects differed in the number of samples and associated reads, the number of AS events varied greatly among them. Among the basic AS events, AltA was the predominant type, followed by IntronR, in every RNA-seq project (Table 3). However, IntronR became the predominant basic type when all RNA-seq mapping data were merged. Another interesting observation was that when the RNA-seq data were combined with the mRNA data, the total number of AS events (63,715) exceeded the sum of the two datasets analyzed individually (59,685 + 2,098 = 61,783), because AS events were identified by pairwise comparisons of the isoforms generated from a gene undergoing AS (Table 3).

In short, in this work we identified a total of 63,715 AS events, including 10,097 (15.8%) AltA, 6,063 (9.5%) AltD, 12,469 (19.6%) IntronR, 1,945 (3.1%) ExonS, and 33,141 (52.0%) complex events. ExonS is the least frequent basic AS type in A. niger, suggesting that the splicing mechanism in fungal species is similar to that in plant species.
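The percentages in the last row of Table 3, and the observation that the merged total exceeds the sum of the two individual datasets, can be re-derived from the reported counts. The sketch below is only an illustration, not the pipeline used in this work: the variable names are ours, and the helper `isoform_pairs` is a hypothetical toy that assumes every pair of isoforms at a locus is compared once.

```python
from math import comb

# Counts copied from the "RNA-seq and mRNA merged" row of Table 3.
merged = {
    "AltA": 10097,
    "AltD": 6063,
    "IntronR": 12469,
    "ExonS": 1945,
    "Complex": 33141,
}
total = sum(merged.values())

# Share of each event type, rounded to one decimal as in the table.
shares = {name: round(100 * count / total, 1) for name, count in merged.items()}

# Totals of the two datasets when analyzed individually (Table 3).
rnaseq_total = 59685
mrna_total = 2098
excess = total - (rnaseq_total + mrna_total)


def isoform_pairs(n_isoforms: int) -> int:
    """Number of isoform pairs at one locus (assumption: each pair compared once)."""
    return comb(n_isoforms, 2)


if __name__ == "__main__":
    print(total, shares)
    print("events beyond the simple sum:", excess)
    # Hypothetical locus: 3 isoforms seen in RNA-seq, 2 different ones in mRNA.
    print(isoform_pairs(3) + isoform_pairs(2), "pairs separately vs",
          isoform_pairs(5), "pairs after merging")
```

The toy locus shows why merging is not additive under this assumption: isoforms supported by only one data source can also be paired with isoforms from the other source once the datasets are combined.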
These AS events were identified from 4,972 genomic loci involving 43,156 unique transcripts. Combining all the mapping data, we obtained a total of 9,939 genomic loci with a total of 66,007 transcripts assembled by the Cufflinks tool (Table 4). Among these genomic loci, 7,026 (70.7%) produced two or more transcripts and 2,913 each generated a single transcript. However, based on the loci mapping of the isoform transcripts with AS events, 4,972 loci were identified as generating AS events. Thus, the AS rate at the genome level for all genes, based on the current data collection, was estimated to be ~50.0% in A. niger (Supplementary Table 2). However, because 1,032 genes consisted of only a single exon (i.e., had no intron), the AS rate among intron-containing genes was 55.8%. To our knowledge, this is the highest AS rate ever reported in a fungal species (reviewed by Fang et al., 2020).

Compared with the gene models in the reference genome annotation, 8,347 genomic loci in our work mapped to reference genomic loci. The current A. niger reference genome, however, is annotated with 10,828 genomic loci (Pel et al., 2007); interestingly, these loci mapped to only 8,347 of the genomic loci generated from our data. Clearly, some of the reference genomic loci were merged into longer loci, resulting in fewer loci in our data. In addition, 1,592 genomic loci did not map to any annotated genomic locus and represent newly identified genomic loci supported by evidence from the RNA-seq data. The newly identified genomic loci generated 3,388 transcripts, from which 2,795 ORFs were predicted.

3.3 Functional annotation of transcripts

A total of 66,007 RNA transcript sequences were retrieved and further annotated, including ORF prediction, functional annotation based on BLASTX against the UniProt Swiss-Prot database, and protein family (Pfam) prediction. The basic features of these transcripts are summarized in Table 4.
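The locus-level AS rates reported above follow directly from the stated counts. The short sketch below re-derives them as a consistency check; the variable names and the `pct` helper are ours, and it assumes that the 1,032 single-exon genes are all among the 9,939 assembled loci, which is consistent with the reported 55.8% but is not stated explicitly.

```python
# Counts reported in the text for the merged mapping data.
total_loci = 9939          # all assembled genomic loci
multi_transcript = 7026    # loci with two or more transcripts
single_transcript = 2913   # loci with exactly one transcript
as_loci = 4972             # loci with at least one AS event
single_exon_genes = 1032   # intronless genes (assumed to be among total_loci)
mapped_to_reference = 8347
newly_identified = 1592


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal, as reported in the text."""
    return round(100 * part / whole, 1)


# Genome-level AS rate over all loci.
as_rate_all = pct(as_loci, total_loci)

# AS rate restricted to intron-containing genes.
intron_containing = total_loci - single_exon_genes
as_rate_intron = pct(as_loci, intron_containing)

if __name__ == "__main__":
    print("multi-transcript loci (%):", pct(multi_transcript, total_loci))
    print("AS rate, all loci (%):", as_rate_all)
    print("AS rate, intron-containing (%):", as_rate_intron)
```

The same counts also confirm that the loci partition cleanly: multi- and single-transcript loci sum to 9,939, as do the reference-mapped and newly identified loci.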
The transcripts had an average length of 3,735 bp, and 22,162 (33.6%) transcripts had a BLASTX match in the Swiss-Prot database. A total of 61,153 (92.6%) transcripts were predicted to have an ORF region which contained a start codon with a minimum length of
