Computational Molecular Biology 2025, Vol.15, No.1, 1-12
http://bioscipublisher.com/index.php/cmb

type of AS events. Clearly, there is a lack of systematic genome-wide identification of alternatively spliced genes in this important species. Considering the importance of the organism in industrial applications, we carried out a systematic genome-wide identification and analysis of AS events in A. niger by integrating available RNA transcripts with RNA-seq data from multiple published projects. The aim is to generate a catalog of genes subject to AS in A. niger. Such a collection, together with the transcript isoform annotation of each gene, may serve as a foundation for further characterizing the biological functions and regulation of these genes in this fungal species, which is important to the fermentation industry.

2 Materials and Methods

2.1 Genome, mRNA sequences, and RNA-seq datasets

The A. niger reference genome sequence (CBS 513.88, assembly ASM285v2), its annotation file in GFF (General Feature Format), and other related files were downloaded from the genome database of the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002855.4/) (Pel et al., 2007). A total of 78,361 A. niger mRNA sequences, including 46,938 ESTs, were downloaded from the NCBI Nucleotide database. The RNA-seq data were downloaded from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/) using the SRA Toolkit. The RNA-seq datasets were selected from seven projects reported in eight recent RNA-seq publications (Table 1). In total, 303 RNA-seq samples generated from diverse treatments were collected (Table 1). The data from project PRJNA250529 were generated and analyzed by Daly et al. (2017) and van Munster et al. (2020). Daly et al. (2017) investigated the responses of A.
niger to ionic liquid (IL) or hydrothermally (HT) pretreated knife-milled wheat straw (KMS) over a time course using RNA-seq and proteomics. van Munster et al. (2020) further analyzed the responses of A. niger to the feedstock Miscanthus and compared them with the results on wheat straw. The other datasets came from A. niger cultured in peanut or cashew nut flour-based media (Mattison et al., 2021), in steam-exploded sugarcane bagasse (Borin et al., 2017), in sugar beet pulp (Garrigues et al., 2022), in sucrose or inulin (Kun et al., 2023), and in glucose or wheat straw (Xu et al., 2024), and from wildtype and different deletion strains cultured in glucose (van Leeuwe et al., 2020). We also tested the RNA-seq datasets reported in projects PRJNA316878 and PRJNA148183; because their mapping rates were below 50%, those data were not included in further analysis.

Table 1 RNA-seq data sources and related references

Dataset  Project             SRA samples  Treatments                                      References
I        PRJNA250529         137          Responses to wheat straw or Miscanthus          Daly et al. (2017); van Munster et al. (2020)
II       PRJNA553205         12           Responses to peanut or cashew nut flour         Mattison et al. (2021)
III      PRJNA636647         4            Comparison of a wildtype with a deletion strain van Leeuwe et al. (2020)
IV       PRJNA350271         8            Responses to sugarcane bagasse                  Borin et al. (2015)
V        Multiple projects*  90           Sugar beet pulp utilization                     Garrigues et al. (2022)
VI       Multiple projects*  46           Sucrose and inulin utilization                  Kun et al. (2023)
VII      PRJNA1067358        6            Responses to glucose or wheat straw             Xu et al. (2024)
Total samples                303          -                                               -

Note: * The accession numbers of the data can be found in the related references and in the supplementary files

2.2 mRNA sequence mapping, RNA-seq reads mapping, and AS identification

The procedure for cleaning the mRNA sequences and assembling them into a non-redundant set of unique transcripts was described in our previous work (Clark et al., 2019).
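The transcript-to-genome mapping cutoffs used in this section (a minimum of 95% identity and more than 75% length coverage) amount to a simple filter over alignment records. The following is a minimal Python sketch of that filter only. The `Alignment` record, its field names, and the example values are hypothetical and chosen for illustration; this is not the ASFinder or Sim4 implementation, and it assumes coverage is computed as aligned transcript bases over total transcript length.

```python
from dataclasses import dataclass

MIN_IDENTITY = 95.0  # percent; the text requires a minimum of 95% identity
MIN_COVERAGE = 75.0  # percent; the text requires coverage greater than 75%


@dataclass
class Alignment:
    """Hypothetical alignment summary; field names are illustrative only."""
    transcript_id: str
    transcript_length: int    # full transcript length in nucleotides
    aligned_length: int       # transcript bases covered by the alignment
    percent_identity: float   # identity over the aligned region, in percent


def length_coverage(aln: Alignment) -> float:
    """Percentage of the transcript covered by the alignment (assumed definition)."""
    return 100.0 * aln.aligned_length / aln.transcript_length


def passes_cutoffs(aln: Alignment) -> bool:
    """Keep alignments with identity >= 95% and length coverage > 75%."""
    return (aln.percent_identity >= MIN_IDENTITY
            and length_coverage(aln) > MIN_COVERAGE)


# Made-up example records, not data from the study.
examples = [
    Alignment("t1", 1000, 900, 98.5),  # passes both cutoffs
    Alignment("t2", 1000, 750, 99.0),  # coverage is exactly 75%, not greater
    Alignment("t3", 800, 780, 94.9),   # identity below 95%
    Alignment("t4", 500, 400, 95.0),   # identity exactly 95%, coverage 80%
]

kept = [aln.transcript_id for aln in examples if passes_cutoffs(aln)]
```

The two comparisons differ on purpose: the identity cutoff is inclusive ("minimum 95%"), while the coverage cutoff is strict (">75%"), following the wording of the text.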
The final cleaned set of 78,194 transcripts was further assembled into a non-redundant set of 23,853 sequences. The assembled nucleotide sequences were mapped to the A. niger genome sequences with the ASFinder and Sim4 programs (Florea et al., 1998; Min, 2013), using cutoff values of a minimum of 95% identity and >75% length coverage. The RNA-seq reads were mapped to the reference genome sequences using TopHat (v2.2.6) with default parameters (Kim et al., 2013). TopHat2 is designed to handle a relatively low error rate, typically considered