Legume Genomics and Genetics 2026, Vol.17, No.1, 32-48 http://cropscipublisher.com/index.php/lgg 37
High-quality reads are subsequently aligned to the soybean reference genome using splice-aware aligners such as HISAT2 or STAR, which can correctly map reads spanning exon-exon junctions (Machado et al., 2019; Kong et al., 2025). Mapping statistics (overall alignment rate, proportion of uniquely mapped reads, distribution of reads across exons, introns, and intergenic regions, and coverage uniformity across gene bodies) provide essential indicators of library complexity and technical artifacts (Lindlöf, 2025). For example, soybean drought-response datasets routinely report mapping rates above 90% and high Q30 values (the proportion of bases with a Phred quality of at least 30), reflecting well-constructed libraries and appropriate reference annotations (Figure 2; Kong et al., 2025). Specialized QC suites such as RSeQC further evaluate GC bias, strand specificity, 5′-3′ coverage bias, and read distribution over genomic features, helping to flag over-amplification, RNA degradation, or sample-preparation biases that might otherwise confound differential expression and functional analyses. These quality-assured alignments then serve as the basis for accurate quantification and downstream statistical modeling.
Figure 2 RNA-seq quality control and alignment workflow in soybean research (Adapted from Kong et al., 2025)
3.3 Identification and functional annotation of differentially expressed genes
Gene or transcript abundance is then quantified either by counting aligned reads per gene with tools such as featureCounts or HTSeq, or with transcript-level tools such as Salmon and Kallisto, which use lightweight mapping or pseudoalignment directly on the reads; either route yields a matrix of raw or estimated counts across all samples (Conesa et al., 2016; Stark et al., 2019; Chen et al., 2023).
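To make the count matrix and the library-size problem concrete, the following is a minimal Python sketch, not the workflow used in the cited studies. It assumes reads have already been assigned to genes (the job featureCounts or HTSeq performs) and simply tallies them into a genes × samples table. It also computes per-sample library sizes and size factors following the median-of-ratios idea described for DESeq2. The function names, gene IDs, and sample names are hypothetical, and the size-factor function only mirrors the basic idea; it is not a substitute for the package.

```python
import math
from collections import Counter
from statistics import median


def build_count_matrix(assignments):
    """Tally read-to-gene assignments into a genes x samples matrix of raw counts.

    `assignments` maps a sample name to an iterable of gene IDs, one entry per
    read; None marks a read not assigned to any gene (e.g. intergenic or
    ambiguous), and such reads are not counted.
    """
    per_sample = {
        sample: Counter(g for g in genes if g is not None)
        for sample, genes in assignments.items()
    }
    samples = sorted(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    # A Counter returns 0 for unseen keys, so every gene gets a full row.
    matrix = {g: [per_sample[s][g] for s in samples] for g in genes}
    return samples, genes, matrix


def library_sizes(matrix, n_samples):
    """Column sums of the count matrix: total assigned reads per sample."""
    return [sum(row[j] for row in matrix.values()) for j in range(n_samples)]


def median_of_ratios_size_factors(matrix, n_samples):
    """Per-sample size factors following the median-of-ratios idea.

    Each sample's count for a gene is compared with that gene's geometric
    mean across samples; the size factor is the median of these ratios,
    taken in log space. Genes with a zero in any sample are skipped because
    their geometric mean is zero.
    """
    log_ratios = [[] for _ in range(n_samples)]
    for row in matrix.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has a nonzero count in every sample")
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    # Hypothetical toy data: two samples, three genes.
    reads = {
        "control_1": ["geneA", "geneA", "geneB", None],
        "drought_1": ["geneA", "geneB", "geneB", "geneB", "geneC", "geneC"],
    }
    samples, genes, matrix = build_count_matrix(reads)
    sizes = library_sizes(matrix, len(samples))
    factors = median_of_ratios_size_factors(matrix, len(samples))
    for g in genes:
        normalized = [c / f for c, f in zip(matrix[g], factors)]
        print(g, matrix[g], [round(x, 2) for x in normalized])
    print("library sizes:", sizes)
    print("size factors:", [round(f, 3) for f in factors])
```

In this toy example the drought sample has twice as many assigned reads as the control (6 vs. 3), yet the ratio of its size factor to the control's is only about 1.22. The size factor reflects the typical per-gene ratio rather than the total, so reads concentrated on a few strongly induced genes (here geneC, absent in the control) do not inflate it. This is the kind of compositional effect that simple total-count scaling cannot correct.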
Because raw counts are influenced by library size and compositional effects, differential expression analysis relies on statistical frameworks such as DESeq2, edgeR, or related methods based on the negative binomial distribution. These normalize for sequencing depth and composition, estimate gene-wise dispersion, and fit generalized linear models to identify differentially expressed genes (DEGs) between drought and control conditions (Li et al., 2022; Ludt et al., 2022). Thresholds on the adjusted p-value (e.g., FDR < 0.05) and fold change (e.g., |log2FC| ≥ 1, i.e., at least a two-fold change) are commonly combined to define biologically meaningful DEGs, while principal component analysis (PCA) and clustering of normalized expression values help assess how samples separate by treatment and genotype (Wang et al., 2022; Kong et al., 2025). In soybean drought studies, thousands of DEGs are often detected across time points and tissues, reflecting extensive transcriptional reprogramming under water deficit.
Functional interpretation of DEGs relies on integrating expression data with gene annotation resources and pathway databases. Gene Ontology (GO) enrichment analysis is routinely used to identify over-represented biological processes, molecular functions, and cellular components among up- and down-regulated genes, highlighting processes such as hormone signaling, reactive oxygen species detoxification, osmolyte metabolism,