
Legume Genomics and Genetics 2026, Vol.17, No.1, 32-48 http://cropscipublisher.com/index.php/lgg
sugar, soluble protein, proline, and antioxidant enzyme activities, while reducing ROS accumulation and MDA levels (Cui et al., 2024). Similarly, genome-wide and expression analyses of soybean ERF TFs have identified GmERF205 as a positive regulator of drought tolerance, whose overexpression improves growth, root architecture, and yield under water deficit in both controlled and field conditions (Abdullah et al., 2025). At the network level, modules like GmPRR3b-GmABF3 integrate circadian, ABA, and drought signals, where GmPRR3b negatively regulates drought responses by suppressing GmABF3, and GmABF3 overexpression restores tolerance (Li et al., 2024). Beyond transcription initiation, post-transcriptional mechanisms (including microRNAs, mRNA decay pathways mediated by subclass I SnRK2s, and alternative splicing), as well as chromatin-level regulation and cis-element architecture in DEG promoters, further fine-tune drought-responsive gene expression. Together, these regulatory layers define a dynamic and interconnected gene network that enables soybean to sense drought, reconfigure its transcriptome, and deploy coordinated physiological and metabolic defenses.
3 Soybean Transcriptome Sequencing Technologies and Data Analysis Methods
3.1 Principles and experimental workflow of RNA-Seq Technology
RNA sequencing (RNA-seq) is a next-generation sequencing (NGS)-based technology that profiles the complete set of RNA molecules in a cell or tissue, providing quantitative and qualitative information on gene expression, alternative splicing, and novel transcripts. In contrast to microarrays, RNA-seq does not rely on pre-designed probes and therefore offers a broader dynamic range and the ability to detect previously unannotated genes and isoforms (Severin et al., 2010).
In soybean and other plants, most drought-stress studies currently use short-read Illumina platforms, which generate millions of 50-150 bp reads that can be accurately aligned to the reference genome or transcriptome to estimate expression levels for each gene or transcript (Min et al., 2021). The core principle is straightforward: RNA molecules present in the sample at the time of extraction are converted into complementary DNA (cDNA), fragmented, and sequenced, and the number of reads mapping to each feature is taken as a proxy for its abundance, subject to normalization for gene length and sequencing depth (Stark et al., 2019).
A typical RNA-seq workflow for soybean drought studies begins with careful experimental design, including definition of stress treatments, time points, tissues, and biological replication, followed by high-quality RNA extraction from control and stressed samples (Machado et al., 2019; Min et al., 2021). After assessing RNA integrity (e.g., using Bioanalyzer RIN values), mRNA is usually enriched by poly(A) selection, though rRNA depletion or total RNA protocols are alternatives for non-polyadenylated transcripts (Conesa et al., 2016; Stark et al., 2019). The mRNA is then fragmented, reverse-transcribed into cDNA, ligated to platform-specific adapters, and amplified to construct sequencing libraries, which are subjected to high-throughput sequencing on an Illumina instrument (Chen et al., 2016). Emerging long-read platforms such as PacBio and Oxford Nanopore are increasingly used to capture full-length transcripts and complex isoforms, complementing short-read RNA-seq by improving transcriptome annotation and isoform resolution in soybean (Stark et al., 2019; Tyagi et al., 2022; Monzó et al., 2025). Together, these technologies underpin a diverse range of soybean transcriptomic applications, from drought response profiling to expression atlas construction (Severin et al., 2010).
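The normalization of read counts for gene length and sequencing depth described in this section can be illustrated with a minimal sketch. It assumes transcripts per million (TPM) as the normalization scheme, which is one common choice and is not specified by the studies cited here. The gene names, lengths, and counts are hypothetical, and the function name `tpm` is introduced only for illustration.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- raw read counts per gene (one sample)
    lengths_bp -- gene or transcript lengths in base pairs, same order
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase (RPK) corrects for gene length.
    rpk = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    # Step 2: dividing by the total RPK corrects for sequencing depth.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [r / total * 1e6 for r in rpk]


# Hypothetical example: three genes, two libraries of different depth.
lengths = [1000, 2000, 500]      # bp (assumed values)
control = [100, 200, 50]         # raw counts, control sample
drought = [200, 400, 100]        # same composition, sequenced twice as deep

tpm_control = tpm(control, lengths)
tpm_drought = tpm(drought, lengths)
```

In this toy case the drought library has twice the depth but the same relative composition, so both samples yield identical TPM values. Raw counts rather than TPM are normally supplied to count-based differential expression tools, which apply their own depth normalization.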
3.2 Transcriptome data quality control and sequence alignment
Robust downstream inference from RNA-seq requires stringent quality control (QC) at both the raw read and alignment levels. Initial QC typically uses tools such as FastQC and MultiQC to assess per-base sequence quality scores, GC content, adapter contamination, and sequence duplication rates, identifying potential technical problems introduced during library preparation or sequencing (Chen et al., 2023; Lindlöf, 2025). When necessary, adapters and low-quality bases are trimmed, and overly short reads are removed to minimize mapping errors. Post-trimming metrics such as the proportion of reads retained, updated quality score distributions, and base composition profiles are then re-evaluated to ensure that preprocessing has improved, rather than degraded, data quality (Li et al., 2014). In soybean drought studies, additional QC often includes checking for rRNA contamination, evaluating sequencing depth sufficiency for differential expression analysis, and confirming sample clustering via principal component analysis (PCA) to detect outliers or mislabeled samples (Kong et al., 2025).
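The trimming and filtering step described above, together with the proportion of reads retained, can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of any particular trimming tool: it assumes Phred+33 quality encoding, removes only trailing low-quality bases, and uses hypothetical thresholds (`min_q`, `min_len`) and toy reads. Dedicated trimmers implement more elaborate schemes, including adapter removal.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20):
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def filter_reads(reads, min_q=20, min_len=5):
    """Trim each (sequence, quality) pair and drop reads shorter than min_len.

    Returns the retained reads and the fraction of input reads retained.
    """
    kept = []
    for seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_3prime(seq, qual, min_q)
        if len(trimmed_seq) >= min_len:
            kept.append((trimmed_seq, trimmed_qual))
    fraction = len(kept) / len(reads) if reads else 0.0
    return kept, fraction


# Toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),  # high quality throughout
    ("ACGTACGTAC", "IIIIII####"),  # low-quality 3' tail, survives trimming
    ("ACGTACGTAC", "II########"),  # too short after trimming, removed
]

kept, fraction_retained = filter_reads(reads)
```

Comparing `fraction_retained` and the quality profile of `kept` against the untrimmed input is the kind of post-trimming check that indicates whether preprocessing improved the data or discarded too much of it.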
