Legume Genomics and Genetics 2026, Vol.17, No.1, 49-67 http://cropscipublisher.com/index.php/lgg 60

selection pipelines, where genomic prediction models can be trained on core panels and then used to estimate breeding values in larger breeding populations. As genebank-scale SNP datasets become ubiquitous, integrating well-designed core collections with high-throughput phenotyping and omics data will be critical for converting conserved diversity into elite, climate-resilient soybean cultivars.

7 Prospects for the Application of SNP Markers in Soybean Genetic Improvement

7.1 Marker-assisted selection (MAS) breeding

SNP markers are now central to marker-assisted selection in soybean because they are abundant, codominant, and amenable to high-throughput, low-cost genotyping. A series of SNP arrays and targeted panels (e.g., BARCSoySNP6K, SoySNP3K, SoySNP1K, and GBTS-based 10K–40K panels) have been optimized specifically for breeding. They provide genome-wide coverage, and as few as 1,000–6,000 informative markers are sufficient for most MAS and genomic applications while avoiding redundant information and unnecessary costs (Niu et al., 2024). These panels genotype accurately, with >98% concordance with resequencing data, and their markers show high minor allele frequencies in elite germplasm, so the selected markers are both reliable and polymorphic across breeding populations (Yang et al., 2023). Such platforms support routine tasks in breeding pipelines, including germplasm fingerprinting, pedigree verification, rapid recurrent-parent recovery in backcrossing, and early-generation selection of lines carrying desirable alleles at key loci.

Trait-linked SNPs and KASP/TaqMan assays derived from GWAS, QTL mapping, and candidate-gene studies are increasingly deployed to target specific agronomic and quality traits.
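The usefulness of a trait-linked assay in MAS is usually summarized as the share of lines whose marker genotype correctly predicts their phenotypic class, which is broadly the kind of figure quoted in the examples that follow. The Python sketch below computes that share under simple, loudly hypothetical assumptions: the allele calls, class labels, and genotype-to-class mapping are placeholders (not the real GmCHX1 or KSS-SNP5 alleles), the data are invented, and heterozygous or failed calls are simply left unscored.

```python
from collections import Counter


def assay_accuracy(calls, phenotypes, genotype_to_class):
    """Fraction of scored lines whose marker-predicted class matches the observed class.

    calls: SNP call per line at the diagnostic marker (e.g. "AA", "GG", "AG", "--").
    phenotypes: observed class per line, in the same order as calls.
    genotype_to_class: hypothetical mapping from a call to a predicted class;
        calls absent from the mapping (heterozygotes, failed calls) are left unscored.
    Returns (accuracy, Counter of "correct" / "wrong" / "unscored").
    """
    if len(calls) != len(phenotypes):
        raise ValueError("calls and phenotypes must have the same length")
    outcomes = Counter()
    for call, observed in zip(calls, phenotypes):
        predicted = genotype_to_class.get(call)
        if predicted is None:
            outcomes["unscored"] += 1
        elif predicted == observed:
            outcomes["correct"] += 1
        else:
            outcomes["wrong"] += 1
    scored = outcomes["correct"] + outcomes["wrong"]
    # NaN signals that no line could be scored at all
    accuracy = outcomes["correct"] / scored if scored else float("nan")
    return accuracy, outcomes


# Invented toy data; "AA" is assumed (hypothetically) to carry the favorable allele.
calls = ["AA", "AA", "GG", "GG", "AG", "AA", "GG", "--"]
phenotypes = ["tolerant", "tolerant", "sensitive", "tolerant",
              "tolerant", "tolerant", "sensitive", "sensitive"]
mapping = {"AA": "tolerant", "GG": "sensitive"}

accuracy, outcomes = assay_accuracy(calls, phenotypes, mapping)
print(f"accuracy = {accuracy:.3f}, {dict(outcomes)}")
```

For the toy data, five of the six scored lines are classified correctly (accuracy ≈ 0.833), and two lines are unscored. Reporting the unscored count alongside accuracy makes it visible when heterozygous or missing calls are frequent, which the accuracy figure alone would hide.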
For example, SNP assays tightly linked to the salt-tolerance gene GmCHX1 accurately distinguish tolerant from sensitive genotypes in diverse panels and biparental populations (classification accuracy above 91% and up to 98%), greatly facilitating the introgression of salinity tolerance into elite backgrounds (Patil et al., 2016). Similarly, a diagnostic TaqMan SNP assay for pod-shattering resistance (KSS-SNP5) achieved 92%–96% prediction accuracy across F2:3 families and advanced breeding lines, demonstrating that single-locus MAS can efficiently enrich populations for resistant genotypes and reduce costly phenotyping of difficult traits (Kim et al., 2020). GWAS-identified SNPs and haplotypes associated with yield components, seed protein, sucrose, and other seed composition traits are also being integrated into MAS schemes, where pyramiding favorable alleles at multiple minor-effect loci can substantially increase the proportion of phenotypic variation explained for target traits (Ravelombola et al., 2021; Qin et al., 2022; Riaz et al., 2023). As more trait-diagnostic SNPs are validated across environments and genetic backgrounds, MAS based on robust SNP assays will remain a practical, cost-effective strategy, especially for major genes and moderate-effect loci with clear, stable effects.

7.2 Genomic selection (GS) breeding

Beyond locus-specific MAS, genome-wide SNP datasets enable genomic selection, which uses all markers simultaneously to predict genomic estimated breeding values (GEBVs) and is particularly powerful for complex, polygenic traits such as yield and seed composition. Multiple soybean studies show that GS can achieve moderate to high predictive accuracies for protein, oil, seed weight, maturity, and related traits using ridge-regression BLUP (RR-BLUP), GBLUP, Bayesian models, and selected machine-learning approaches (Jiahao et al., 2025).
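RR-BLUP, one of the models listed above, fits all SNP effects jointly and shrinks them equally toward zero, which is what allows thousands of markers to be used with only a few hundred phenotyped lines. The NumPy sketch below is illustrative only: the simulated population, the choice of the shrinkage parameter λ from an assumed heritability, and all function names are hypothetical and do not reproduce the pipelines of the cited studies. Predictive accuracy is computed here as the Pearson correlation between predicted and observed values in held-out cross-validation folds.

```python
import numpy as np


def simulate_population(n_lines=300, n_snps=1000, n_qtl=50, h2=0.5, seed=1):
    """Toy data: SNPs coded 0/1/2 and a polygenic trait with heritability h2.

    Illustrative only; real soybean breeding lines are largely homozygous.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, n_snps)
    X = rng.binomial(2, freqs, size=(n_lines, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    qtl = rng.choice(n_snps, n_qtl, replace=False)
    effects[qtl] = rng.normal(0.0, 1.0, n_qtl)
    g = X @ effects  # true genetic values
    noise_sd = np.sqrt(g.var() * (1 - h2) / h2)
    y = g + rng.normal(0.0, noise_sd, n_lines)
    return X, y


def lambda_from_h2(X, h2):
    """Ridge parameter lambda = sigma_e^2 / sigma_beta^2, assuming h2 is known and
    the genetic variance equals sum(2pq) * sigma_beta^2."""
    p = X.mean(axis=0) / 2
    return (1 - h2) / h2 * np.sum(2 * p * (1 - p))


def rrblup_fit(X, y, lam):
    """Solve (Z'Z + lam*I) beta = Z'(y - mean(y)), with Z the column-centred markers."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Z = X - x_mean
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return x_mean, y_mean, beta


def rrblup_predict(model, X_new):
    """GEBVs (plus the training mean) for new lines from fitted marker effects."""
    x_mean, y_mean, beta = model
    return y_mean + (X_new - x_mean) @ beta


def cv_accuracy(X, y, lam, k=5, seed=0):
    """Mean Pearson r between predicted and observed values over k CV folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = rrblup_fit(X[train_idx], y[train_idx], lam)
        pred = rrblup_predict(model, X[test_idx])
        rs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(rs))


X, y = simulate_population()
lam = lambda_from_h2(X, h2=0.5)
print(f"5-fold CV accuracy: {cv_accuracy(X, y, lam):.2f}")
```

With real data, λ would usually be estimated (for example by REML) rather than derived from an assumed heritability. Because this sketch solves a p × p system, it suits a few thousand markers; with many more markers, the equivalent n × n GBLUP formulation is cheaper. Re-running `cv_accuracy` on random column subsets of `X` is one simple way to explore how accuracy changes with marker number, the plateau effect discussed later in this section.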
For instance, GS accuracies of ~0.75–0.87 for seed weight and of ~0.81 and ~0.71 for protein and oil, respectively, have been reported in elite breeding populations, clearly outperforming traditional MAS for these quantitative traits (Ravelombola et al., 2021; Qin et al., 2022). Even for grain yield, where prediction remains more challenging, GS achieves useful accuracies (0.26–0.40) that can accelerate selection cycles when combined with optimized training sets and appropriate statistical models (Ćeran et al., 2024). Efficient GS pipelines depend critically on SNP density, marker quality, training population composition, and the choice of informative marker subsets. Empirical evaluations in soybean suggest that approximately 1,000–2,000 well-distributed, genome-wide SNPs are sufficient for prediction accuracy to plateau; adding more markers increases cost but contributes little additional information (Qin et al., 2022; Song et al., 2024). Marker sets derived from GWAS, i.e., SNPs significantly associated with the target trait, can further improve prediction efficiency and allow high accuracies at relatively low marker densities (~5K), especially when combined with Bayesian models (Riaz et al., 2023). Selective genotyping and phenotyping schemes that maintain the genetic diversity of