
Legume Genomics and Genetics 2026, Vol.17, No.1, 49-67
http://cropscipublisher.com/index.php/lgg

prediction analyses, demonstrating that SNP-based models such as rrBLUP and machine-learning approaches can achieve moderate to high predictive accuracies for yield, maturity, plant height, seed weight, disease resistance, and stress tolerance, though performance is trait- and environment-dependent (Ravelombola et al., 2021). The SNPs, haplotypes, and candidate genes revealed by these GWAS now underpin marker-assisted selection, development of diagnostic assays, and genomic selection pipelines, directly linking genomic diversity and population structure to practical improvement of soybean agronomic performance.

6 Case Study: Population Structure Analysis of Global Soybean Core Germplasm

6.1 Principles for constructing core germplasm collections

Core germplasm collections are designed to capture the maximum genetic and phenotypic diversity of an entire genebank in a substantially reduced number of accessions, thereby making evaluation and utilization more efficient. A soybean core collection usually represents about 2%–5% of the entire collection, with mini-core sets often reduced to ~1% while still maintaining major patterns of geographic and agro-morphological variation. Construction typically begins with stratification of the base collection by ecoregion, maturity group, improvement status (wild, landrace, cultivar), and key phenotypes such as seed size or growth habit, followed by sampling within strata using proportional, logarithmic, or multivariate strategies (Oliveira et al., 2010). In soybean, core development has progressively integrated molecular marker data (initially SSR, later SNP genotypes) together with multi-environment phenotypic data to ensure that rare alleles and extreme trait values are retained at acceptable frequencies (Lijuan et al., 2009).
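To make the difference between proportional and logarithmic sampling concrete, the following minimal Python sketch allocates a fixed core size among strata. It is an illustration under stated assumptions, not a procedure taken from the cited studies: the function name `allocate_core`, the stratum sizes, and the core size of 600 are hypothetical, and the logarithmic weight is taken as log(n + 1) so that very small strata keep a nonzero share.

```python
import math


def allocate_core(stratum_sizes, core_size, strategy="proportional"):
    """Split core_size among strata and return {stratum: number of entries}.

    proportional: weight = stratum size
    logarithmic:  weight = log(stratum size + 1)  (the +1 is an assumption
                  made here so that tiny strata are not weighted at zero)
    """
    if strategy == "proportional":
        weights = {name: float(n) for name, n in stratum_sizes.items()}
    elif strategy == "logarithmic":
        weights = {name: math.log(n + 1) for name, n in stratum_sizes.items()}
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    total = sum(weights.values())
    quotas = {name: core_size * w / total for name, w in weights.items()}

    # Largest-remainder rounding so the allocations sum exactly to core_size.
    allocation = {name: math.floor(q) for name, q in quotas.items()}
    leftover = core_size - sum(allocation.values())
    by_remainder = sorted(
        quotas, key=lambda name: quotas[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical base collection of 20,000 accessions and a 3% core.
strata = {"wild": 1000, "landrace": 15000, "cultivar": 4000}
proportional = allocate_core(strata, 600, "proportional")
logarithmic = allocate_core(strata, 600, "logarithmic")
```

In this toy case the logarithmic strategy gives the small wild stratum a larger share than the proportional strategy does, which is the usual motivation for using it when small strata carry rare alleles. The sketch does not cap an allocation at the stratum size, which a real workflow would need to handle.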
Several methodological frameworks have been proposed to operationalize these principles in soybean. The Chinese national platform developed a three-tier system of core, mini-core, and integrated applied core collections, combining SSR-based diversity with extensive phenotyping to select accessions that represent broad gene pools as well as trait-specific subsets for stress tolerance and quality (Guo et al., 2014). The USDA Soybean Germplasm Collection applied stratified and multivariate sampling of passport, phenotypic, and later SNP data to assemble cores that maximize variability while preserving quantitative trait distributions of the entire collection (Satyawan and Tasma, 2021). More recently, algorithmic approaches such as Core Hunter have been applied directly to high-density SNP datasets from >20,000 accessions to identify a few hundred entries with maximal pairwise genetic distance, while maintaining original allele frequencies and phenotypic variation for key traits like yield and branching (Song et al., 2015). Collectively, these experiences establish that effective soybean core sets must balance representativeness, low redundancy, manageability, and explicit conservation of molecular diversity.

6.2 Elucidation of population structure based on SNP markers

High-density SNP genotyping has transformed population-structure analysis in soybean core and global collections by enabling genome-wide characterization of relationships among thousands of accessions. Using tens of thousands of SNPs from whole-genome resequencing or SoySNP50K-type arrays, Bayesian clustering, ADMIXTURE/STRUCTURE analysis, and principal component analysis (PCA) routinely distinguish wild (Glycine soja) from cultivated (G. max) groups and reveal transitional accessions with mixed ancestry (Zatybekov et al., 2025).
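As a rough illustration of how PCA separates groups in SNP data, the sketch below computes first-principal-component scores for a tiny genotype matrix coded as 0/1/2 allele dosages. It assumes nothing about the pipelines used in the cited studies: the function names and the six-accession toy matrix are hypothetical, only the leading component is extracted, and power iteration is used so that the example needs only the Python standard library.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each SNP's mean dosage so every column has mean zero."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200, seed=0):
    """Return PC1 scores (one per accession) via power iteration on X X^T."""
    x = center_columns(genotypes)
    n = len(x)
    # Accession-by-accession Gram matrix of the centered genotypes.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return [0.0] * n  # no variation in the data
        v = [c / norm for c in w]
    # Rayleigh quotient gives the leading eigenvalue; scores = u1 * sqrt(eigenvalue).
    gv = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
    eigenvalue = max(sum(a * b for a, b in zip(v, gv)), 0.0)
    return [c * math.sqrt(eigenvalue) for c in v]


# Hypothetical toy data: rows are accessions, columns are SNP dosages.
# Rows 0-2 mimic one gene pool, rows 3-5 a contrasting one.
toy_genotypes = [
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 2, 1],
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 0, 1],
]
pc1 = first_pc_scores(toy_genotypes)
```

The two groups of accessions fall on opposite sides of zero along PC1. The sign of a principal component is arbitrary, so only the separation is meaningful, not which group is positive. Real datasets with thousands of accessions would normally rely on an established tool or numerical library rather than this hand-rolled routine.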
Within cultivated soybean, SNP-based analyses consistently identify geographically and ecologically coherent subpopulations, such as distinct clusters for Chinese, Japanese, Korean, and American germplasm, as well as separations among tropical, temperate, and high-latitude maturity groups (Tsindi et al., 2023). For example, a 14,000-accession SNP survey of the USDA collection resolved five major ancestral clusters and demonstrated that most North American cultivars trace their ancestry to a limited subset of Chinese landrace gene pools (Bandillo et al., 2015).

Case studies using SNP-genotyped regional cores highlight both the power and limitations of global germplasm. In a WGRS-based analysis of 694 accessions including Kazakh, European, North American, and wild soybeans, PCA and phylogenetic trees showed Kazakh cultivars clustering closely with European and North American material, while maintaining clear separation from G. soja; however, Kazakh accessions exhibited the lowest within-group diversity, underscoring a narrow genetic base for this emerging production region (Zatybekov et al.,
