
Legume Genomics and Genetics 2026, Vol.17, No.1, 49-67 http://cropscipublisher.com/index.php/lgg 55

4 Population Structure Analysis of Global Soybean Germplasm Resources

4.1 Methods for analyzing population structure

Population structure analysis of soybean germplasm relies on multilocus genotyping combined with multivariate and model-based statistical approaches. High-throughput SNP platforms (arrays, DArTseq, GBS, and whole-genome resequencing) now provide thousands to tens of thousands of markers distributed across all 20 chromosomes, enabling robust inference of subpopulations and admixture in large panels of cultivated and wild accessions (Valliyodan et al., 2021; Zatybekov et al., 2025). Common analytical workflows begin with estimation of basic diversity indices (allele frequencies, expected heterozygosity, polymorphism information content), followed by clustering using principal component analysis (PCA), principal coordinate analysis (PCoA), and distance-based phylogenetic trees such as UPGMA or neighbor-joining (Andrijanić et al., 2023). These multivariate methods offer an initial visualization of genetic relationships and can reveal major divisions between wild and cultivated gene pools, between regions, or among breeding groups. Model-based Bayesian clustering, implemented in programs such as STRUCTURE and related admixture models, is then used to infer the most likely number of genetic clusters (K), assign membership coefficients to each accession, and quantify admixture proportions (Chander et al., 2021).

Complementary statistics deepen insight into the organization of diversity. Analysis of molecular variance (AMOVA) partitions total genetic variation within and among predefined groups (e.g., regions, maturity groups, breeding programs), clarifying whether diversity is primarily within or between populations (Da Silva et al., 2025).
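As a minimal sketch of the basic diversity indices named at the start of this workflow, the following Python snippet computes the alternate-allele frequency, expected heterozygosity, and polymorphism information content (PIC) for a single biallelic SNP. The 0/1/2 genotype coding, the use of None for missing calls, the function names, and the example genotypes are illustrative assumptions and are not taken from the cited studies.

```python
from typing import Optional, Sequence


def alt_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Alternate-allele frequency at one biallelic SNP.

    Genotypes are assumed to be coded as the number of alternate
    alleles (0, 1, or 2); None marks a missing call and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    return sum(called) / (2 * len(called))


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity, He = 1 - sum(p_i^2), for two alleles."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def polymorphism_information_content(p: float) -> float:
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2, for two alleles."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * (p * p) * (q * q)


if __name__ == "__main__":
    # Hypothetical genotype calls for eight accessions at one SNP.
    snp = [0, 0, 1, 2, None, 1, 0, 2]
    p = alt_allele_frequency(snp)
    print(f"p = {p:.3f}")
    print(f"He = {expected_heterozygosity(p):.3f}")
    print(f"PIC = {polymorphism_information_content(p):.3f}")
```

In practice these per-locus values would be averaged over all markers in a panel. For a biallelic SNP, both indices peak at an allele frequency of 0.5, where He is 0.5 and PIC is 0.375.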
Fixation indices (F_ST) quantify genetic differentiation between pairs of populations and are widely used to classify divergence as negligible, moderate, or strong, guiding the choice of contrasting parents for crossing (Tsindi et al., 2023). Linkage disequilibrium (LD) decay analyses, often based on genome-wide SNPs, shed light on historical recombination and selection and help define the resolution of association mapping in each subpopulation. Recent studies also integrate haplotype-based analyses and genome-wide scans for selection (e.g., BayeScan, EigenGWAS) to identify genomic regions whose allele frequency differentiation aligns with population structure and local adaptation (Kim et al., 2025). Together, these methods provide a coherent framework for dissecting population structure, controlling for stratification in GWAS, and designing efficient germplasm utilization strategies.

4.2 Genetic differentiation among soybean populations from different geographic origins

Comparative SNP-based studies consistently show that geographic origin and domestication status are primary drivers of soybean population structure. Large-scale analyses of global germplasm, including the USDA collection and broad Korean–Chinese–Japanese panels, clearly separate wild (Glycine soja) from cultivated (Glycine max) accessions, with wild populations further partitioned into multiple lineages that track their East Asian collection zones (Kaga et al., 2012; Li et al., 2024). Within cultivated soybean, Asian, North American, South American, and European gene pools typically form distinct but partially overlapping clusters, reflecting historical patterns of germplasm exchange, founder effects, and regional selection (Potapova et al., 2023). For example, Japanese and Korean accessions are relatively homogeneous and distinct from Chinese accessions, while American cultivars derive their ancestry largely from a subset of Chinese subpopulations (Jeong et al., 2018).
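Pairwise differentiation between gene pools of this kind is commonly summarized with F_ST. The sketch below illustrates one simplified, heterozygosity-based form, F_ST = (H_T - H_S) / H_T, computed as a ratio of sums across biallelic loci for two populations. It assumes equal population weights and applies no sample-size correction, so it is not the Weir and Cockerham estimator that many packages report. The function name and the allele frequencies are hypothetical.

```python
from typing import Sequence


def fst_two_populations(freqs_a: Sequence[float],
                        freqs_b: Sequence[float]) -> float:
    """Simplified multi-locus F_ST = (H_T - H_S) / H_T for two populations.

    freqs_a and freqs_b hold the alternate-allele frequency of each
    biallelic SNP in populations A and B (same locus order).
    Assumption: equal population weights, no sample-size correction.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of loci")

    h_s_sum = 0.0  # mean within-population expected heterozygosity
    h_t_sum = 0.0  # expected heterozygosity of the pooled population
    for p_a, p_b in zip(freqs_a, freqs_b):
        h_s_sum += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        p_bar = (p_a + p_b) / 2
        h_t_sum += 2 * p_bar * (1 - p_bar)

    if h_t_sum == 0:
        raise ValueError("all loci are monomorphic in the pooled sample")
    return (h_t_sum - h_s_sum) / h_t_sum


if __name__ == "__main__":
    # Hypothetical allele frequencies at four SNPs in two gene pools.
    pool_a = [0.10, 0.45, 0.80, 0.30]
    pool_b = [0.15, 0.50, 0.70, 0.35]
    print(f"F_ST = {fst_two_populations(pool_a, pool_b):.4f}")
```

Identical allele frequencies give a value of 0 and fixed differences at every locus give 1, which is the scale on which differentiation is read.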
European cultivars cluster into two main groups with substructure corresponding to country of origin and maturity group; American introductions show the lowest differentiation from European material, whereas Swiss lines and some Eastern European cultivars are more distinct.

Regional studies further highlight variable levels of differentiation and diversity among emerging production areas. In sub-Saharan Africa, elite TGx lines and cultivars adapted to African environments form several SNP-defined clusters, but overall show a broad genetic base compared with some temperate breeding pools (Tsindi et al., 2023). Southern African collections combining temperate and tropical material exhibit very low F_ST (~0.06) between subgroups, indicating weak genetic differentiation and extensive germplasm sharing across programs (Tsindi et al., 2023). Brazilian germplasm, in contrast, often displays a narrow genetic base and strong signatures of selection, with structure shaped by region, company, and relative maturity group; Asian accessions are consistently the most
