
Legume Genomics and Genetics 2026, Vol.17, No.1, 49-67 http://cropscipublisher.com/index.php/lgg 56

differentiated and genetically diverse reference group. Newer soybean regions such as Kazakhstan and West Siberia show accessions most similar to European and North American cultivars, with low within-group diversity in Kazakhstan pointing to a particularly narrow local base (Potapova et al., 2023). Studies of wild soybean at continental scale also reveal deep north–south differentiation and distinct lineages in Korea, Japan, and different parts of China, underscoring the importance of geographic structure in the ancestral gene pool (Meng et al., 2023).

4.3 The relationship between population structure and genetic diversity

Population structure and genetic diversity are tightly coupled, with structure both reflecting historical changes in diversity and influencing how existing variation can be used in breeding. AMOVA and F_ST estimates across multiple SNP-based studies show that the majority of variation in soybean is usually found within, rather than among, populations, even when clear geographic or breeding-program clusters are present (Shaibu et al., 2021; Rani et al., 2023). For example, analyses of European cultivars, African germplasm, and IITA accessions all report 90%–98% of variance within populations and only a small fraction attributed to differences among countries, maturity groups, or STRUCTURE-defined clusters (Lukanda et al., 2023). Similarly, Brazilian and African collections often exhibit low to moderate F_ST between subpopulations despite discernible clustering by origin, maturity, or company, indicating substantial shared allelic backgrounds and extensive germplasm exchange. This pattern implies that carefully chosen parents from within a region can still capture meaningful diversity, but that crossing between more differentiated geographic or wild–cultivated groups is necessary to introduce novel alleles.
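The within- versus among-population partition described above can be illustrated with a minimal sketch. It is not the AMOVA procedure used in the cited studies: it assumes biallelic SNPs and equally weighted subpopulations, and it uses a simple gene-diversity estimator, F_ST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The allele frequencies and the names `fst_nei`, `pop_a`, and `pop_b` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_nei(freqs_by_pop: Sequence[Sequence[float]]) -> float:
    """Multi-locus F_ST computed as (H_T - H_S) / H_T.

    freqs_by_pop[k][l] is the frequency of one allele at locus l in
    subpopulation k. Subpopulations are weighted equally (an assumption),
    and H_T and H_S are summed over loci before taking the ratio.
    """
    n_pops = len(freqs_by_pop)
    n_loci = len(freqs_by_pop[0])
    h_s_sum = 0.0  # within-subpopulation gene diversity, summed over loci
    h_t_sum = 0.0  # total gene diversity, summed over loci
    for locus in range(n_loci):
        ps = [pop[locus] for pop in freqs_by_pop]
        h_s_sum += sum(expected_heterozygosity(p) for p in ps) / n_pops
        p_bar = sum(ps) / n_pops
        h_t_sum += expected_heterozygosity(p_bar)
    if h_t_sum == 0.0:
        raise ValueError("all loci are monomorphic; F_ST is undefined")
    return (h_t_sum - h_s_sum) / h_t_sum


# Hypothetical allele frequencies at four SNPs in two subpopulations.
pop_a = [0.5, 0.3, 0.8, 0.1]
pop_b = [0.7, 0.5, 0.6, 0.3]

fst = fst_nei([pop_a, pop_b])
within_share = 1.0 - fst  # equals H_S / H_T, the within-population share

print(f"F_ST = {fst:.3f}, within-population share = {within_share:.1%}")
```

With these made-up frequencies, which differ by 0.2 at every locus, F_ST stays below 0.05 and roughly 95% of gene diversity lies within subpopulations. This shows how visibly different allele frequencies can coexist with a large within-population share of variation.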
At the same time, strong population structure can signal both reservoirs of unique variation and zones of genetic erosion. Wild soybean lineages generally harbor higher nucleotide diversity and stronger geographic differentiation than cultivated pools, confirming their value as sources of private alleles for stress tolerance and adaptive traits (Li et al., 2024). In contrast, historical breeding and domestication have produced monophyletic or weakly structured cultivated groups with conserved haplotypes in genomic regions under selection for yield, maturity, or seed composition (Valliyodan et al., 2021). Studies of Brazilian cultivars, European germplasm, and large resequenced panels identify large fixed or low-diversity segments associated with key agronomic QTL, alongside more diverse genomic regions that still retain useful variability (Andrijanić et al., 2023; Kim et al., 2025). In emerging regions such as Kazakhstan, the combination of clear clustering with temperate germplasm and very low within-group diversity indicates a narrow, vulnerable genetic base, motivating targeted introgression from diverse foreign and wild accessions. Thus, integrating population-structure analysis with diversity metrics helps breeders balance immediate adaptation needs, met by exploiting existing structured variation, with long-term goals of broadening the genetic base through informed use of differentiated, high-diversity gene pools.

5 Associations Between Soybean Genomic Diversity and Important Agronomic Traits

5.1 The relationship between genetic diversity and yield traits

Genomic diversity underpins phenotypic variation in key yield components such as seed yield per plant, number of pods and seeds, plant height, and 100-seed weight.
Classical quantitative genetic studies across diverse soybean panels consistently report significant genotypic variance, high heritability, and substantial genetic advance for grain yield and its components, indicating abundant additive genetic variation that can be exploited through selection (Mitiku et al., 2025). Correlation and path analyses show that traits including number of seeds per plant, number of pods per plant, plant height, 100-seed weight, biological yield, and harvest index are positively and often strongly associated with seed yield, and frequently exert high positive direct effects, identifying them as efficient indirect selection targets. Morphological assessments combined with molecular markers (e.g., SSRs) further reveal that genotypes grouped as genetically distant often carry complementary yield-enhancing alleles, and crosses between such divergent parents tend to maximize transgressive segregation for yield (Ferreira et al., 2025).

SNP-based association studies refine these relationships by linking diversity at specific loci and haplotypes to yield and yield components. Nested association mapping and diversity panels genotyped with high-density SNP arrays or GBS have identified dozens of loci and haplotypes affecting yield, maturity, plant height, lodging, seed
