Legume Genomics and Genetics 2026, Vol.17, No.1, 49-67 http://cropscipublisher.com/index.php/lgg 54

Figure 2 Methods for analyzing soybean genomic diversity using SNP markers

3.3 Data analysis and bioinformatics methods

SNP datasets generated from array, GBTS, or GBS platforms were processed through standardized bioinformatics pipelines before population-genetic analysis. The first steps were SNP calling against the reference genome, removal of loci that exceeded missing-data thresholds or fell below a minor allele frequency (MAF) cutoff, and removal of redundant or poorly mapped markers. Together these steps yielded a high-quality genotype matrix of individuals by loci (Niu et al., 2024). Genetic diversity parameters and marker statistics, including polymorphism information content (PIC), gene diversity, heterozygosity, and allele frequencies, were calculated with software such as PowerMarker, POPGENE, or comparable population-genetic packages (Abebe et al., 2021).

To explore population structure, Bayesian clustering in STRUCTURE was commonly run under the admixture model, often with thousands of burn-in and MCMC iterations to help ensure convergence. The optimal number of subpopulations (K) was then chosen with the Evanno ΔK method, which identifies K from the second-order rate of change in the log-likelihood of the data across successive K values. Complementary multivariate methods, including principal component analysis (PCA), principal coordinate analysis (PCoA), and discriminant analysis of principal components (DAPC), were used to visualize genetic relationships and to validate the STRUCTURE-defined clusters (Chander et al., 2021). Hierarchical clustering (e.g., UPGMA or neighbor-joining based on genetic distance matrices) and phylogenetic tree construction offered further perspectives on how accessions grouped by geographic origin, improvement status, or pedigree (Rani et al., 2023). Analysis of molecular variance (AMOVA) was used to partition genetic variance among and within groups and to estimate F-statistics, while tools such as STRUCTURE HARVESTER and R packages (e.g., adegenet) supported model selection and graphical output (Shaibu et al., 2021).
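The marker-filtering and diversity statistics described above can be illustrated for biallelic SNPs with a short Python sketch. This is not the implementation used in PowerMarker or POPGENE. It assumes genotypes coded as 0/1/2 copies of the alternate allele with None for a missing call. The helper names are hypothetical, and the formulas are the simple textbook ones (gene diversity = 1 − (p² + q²), biallelic PIC = 1 − (p² + q²) − 2p²q²) without small-sample corrections, so values may differ slightly from those reported by dedicated packages. The 0.2 missingness and 0.05 MAF defaults are placeholders rather than thresholds taken from the cited studies.

```python
from typing import Dict, List, Optional

# Assumed coding: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate, None = missing call. Biallelic SNPs only.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals without a call at this SNP."""
    return sum(g is None for g in calls) / len(calls)


def alt_allele_freq(calls: List[Genotype]) -> float:
    """Alternate-allele frequency among non-missing calls (needs >= 1 call)."""
    observed = [g for g in calls if g is not None]
    return sum(observed) / (2 * len(observed))


def snp_stats(calls: List[Genotype]) -> Dict[str, float]:
    """Simple per-SNP diversity statistics, no small-sample correction."""
    observed = [g for g in calls if g is not None]
    p = alt_allele_freq(calls)
    q = 1.0 - p
    gene_div = 1.0 - (p ** 2 + q ** 2)          # expected heterozygosity, equals 2pq
    pic = gene_div - 2.0 * p ** 2 * q ** 2      # biallelic PIC = 1 - (p^2 + q^2) - 2p^2q^2
    ho = sum(g == 1 for g in observed) / len(observed)  # observed heterozygosity
    return {"maf": min(p, q), "gene_diversity": gene_div, "pic": pic, "ho": ho}


def filter_snps(matrix: Dict[str, List[Genotype]],
                max_missing: float = 0.2,
                min_maf: float = 0.05) -> Dict[str, List[Genotype]]:
    """Keep SNPs whose missing rate and minor allele frequency pass the cutoffs."""
    kept: Dict[str, List[Genotype]] = {}
    for snp_id, calls in matrix.items():
        if all(g is None for g in calls) or missing_rate(calls) > max_missing:
            continue  # too much missing data
        p = alt_allele_freq(calls)
        if min(p, 1.0 - p) < min_maf:
            continue  # minor allele too rare
        kept[snp_id] = calls
    return kept


if __name__ == "__main__":
    # Toy matrix: SNP ID -> calls for 10 individuals (invented data).
    snps = {
        "snp1": [0, 1, 2, 2, 0, 1, None, 0, 2, 1],
        "snp2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        "snp3": [None, None, None, 0, 1, 2, None, 0, 1, 2],
    }
    clean = filter_snps(snps, max_missing=0.2, min_maf=0.1)
    print(sorted(clean))              # ['snp1']
    print(snp_stats(clean["snp1"]))
```

In this toy example, only snp1 passes. snp3 is missing in 40% of individuals, and snp2 has a minor allele frequency of 0.05, below the 0.1 cutoff used here. Because snp1 has p = 0.5, it reaches the biallelic maxima of 0.5 for gene diversity and 0.375 for PIC.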
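The Evanno ΔK criterion used to choose K reduces to simple arithmetic on the replicate log-likelihoods, ln P(D), that STRUCTURE reports for each K. The Python sketch below is an illustration, not the code of STRUCTURE HARVESTER. It assumes at least two replicate runs per K and consecutive K values. As a stated simplification, it computes ΔK(K) = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)] from the per-K mean log-likelihoods, whereas the original formulation averages the absolute second difference over individual runs. The function names and the log-likelihood values in the example are invented.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnpd: Dict[int, List[float]]) -> Dict[int, float]:
    """Evanno-style Delta K for each interior K.

    lnpd maps K -> ln P(D) values from replicate STRUCTURE runs (>= 2 per K).
    Simplification: the second difference is taken on per-K mean values.
    """
    means = {k: mean(v) for k, v in lnpd.items()}
    delta: Dict[int, float] = {}
    for k in sorted(lnpd):
        if k - 1 not in lnpd or k + 1 not in lnpd:
            continue  # Delta K is undefined at the smallest and largest K tested
        sd = stdev(lnpd[k])
        if sd == 0:
            continue  # identical replicates: Delta K undefined
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        delta[k] = abs(second_diff) / sd
    return delta


def best_k(lnpd: Dict[int, List[float]]) -> int:
    """K with the largest Delta K."""
    delta = evanno_delta_k(lnpd)
    return max(delta, key=delta.get)


if __name__ == "__main__":
    # Invented log-likelihoods: three replicate runs for K = 1..5.
    runs = {
        1: [-1000.0, -1002.0, -1001.0],
        2: [-900.0, -902.0, -898.0],
        3: [-880.0, -881.0, -879.0],
        4: [-875.0, -877.0, -873.0],
        5: [-874.0, -876.0, -872.0],
    }
    print(evanno_delta_k(runs))   # {2: 40.5, 3: 15.0, 4: 2.0}
    print(best_k(runs))           # 2
```

The sharp peak at K = 2 reflects the large jump in likelihood from K = 1 to K = 2 followed by a plateau. This is the pattern ΔK is designed to detect, even though the raw ln P(D) keeps creeping upward at higher K.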
For very large resequencing-based SNP resources, further analyses included genome-wide linkage disequilibrium (LD) estimation, identification of conserved and highly variable genomic intervals, and functional annotation of large-effect SNPs and InDels using Gene Ontology (GO) and pathway databases. These analyses linked diversity patterns to candidate genes underlying key agronomic traits (Valliyodan et al., 2021).
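For the genome-wide LD estimation mentioned above, one common approach with unphased SNP genotypes is to use the squared Pearson correlation between allele-dosage vectors as r² and then average r² within physical-distance bins to describe LD decay. The Python sketch below makes the following assumptions: this composite, genotype-based r² rather than the haplotype-based r² computed from phased data; 0/1/2 dosages with None for missing calls; SNPs from a single chromosome supplied as (position, dosages) pairs; and hypothetical function names and bin sizes. It illustrates the calculation only and is not the pipeline used in the cited study. For whole-genome resequencing panels, dedicated tools such as PLINK are far more practical than this all-pairs loop.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Dosages = List[Optional[int]]  # 0/1/2 alternate-allele counts, None = missing


def dosage_r2(a: Dosages, b: Dosages) -> Optional[float]:
    """Squared Pearson correlation of dosages over individuals called at both SNPs."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return None  # too few shared calls to estimate r^2
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # monomorphic among shared calls
    return sxy * sxy / (sxx * syy)


def ld_decay(snps: List[Tuple[int, Dosages]], bin_size: int,
             max_dist: int) -> Dict[int, float]:
    """Mean r^2 per distance bin for SNP pairs on one chromosome.

    snps: (position_bp, dosages) pairs. Returns {bin_start_bp: mean r^2}.
    """
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for (pos1, g1), (pos2, g2) in combinations(snps, 2):
        dist = abs(pos2 - pos1)
        if dist > max_dist:
            continue  # ignore pairs beyond the analysis window
        r2 = dosage_r2(g1, g2)
        if r2 is None:
            continue
        start = (dist // bin_size) * bin_size
        sums[start] = sums.get(start, 0.0) + r2
        counts[start] = counts.get(start, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


if __name__ == "__main__":
    # Invented example: two identical nearby SNPs and one distant SNP.
    chrom = [
        (100, [0, 1, 2, 0, 1, 2]),
        (600, [0, 1, 2, 0, 1, 2]),
        (100_000, [2, 2, 0, 0, 1, 1]),
    ]
    print(ld_decay(chrom, bin_size=1000, max_dist=10_000))  # {0: 1.0}
```

Plotting the binned means against bin start gives the familiar LD-decay curve. The distance at which mean r² falls to half its maximum, or below a fixed threshold, is often reported as the LD decay distance.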