Legume Genomics and Genetics 2025, Vol.16, No.1, 1-10
http://cropscipublisher.com/index.php/lgg

GWAS is now widely used in breeding research and is particularly suited to complex traits controlled by many genes. In soybean, GWAS has identified many key loci associated with important traits such as flowering time, maturity period, plant height, and yield performance across environments (Zhang et al., 2015; Li et al., 2023). The positions of these SNPs are well defined, so breeders can use them as markers to select plants with desirable traits more quickly, thereby accelerating the breeding process (Bhat et al., 2022). GWAS also helps clarify how these traits are inherited, and its results provide useful data support for future soybean breeding (Kim et al., 2023b).

This study summarizes current research achievements in soybean GWAS and looks ahead to future directions. We focus on loci discovered so far that are associated with important traits, discuss how GWAS can be combined with traditional breeding methods, and explain the practical role of these findings in variety improvement. Through this summary, we hope to highlight the importance of genomic research in crop breeding and to provide reference directions for future research and breeding work.

2 Principles of GWAS

2.1 GWAS methodology

Genome-wide association studies (GWAS) are a widely used genetic analysis method for identifying associations between genetic variants and traits. The approach scans the entire genome for SNP (single nucleotide polymorphism) variants.
Individuals with a given trait are then compared with those without it to see whether any SNPs are notably more common in one group. If a SNP appears more frequently in individuals with the trait, that SNP may be associated with the trait. To make the results more reliable, researchers later introduced the mixed model. This approach reduces false positives, that is, spurious associations, making the analysis more reliable (Cortes et al., 2021). GWAS is now used not only for common traits such as plant height and yield but also for finer molecular traits such as metabolites and enzyme activity. These applications help identify key genes more quickly, which supports breeding and can accelerate the development of superior varieties.

2.2 Genotyping and phenotyping for GWAS

A sound GWAS analysis requires accurate genotypic and phenotypic data. Genotyping identifies the variants present at different positions in the genome. Two methods are commonly used: SNP chips and whole-genome resequencing (Korte and Farlow, 2013). Phenotyping methods are also improving. In the past, phenotyping relied mainly on manual observation and scoring. Many researchers now use image recognition and deep learning (DL) to extract trait information automatically. This approach saves effort and time and gives more consistent results (Rairdin et al., 2022). For instance, some studies have used deep learning to assess the severity of soybean diseases and identified SNP loci that may be related to disease resistance.

2.3 Statistical approaches in GWAS

In GWAS, the choice of statistical method is crucial because it determines whether truly useful loci can be identified. The most widely used model today is the mixed linear model (MLM).
It accounts for both population structure and kinship among individuals, effectively reducing false positives. However, in crops with a relatively narrow genetic background, such as soybean, conventional methods are sometimes not powerful enough (Yoosefzadeh-Najafabadi et al., 2021). To address this issue, some studies have introduced machine learning methods such as support vector regression (SVR) and random forest (RF). These methods have been reported to identify QTLs (quantitative trait loci) with higher accuracy (Yoosefzadeh-Najafabadi et al., 2023). In addition, new methods continue to emerge. For example,