Legume Genomics and Genetics 2025, Vol.16, No.2, 91-99
http://cropscipublisher.com/index.php/lgg

5.3 Case studies on climate adaptability and stable high-protein trait prediction
A gain of only a few percentage points still matters: a 7% increase in prediction accuracy can change the direction of selection in a breeding season. Some studies that combined environmental data with genomic and phenotypic data did reach this level (Fernandes et al., 2024). Compared with models that ignore environmental variation entirely, reaction norm and factor-analytic methods perform more stably and can identify candidate genotypes for stable production and high protein under variable and even extreme conditions (Burgueno et al., 2011; Li et al., 2024). For breeders, the practical value of these methods lies less in the label "accurate prediction" than in the ability to select, with greater confidence, materials that combine adaptability and performance.

6 Case Studies: Application of ML-Based Prediction in Breeding Populations
6.1 Genomic prediction in the U.S. soybean core collection
Machine learning was not initially expected to perform so consistently on such complex materials, yet it has performed well in large-scale breeding datasets such as the U.S. soybean core germplasm collection. Common models such as RF (Random Forest), SVM (Support Vector Machine), and MLP (Multi-Layer Perceptron) have been applied frequently to high-throughput phenotypic and genotypic data to predict agronomic traits such as yield. In particular, some studies have run RF on hyperspectral reflectance data from different environments and reported a classification accuracy of 84%. When an additional layer that combines the models is added, accuracy rises to 0.93 (93%) (Figure 2) (Yoosefzadeh-Najafabadi et al., 2021).
The main value of this approach lies not in how "smart" the algorithm is, but in enabling breeders to select promising materials from a large germplasm collection earlier and more accurately, which speeds up the entire breeding process.

Figure 2 A schematic representation of the machine learning algorithms used in this study to classify the soybean yield using reflectance bands: (A) Multilayer perceptron, (B) Support vector machine, and (C) Random forest (Adopted from Yoosefzadeh-Najafabadi et al., 2021)
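To make the idea of a model-combination layer concrete, the sketch below combines the class predictions of three classifiers by hard majority voting and compares the resulting accuracy with that of each single model. This is a deliberately simpler stand-in for the ensemble approach of the cited study, not a reproduction of it. The labels and predictions are invented toy values (1 = high-yield class, 0 = low-yield class), and the names `accuracy` and `majority_vote` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_vote(*model_preds):
    """Combine per-model class predictions by hard majority voting.

    Each argument is one model's list of predicted classes, in the same
    sample order. With an odd number of models and two classes, ties
    cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_preds)]


# Toy data, invented for illustration only (not results from any study).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_mlp = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
pred_svm = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
pred_rf = [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]

pred_vote = majority_vote(pred_mlp, pred_svm, pred_rf)

for name, pred in [("MLP", pred_mlp), ("SVM", pred_svm),
                   ("RF", pred_rf), ("Vote", pred_vote)]:
    print(f"{name}: accuracy = {accuracy(y_true, pred):.2f}")
```

In this toy case the combined prediction is more accurate than any single model because the three models mostly err on different samples; when models share the same errors, voting brings little or no gain. A stacking ensemble goes one step further by training a meta-model on the base models' outputs rather than counting votes.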