
Maize Genomics and Genetics 2025, Vol.16, No.5, 239-250 http://cropscipublisher.com/index.php/mgg 242

problems. In recent years, however, attention has shifted towards tree-based ensemble methods such as Random Forest (RF) and XGBoost, as well as classic nonlinear models such as the Support Vector Machine (SVM). Deep learning has also gained ground: architectures such as CNN and LSTM have gradually appeared in breeding research, especially in studies involving time series or image processing (Kang et al., 2020; Cheng et al., 2022). However, the novelty of an algorithm says little about how well it works in practice. Some studies have found that XGBoost and RF are more reliable than certain deep learning models in terms of prediction accuracy and stability (Tahi et al., 2024). No single algorithm is therefore a one-size-fits-all solution; the model must be chosen to suit the data and the problem at hand.

3.2 Advantages of ML in breeding contexts

Traditional methods are still in use, but in some situations they fall short. Consider the factors that affect yield: genotype, climate, management practices and more. These factors are intertwined, and their relationships are complex and non-linear, which conventional statistical methods struggle to disentangle. Machine learning does not require the underlying relationships to be specified in advance; given sufficient data, it can learn them from the data. A second practical problem is data diversity. Genomic information, remote sensing images, weather records and soil parameters come from different sources and have different structures, yet ML models can take all of them as inputs. This ability is one of their greatest advantages over traditional regression (Guo et al., 2023; Miao et al., 2024).
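The point about intertwined, non-linear factors can be illustrated with a toy sketch. The yields, genotype names and environment labels below are invented for illustration and do not come from any cited study. The example compares a purely additive model (genotype effect plus environment effect) with a cell-means predictor, which stands in for what a regression tree splitting on both factors would predict; it is not an implementation of RF or XGBoost.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical plot yields (t/ha) for two genotypes in two environments.
# The values are invented to create a crossover interaction: each genotype
# wins in one environment and loses in the other.
plots = [
    ("G1", "dry", 7.0), ("G1", "wet", 3.0),
    ("G2", "dry", 3.0), ("G2", "wet", 7.0),
] * 5  # five replicate plots per genotype-environment cell


def group_means(rows, key):
    """Mean yield for each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in groups.items()}


def r_squared(observed, predicted):
    """Share of yield variance explained by the predictions."""
    centre = mean(observed)
    ss_total = sum((y - centre) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


observed = [y for _, _, y in plots]
grand = mean(observed)
by_genotype = group_means(plots, lambda r: r[0])
by_environment = group_means(plots, lambda r: r[1])
by_cell = group_means(plots, lambda r: (r[0], r[1]))

# Additive model: genotype effect + environment effect, no interaction.
# For a balanced design like this one, it is the least-squares additive fit.
additive_pred = [by_genotype[g] + by_environment[e] - grand for g, e, _ in plots]

# Partitioning model: one mean per genotype-environment cell, which is what
# a regression tree that splits on both factors would predict.
cell_pred = [by_cell[(g, e)] for g, e, _ in plots]

r2_additive = r_squared(observed, additive_pred)
r2_cells = r_squared(observed, cell_pred)
print(f"additive R^2 = {r2_additive:.2f}, cell-means R^2 = {r2_cells:.2f}")
```

In this constructed case both marginal means equal the grand mean, so the additive fit explains none of the yield variance while the cell means explain all of it. Real data are noisier and rarely this extreme, and a linear model with an explicit interaction term would also recover the pattern; the sketch only shows why models that can partition the data need not be told about the interaction in advance.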
For breeders and farmers, earlier information is always valuable. ML models can provide predictions early in the crop growth cycle and can also use feature analysis to identify which factors have the greatest impact on yield (Wu et al., 2024). This helps not only in selecting materials but also in subsequent planting management and resource allocation.

3.3 Challenges in ML application to genomics

Although machine learning appears promising for genomics, it faces many problems in practice. First, models need a large amount of high-quality, clearly labelled data to learn from. Second, genomic data are very high-dimensional, so models overfit easily. The interpretability of deep models is another difficulty: a trained model may be highly accurate, yet it is hard to explain why it makes a particular prediction (Shahhosseini et al., 2019; Abbasi et al., 2025). In multi-environment, multi-group settings in particular, a model that performs well in one place will not necessarily transfer smoothly to another. Furthermore, genomic, phenotypic, and environmental data have inherently inconsistent formats. Integrating them into a single model is demanding not only in computing power but also technically (Van Klompenburg et al., 2020). Unless these obstacles are overcome, ML will struggle to realize its full potential in corn breeding.

4 Integrating Genomic Selection and Machine Learning

4.1 Rationale for integration

Combining GS and ML is driven not by a pursuit of "theoretical integration" but by pragmatism. Breeding involves many complex variables, and relying on a single set of methods often means attending to one aspect at the expense of another.
GS offers established methods for processing genomic data and can provide a genomic breeding value for each material. ML is more flexible and is adept at handling nonlinear, multi-dimensional data structures, so it can also capture environmental factors and phenotypic information. The greatest benefit of combining the two, however, is improved prediction accuracy in variable environments, especially for traits such as drought resistance that are influenced by both genotype and environment (Varshney, 2021). As climate change grows increasingly unpredictable, stabilizing the yield of crops such as corn has become even more urgent.
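One way to picture this division of labour is a two-stage data flow, sketched below under stated assumptions. Stage 1 uses ridge regression on marker genotypes, the shrinkage idea behind RR-BLUP-style GS models, to obtain a genomic estimated breeding value (GEBV) per line. Stage 2 only assembles feature rows that pair each GEBV with an environmental covariate, as input for a nonlinear learner. The marker codes, yields, rainfall values, site names and the shrinkage value `LAMBDA` are all hypothetical, and this is not the pipeline of any study cited here.

```python
def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_marker_effects(marker_rows, phenotypes, lam):
    """Ridge estimates of marker effects: (X'X + lam*I)^-1 X'y, with y centred."""
    p = len(marker_rows[0])
    y_mean = sum(phenotypes) / len(phenotypes)
    y = [v - y_mean for v in phenotypes]
    xtx = [[sum(row[i] * row[j] for row in marker_rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(marker_rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: six lines scored at three markers (-1/0/1 coding;
# each marker column sums to zero) with invented mean yields in t/ha.
markers = {
    "L1": [1, 0, -1], "L2": [-1, 1, 0], "L3": [0, -1, 1],
    "L4": [1, 1, -1], "L5": [-1, 0, 1], "L6": [0, -1, 0],
}
mean_yield = {"L1": 8.2, "L2": 6.9, "L3": 7.1, "L4": 8.8, "L5": 6.4, "L6": 7.0}
# Hypothetical environmental covariate: seasonal rainfall (mm) per trial site.
rainfall_mm = {"site_dry": 310.0, "site_wet": 640.0}

lines = sorted(markers)
x = [markers[name] for name in lines]
y = [mean_yield[name] for name in lines]

# Stage 1 (GS-style): shrunken marker effects, then one breeding value per line.
LAMBDA = 1.0  # assumed shrinkage value, not estimated from variance components
effects = ridge_marker_effects(x, y, LAMBDA)
gebv = {name: sum(g * b for g, b in zip(markers[name], effects)) for name in lines}

# Stage 2 input (ML-style): one feature row per line-by-environment combination.
feature_rows = [
    {"line": name, "site": site, "gebv": gebv[name], "rainfall_mm": rain}
    for name in lines
    for site, rain in rainfall_mm.items()
]

for row in feature_rows[:2]:
    print(row)
```

The rows in `feature_rows` could then be passed, together with observed plot-level phenotypes, to a nonlinear learner such as RF or XGBoost; that step is omitted here. Feeding GEBVs rather than raw markers into the second stage keeps the feature count small, which is one possible design choice among several.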
