Maize Genomics and Genetics 2025, Vol.16, No.5, 239-250
http://cropscipublisher.com/index.php/mgg

Feature Review Open Access

Integrating Genomic Selection and Machine Learning for Predicting Maize Yield Under Drought

Weichang Wu
Biotechnology Research Center, Cuixi Academy of Biotechnology, Zhuji, 311800, China
Corresponding author: weichang.wu@cuixi.org
doi: 10.5376/mgg.2025.16.0021
Received: 05 Jul., 2025
Accepted: 22 Aug., 2025
Published: 07 Sep., 2025
Copyright © 2025 Wu. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Wu W.C., 2025, Integrating genomic selection and machine learning for predicting maize yield under drought, Maize Genomics and Genetics, 16(5): 239-250 (doi: 10.5376/mgg.2025.16.0021)

Abstract Drought stress severely constrains maize yields, posing a significant challenge to global food security. This study explores the integration of genomic selection (GS) and machine learning (ML) methods to improve the accuracy of maize yield prediction under drought conditions. First, we outline the principles of GS, highlighting its advantages over traditional breeding methods and its growing application in drought-tolerant breeding. Next, we explore the application of various ML algorithms (such as random forests, support vector machines, and deep learning) for crop yield prediction, along with their strengths and limitations in the context of genomics. We then propose strategies for integrating GS with ML, including hybrid modeling frameworks and context-specific optimization, and discuss recent trends and research advances.
Particular emphasis is placed on drought-specific modeling approaches that incorporate stress-responsive traits and evaluate their predictive accuracy under water-deficit environments. A case study from sub-Saharan Africa illustrates the practical application of an integrated GS-ML prediction system and its implications for climate-resilient maize breeding. Despite this promising outlook, challenges remain, including data heterogeneity, model interpretability, and implementation barriers. This study summarizes the future prospects for advancing the integration of genomic selection and machine learning (GS-ML) through technological innovation and its potential to support global climate-smart maize breeding.

Keywords Genomic selection; Machine learning; Drought tolerance; Maize yield prediction; Climate-smart breeding

1 Introduction

Few would dispute that maize plays a significant role in the global food supply. Its problems are not small either, especially under drought. A drought that strikes during a critical growth stage, such as tasseling or grain filling, can cause a direct drop in yield. Moreover, the weather has become increasingly erratic in recent years (Zhang and Xu, 2024). Droughts not only occur more frequently but also cause much greater damage. Traditional drought-resistance breeding has long struggled. The technology is not lacking; rather, drought resistance is itself a difficult trait, involving many genes and subject to strong environmental effects (Zhang et al., 2022). As a result, the same breeding strategy may perform completely differently from one location to another. With low efficiency and a slow pace, conventional breeding will eventually be unable to keep up with the speed of climate change. A different route is needed.
To keep maize yields from being dictated by the weather, breeders need a faster and more accurate screening mechanism that identifies in advance the genotypes that can truly withstand stress. This is where genomic selection (GS) becomes a powerful tool. Rather than focusing on a few key genes, it considers the entire genome at once, using molecular markers to predict the potential of each variety. GS is well suited to a complex trait such as drought resistance because it can capture genetic effects of all sizes, large and small, without separating major from minor ones (Yuan et al., 2019). It is not a cure-all, however. When the volume of data and the number of variables grow large, and especially when genotypes interact with the environment, the model is easily overwhelmed (Wang et al., 2025). In such cases GS alone falls short, and this is where machine learning (ML) comes in. Methods such as random forests, neural networks, and support vector machines are particularly adept at handling complex data and nonlinear relationships. Combined with GS, they raise prediction accuracy further (Saimon et al., 2023; Azrai et