
Maize Genomics and Genetics 2025, Vol.16, No.3, 139-148 http://cropscipublisher.com/index.php/mgg

markers (such as SNPs), phenotypic measurements, and processed environmental variables (such as climate and soil data) are added to the model, predictions become more accurate. For example, supplementing genetic data with trait-associated markers and environmental information tied to specific developmental stages increased prediction accuracy by 14% to 28% (He et al., 2025). Before environmental data enter the model, the raw measurements must be processed into informative features; this helps the machine learning model capture what the data actually mean (Fernandes et al., 2024).

4.2 Architecture of GS+ML hybrid predictive models

The GS+ML model combines traditional genomic selection methods (such as GBLUP and BayesB) with modern machine learning methods (such as random forest, neural networks, and XGBoost). This combination is better suited to the complex relationships between genes and environment. Structurally, genetic and environmental effects can be combined either additively (G+E) or multiplicatively, through a genotype-by-environment interaction term (GEI). Additive models are fast to compute and easy to use, whereas machine learning methods such as tree-based models can discover gene-environment relationships automatically, without the interactions being specified in advance (Fernandes et al., 2024). Automated machine learning (AutoML) platforms can now also integrate these models, which saves labor and allows multiple schemes to be tested quickly (Saleh et al., 2023).

4.3 Optimization of model pipelines for drought scenarios

Several optimization methods can improve the prediction of corn yield under drought. Multi-environment modeling trains the model on pooled data from different regions or different years.
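The pooled multi-environment setup, together with the additive (G+E) and multiplicative (GEI) structures described in Section 4.2, can be sketched as kernels over stacked trial records. This is a minimal pure-Python sketch under stated assumptions: the genotype and environment names, marker codes, and covariate values are hypothetical toy data; the simple linear kernel (XX'/p) is one common choice, not the authors' method; and the GEI kernel is formed as the element-wise (Hadamard) product of G and E, as in reaction-norm style models.

```python
# Sketch: additive (G+E) and multiplicative (GEI) kernels for pooled
# multi-environment records. All names and values are hypothetical toy data.
from statistics import mean, pstdev

# SNP codes (0/1/2 copies of the minor allele) per genotype.
markers = {
    "hybrid_A": [0, 1, 2, 1],
    "hybrid_B": [2, 1, 0, 1],
    "hybrid_C": [1, 1, 1, 2],
}

# Raw environmental covariates per trial: rainfall (mm) and mean maximum
# temperature (deg C) around flowering. Illustrative values only.
env_raw = {
    "site1_2023": [120.0, 31.0],
    "site2_2023": [45.0, 35.5],
    "site1_2024": [80.0, 33.0],
}

# Pooled multi-environment records: one (genotype, environment) pair per plot.
records = [
    ("hybrid_A", "site1_2023"), ("hybrid_B", "site1_2023"),
    ("hybrid_A", "site2_2023"), ("hybrid_C", "site2_2023"),
    ("hybrid_B", "site1_2024"), ("hybrid_C", "site1_2024"),
]


def center_columns(rows):
    """Subtract each column's mean (simple marker centering)."""
    means = [mean(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def standardize_columns(rows):
    """Z-score each column so covariates on different scales are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def linear_kernel(rows):
    """K[i][j] = dot(x_i, x_j) / number_of_features."""
    p = len(rows[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / p for rj in rows] for ri in rows]


# Feature processing: center markers and standardize covariates, then expand
# to one row per record so every kernel is indexed by record.
geno_names = sorted(markers)
z = dict(zip(geno_names, center_columns([markers[g] for g in geno_names])))
env_names = sorted(env_raw)
w = dict(zip(env_names, standardize_columns([env_raw[e] for e in env_names])))

G = linear_kernel([z[g] for g, _ in records])  # genomic similarity
E = linear_kernel([w[e] for _, e in records])  # environmental similarity

n = len(records)
K_additive = [[G[i][j] + E[i][j] for j in range(n)] for i in range(n)]  # G+E
K_gei = [[G[i][j] * E[i][j] for j in range(n)] for i in range(n)]       # Hadamard G*E
```

In a full pipeline these kernels would be passed to a GBLUP-style mixed-model or kernel-regression solver, which is not shown here. The Hadamard product is what makes the GEI term multiplicative: the covariance between two records depends jointly on their genetic and environmental similarity rather than on each one separately.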
Pooling trials across environments lets data from other experiments fill in missing parts, which helps improve prediction (Bhandari et al., 2018; Dias et al., 2018). Feature selection should keep the genetic markers and environmental variables most closely associated with drought tolerance. This reduces interference from uninformative variables and lets the model focus on the signals that matter (He et al., 2025). After training, the model should go through several rounds of validation and hyperparameter tuning. This prevents the model from "memorizing" the training data (overfitting) and then failing under different conditions. Models optimized in this way usually maintain relatively good performance across different drought environments (Saleh et al., 2023; Fernandes et al., 2024). Hybrid models and dimensionality reduction techniques can also reduce the computational load: with large datasets, hybrid models combine the strengths of multiple algorithms, while dimensionality reduction cuts the number of variables so the model runs faster (Jighly et al., 2021). With these optimizations, the model becomes both more accurate and better able to cope with varied drought scenarios, which helps breeders identify truly drought-tolerant corn varieties more quickly.

5 Model Evaluation and Cross-Environment Transferability

5.1 Cross-validation and external dataset testing

Cross-validation is a common way to check whether a model is useful; k-fold cross-validation and leave-one-out cross-validation (LOOCV) are two widely used forms. Both repeatedly split the data into training and test sets, training on one part and validating on the held-out part in turn. This shows the model's predictive ability and helps reveal whether it has merely "memorized" the training data (Yates et al., 2022; Qiu, 2024).
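The splitting logic behind k-fold CV and LOOCV is compact enough to sketch directly. This is a minimal pure-Python illustration under stated assumptions: the data are simulated, and a one-feature least-squares fit is a hypothetical stand-in for a GS or ML model. Setting k equal to the number of observations turns k-fold CV into LOOCV, and predictive ability is reported as the correlation between out-of-fold predictions and observations, a common choice in genomic prediction.

```python
# Sketch: k-fold and leave-one-out cross-validation with a toy one-feature
# least-squares model (a hypothetical stand-in for a GS or ML predictor).
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_ols(xs, ys):
    """Simple least squares: return (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cross_validate(xs, ys, k, seed=0):
    """Collect out-of-fold predictions; k == len(xs) gives LOOCV."""
    preds = [0.0] * len(xs)
    for test in kfold_indices(len(xs), k, seed):
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        slope, intercept = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept  # predict only unseen samples
    # Predictive ability: correlation of out-of-fold predictions with observations.
    return pearson(preds, ys)


# Usage with simulated data.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(40)]        # e.g. a marker-based score
y = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]  # simulated yield
print(round(cross_validate(x, y, k=5), 3))       # 5-fold CV
print(round(cross_validate(x, y, k=len(x)), 3))  # LOOCV
```

Because LOOCV leaves a single observation in each fold, the correlation is computed once over all out-of-fold predictions rather than per fold.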
However, internal validation alone is not enough. A model may simply fit the original data well and then fail in a different environment. Many researchers therefore place more weight on external validation, that is, testing the model on data from other locations or collected under different conditions. This shows whether the model generalizes and can expose problems that internal testing misses, such as overfitting or dependence on a specific data distribution (Ho et al., 2020; Cabitza et al., 2021; Eertink et al., 2022; Riley et al., 2024).

5.2 Evaluation across multiple drought scenarios

The most direct way to know whether a model is accurate under drought is to test it in different drought environments. Droughts differ in duration (long or short), in severity (severe in some places, mild in others), and in the surrounding climate, and all of these affect model performance. The model can be tested in several ways.
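External validation across drought scenarios can be sketched as leave-one-environment-out (LOEO) validation: each environment is held out in turn, the model is trained on the remaining ones, and accuracy is measured only on the unseen environment. This pure-Python sketch rests on assumptions: the four environments, their drought labels, the yield penalties, and all data are simulated, and a one-feature least-squares fit stands in for a real GS or ML model. It reports both the correlation (ranking ability) and the root-mean-square error (accuracy of the yield level) for each held-out environment.

```python
# Sketch: leave-one-environment-out (LOEO) validation across drought scenarios.
# Environment names, scenario labels, penalties, and data are all hypothetical.
import random
from statistics import mean

rng = random.Random(7)
scenarios = {
    "env_ww": "well-watered",
    "env_flower": "drought at flowering",
    "env_term": "terminal drought",
    "env_severe": "severe drought",
}
# Hypothetical mean yield loss (t/ha) in each environment.
penalty = {"env_ww": 0.0, "env_flower": 2.0, "env_term": 1.5, "env_severe": 3.5}

# Records: (environment, predictor score, observed yield in t/ha).
records = []
for env in scenarios:
    for _ in range(15):
        score = rng.gauss(0, 1)  # stand-in for a genomic drought-tolerance score
        records.append((env, score, 9.0 - penalty[env] + 0.8 * score + rng.gauss(0, 0.3)))


def fit_ols(xs, ys):
    """Simple least squares: return (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def rmse(pred, obs):
    return mean([(p - o) ** 2 for p, o in zip(pred, obs)]) ** 0.5


def leave_one_environment_out(records):
    """Hold out each environment in turn, train on the rest, score the held-out one."""
    results = {}
    for held_out in sorted({env for env, _, _ in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        slope, intercept = fit_ols([r[1] for r in train], [r[2] for r in train])
        preds = [slope * r[1] + intercept for r in test]
        obs = [r[2] for r in test]
        results[held_out] = (pearson(preds, obs), rmse(preds, obs))
    return results


for env, (r, err) in leave_one_environment_out(records).items():
    print(f"{env:<11} {scenarios[env]:<21} r = {r:.2f}  RMSE = {err:.2f} t/ha")
```

In this toy simulation, the held-out well-watered and severe-drought environments tend to show a much larger RMSE than the intermediate ones while their correlations stay high, because their mean yields lie outside the range seen in training: the model still ranks genotypes well but misjudges the yield level. Reporting both metrics per environment makes that distinction visible.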