
Maize Genomics and Genetics 2025, Vol.16, No.5, 239-250
http://cropscipublisher.com/index.php/mgg

of breeding, they have also screened out multiple promising drought-tolerance haplotypes and candidate genes through this system, which is directly relevant for the subsequent selection of lines with more stable yields. What is truly striking, though, is the geospatial analysis they conducted: in drought-prone areas, these drought-tolerant materials yielded 5% to 40% more than commercial varieties. That range sounds wide, but in practice even a 10% gap matters a great deal for farmers' incomes (Tesfaye et al., 2016). Of course, the issues of scalability and infrastructure are not yet fully resolved; such problems are hardly rare in Africa. Still, this case at least shows that combining genomic data, remote-sensing information and machine learning is a promising path, especially for regions like SSA that are strongly affected by climate change, where it may indeed bring real change.

7 Challenges and Limitations
7.1 Data-related constraints
Before the model is even built, many projects stall at the data step. Methods such as genomic selection and machine learning depend heavily on large volumes of high-quality data when predicting maize yields under drought. The reality is different: although high-throughput genotyping and phenotyping generate enormous amounts of data, those data are often incomplete, noisy, or unevenly distributed across variables, and drought-related data in particular are rarely well standardized (Tong and Nikoloski, 2020). Data from different environments are often hard to concatenate, lack longitudinal information, and carry incomplete annotations. This makes model training very difficult, let alone generalization.
Moreover, with large volumes of omics data the dimensionality explodes; if feature selection or dimensionality reduction is done poorly, overfitting becomes very likely. The model may look highly accurate on the surface while being unstable in practice.

7.2 Model interpretability and biological relevance
Some models do predict well, but if you ask them "why did you predict this?", they cannot give a clear answer. This "black box" problem is especially common in complex models such as deep learning and ensemble methods (Maløy et al., 2021). And however accurate a model is, breeders care more about which genes and which environmental variables are truly effective; until that question is answered, practical breeding decisions will remain hesitant. Methods such as attention mechanisms and feature-importance analysis are attempting to make models "speak a human language", but they fall far short of turning these results into truly actionable, verifiable biological knowledge (Shook et al., 2020). A core question therefore remains open: can the model capture the genuinely biologically meaningful genotype-by-environment interactions? If not, even an extremely accurate prediction will be very hard to put into real use.

7.3 Practical implementation barriers
No matter how beautiful a model is, if it is never used it is only for show. The integration of GS and ML does have potential, but in practice problems keep surfacing one after another. First, the resource requirements are substantial: however powerful a model is, someone has to run it. There must be sufficient computing power, people to manage the data, and a team that can understand the model and tune its parameters.
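The overfitting risk and the feature-importance idea raised above can be made concrete with a small sketch. This is purely illustrative, with synthetic markers and a closed-form ridge regression (the sample sizes, penalty, and effect sizes are all assumptions): with far more markers than lines, the training fit looks excellent while held-out accuracy collapses, and permutation importance on held-out data helps separate the few truly causal markers from the noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# p >> n: 80 lines, 1000 markers, only the first 5 truly affect the trait
n, p = 80, 1000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

def ridge_fit(X, y, lam=10.0):
    # Closed-form ridge regression: (X'X + lam*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def r2(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Train/held-out split exposes the gap between apparent and real accuracy
tr, te = slice(0, 60), slice(60, None)
w = ridge_fit(X[tr], y[tr])
r2_train = r2(y[tr], X[tr] @ w)   # near-perfect: the model memorizes
r2_test = r2(y[te], X[te] @ w)    # far lower: little real signal learned

# Permutation importance: drop in held-out R^2 when one marker is shuffled
base = r2(y[te], X[te] @ w)
imp = np.empty(p)
for j in range(p):
    Xp = X[te].copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    imp[j] = base - r2(y[te], Xp @ w)
```

The same logic carries over to real pipelines: a large train-test gap flags overfitting, and importance scores computed on held-out data are one of the simpler ways to hand breeders a candidate list that can actually be checked biologically.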
Many regions, especially those with weak breeding infrastructure, simply lack these conditions, and there the approach is effectively unusable. Another problem that is not easy to solve is poor transferability. A model that works well at location A may not fit the environment at location B; when the variety or the environment changes, performance degrades and retraining and revalidation are required (McBreen et al., 2025). Hoping that a single universal model will cover every breeding scenario is therefore unrealistic. And one factor is often overlooked: people. Policy support and digital infrastructure matter, but more important still is whether breeders and data scientists have been properly trained. If the people cannot keep up, the whole system will be hard to put into practice. Ultimately, the GS-ML integration approach is not without prospects; the road ahead has simply not yet been paved. No matter how advanced the technology, if no one uses it, or uses it badly, it will remain beautiful only on paper.
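The transferability problem described above is exactly what leave-one-environment-out validation probes: train on all sites but one, then test on the held-out site. A minimal sketch with synthetic data follows; the site names, sample sizes, and the drift in marker effects across sites (a crude stand-in for genotype-by-environment interaction) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic multi-environment trial: same markers, but marker effects
# drift across sites, so a model fitted elsewhere transfers poorly.
n_per_env, p = 60, 50
envs = ["site_A", "site_B", "site_C"]
X_parts, y_parts, labels = [], [], []
for k, e in enumerate(envs):
    Xe = rng.standard_normal((n_per_env, p))
    beta = np.zeros(p)
    beta[:5] = 1.0 + 0.8 * k          # effect sizes differ by site
    X_parts.append(Xe)
    y_parts.append(Xe @ beta + rng.standard_normal(n_per_env))
    labels += [e] * n_per_env
X, y = np.vstack(X_parts), np.concatenate(y_parts)
labels = np.array(labels)

def r2(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Leave-one-environment-out: train on two sites, predict the third
loeo = {}
for e in envs:
    tr, te = labels != e, labels == e
    w = np.linalg.lstsq(X[tr], y[tr], rcond=None)[0]   # ordinary least squares
    loeo[e] = r2(y[te], X[te] @ w)
```

Held-out-site accuracy is what a breeding program deploying the model at a new location would actually experience, and it is typically well below the within-site figures reported in papers; routinely reporting it would make transferability claims much easier to judge.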
