Bioscience Evidence 2025, Vol.15, No.5, 249-259
http://bioscipublisher.com/index.php/be

5.3 Outcomes
Combining multi-omics data with machine learning raised the accuracy of corn yield prediction from 0.32 to 0.43, significantly better than models built on a single data type (Wu et al., 2024). Genomic selection enabled early screening of new strains in tropical maize, with the highest prediction accuracy reaching 0.71, which saved considerable experimental effort and resources (Beyene et al., 2021). Federated learning allowed different institutions to collaborate and improve model performance and prediction accuracy without sharing raw data (Zhang et al., 2023). Deep learning combined with high-throughput phenotyping also improved early prediction of complex traits such as yield and flowering time, accelerating breeding progress (Sarzaeim et al., 2022; Shu et al., 2022).

5.4 Lessons learned
Integrating multi-omics and multi-source data can improve the prediction of complex traits, but it also places higher demands on data quality, feature selection and model selection (Beyene et al., 2021; Sarzaeim et al., 2022; Wu et al., 2024). Cross-institutional and cross-regional data collaboration, such as federated learning, can improve model generalization while protecting privacy, and is well suited to large-scale breeding networks (Fritsche-Neto et al., 2021; Zhang et al., 2023). New technologies such as high-throughput phenotyping and deep learning need to be combined with traditional breeding experience to truly move from "data-driven" to "decision support" (Beyene et al., 2021; Sarzaeim et al., 2022; Shu et al., 2022). Continuous improvement of data collection, cleaning and feature engineering is the foundation of successful big data breeding projects.

6 Challenges and Limitations
6.1 Technical barriers
Big data analysis in corn breeding faces many technical challenges.
First, the data volume is extremely large and the sources are heterogeneous, spanning genomic, phenotypic, environmental and management information, which puts heavy pressure on data storage, transmission and processing (Kamilaris et al., 2017; Nepolean et al., 2018; Cravero et al., 2022; Xu et al., 2022). Data cleaning, integration and standardization are also difficult, especially for unstructured data such as images, text and sensor readings, from which useful information is hard to extract automatically (Onsongo et al., 2022). In addition, machine learning and deep learning models depend heavily on high-quality, well-labeled data, whereas agricultural data often contain missing values, noise and inconsistent labels, which reduce the prediction accuracy and generalization ability of the model (Govaichelvan et al., 2023; Crossa et al., 2024; Kudiyarasudevi and Suresh, 2024; Wu et al., 2024) (Figure 2). Meanwhile, shortages of high-performance computing resources and of trained data specialists also limit the application of big data in breeding (Lassoued et al., 2021).

6.2 Biological complexity
Corn traits are shaped by many factors, including genotype, environment, and genotype-by-environment interaction (G×E), which makes breeding problems highly complex biologically (Nepolean et al., 2018; Xu et al., 2022; Crossa et al., 2024). Integrating multi-dimensional omics data can improve prediction, but the resulting models become more complex and harder to interpret (Wu et al., 2024). Complex traits such as yield and stress resistance are often jointly determined by polygenic, epigenetic and metabolic networks and vary greatly across environments, which also limits the transferability and generalization ability of the model.
In addition, the genetic basis of some traits remains unclear, and reliable functional annotation and biological validation are lacking, which hinders precision breeding (Nepolean et al., 2018; Xu et al., 2022).
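
To make the G×E point concrete, the following is a minimal sketch, not a method taken from the cited studies. It assumes the standard additive two-way decomposition y_ij = mu + g_i + e_j + ge_ij, where y_ij is the mean trait value of genotype i in environment j. The function name `decompose_gxe` and the yield values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def decompose_gxe(table):
    """Split a genotype-by-environment table of trait means into
    y_ij = mu + g_i + e_j + ge_ij.

    table: list of rows, one row per genotype, one column per environment.
    Returns (mu, g, e, ge): grand mean, genotype main effects,
    environment main effects, and the G×E interaction residuals.
    """
    n_gen = len(table)
    n_env = len(table[0])
    mu = mean(v for row in table for v in row)
    g = [mean(row) - mu for row in table]
    e = [mean(row[j] for row in table) - mu for j in range(n_env)]
    ge = [
        [table[i][j] - mu - g[i] - e[j] for j in range(n_env)]
        for i in range(n_gen)
    ]
    return mu, g, e, ge


# Hypothetical yields (t/ha), for illustration only.
# Rows are genotypes, columns are environments.
yields = [
    [8.0, 6.0],  # genotype A: best in environment 1
    [7.0, 7.0],  # genotype B: stable across environments
    [6.0, 8.0],  # genotype C: best in environment 2
]

mu, g, e, ge = decompose_gxe(yields)
print("grand mean:", mu)
print("genotype effects:", g)
print("environment effects:", e)
print("G×E interaction:", ge)
```

In this toy table every genotype and every environment has the same mean, so both sets of main effects are zero and all variation sits in the interaction term. The ranking of genotypes A and C reverses between the two environments, which is the kind of crossover interaction that makes a model trained in one environment transfer poorly to another.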