Bioscience Evidence 2025, Vol.15, No.5, 249-259
http://bioscipublisher.com/index.php/be

Genomic data: As sequencing technology advances, whole-genome and pan-genome data for corn continue to grow. These data cover sequences, gene models, structural variations and transposable elements of different varieties, subspecies and wild relatives (Woodhouse et al., 2021; Sen et al., 2023).

Multi-omics data: These include the transcriptome, epigenome, proteome and metabolome. They reveal how gene expression and metabolic pathways are regulated and help explain the molecular basis of phenotypic differences (Sen et al., 2023; Wu et al., 2024).

High-throughput phenotypic data: Unmanned aerial vehicles, orbital platforms, remote sensing and image analysis make it possible to record phenotypic data such as plant height, flowering period and leaf area at different growth stages (Meng et al., 2021; Guo et al., 2022; Adak et al., 2023; Li et al., 2023; Wu et al., 2024).

Environmental and management data: These include climate, soil, fertilization and irrigation records, which are important for understanding gene-environment interactions and for optimizing breeding methods (Meng et al., 2021; Guo et al., 2022; Li et al., 2023).

Database and resource platforms: Platforms such as MaizeGDB and Maize Feature Store integrate and manage various types of data so that researchers can use and analyze them more easily (Woodhouse et al., 2021; Sen et al., 2023).

2.2 Integration of heterogeneous data
Corn breeding draws on structured, semi-structured and unstructured data. Combining these data effectively is the key to improving breeding efficiency.

Multi-omics data fusion: Genomic, phenotypic and metabolomic data are integrated, and machine learning and artificial intelligence methods are then applied to improve the accuracy of trait prediction.
For example, combining SNPs, image traits and metabolites can significantly improve yield prediction (Adak et al., 2023; Sen et al., 2023; Wu et al., 2024).

Database and platform integration: Databases such as MaizeGDB adopt a pan-genome framework and link genomic, expression, methylation and variation data from different sources, which supports cross-species and cross-environment comparisons (Woodhouse et al., 2021; Sen et al., 2023).

Semantic and ontological integration: Semantic frameworks and ontologies resolve inconsistencies in data meaning, so that different data sources can interoperate and be queried in a unified way.

Distributed and cloud computing platforms: Hadoop, Spark and cloud platforms allow large-scale data to be stored and processed efficiently and can also support real-time integration.

Intelligent algorithms and data cleaning: For complex multi-source data, methods such as intelligent clustering and anomaly detection can improve data quality and integration efficiency.

Although big data integration has made corn breeding more intelligent, many problems remain, such as inconsistent standards, differing data semantics, unstable data quality, and privacy and security concerns. Future work needs to strengthen cross-domain cooperation, improve data standards and sharing mechanisms, and enhance the capacity to integrate real-time, cross-domain and unstructured data.

3 Analytical Tools and Approaches
3.1 Machine learning and AI in breeding
Machine learning (ML) and artificial intelligence (AI) have become important tools in current corn breeding.
These methods automatically extract information from large volumes of genomic, phenotypic and environmental data in order to improve the accuracy and efficiency of trait prediction (Esposito et al., 2019; Xu et al., 2022; Yan and Wang, 2022; Crossa et al., 2024; Wu et al., 2024; He et al., 2025) (Figure 1).
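
As a rough illustration of the feature-level fusion described in Section 2.2 and the trait prediction described here, the following minimal sketch concatenates three feature blocks (SNP dosages, image traits, metabolites) and compares a ridge regression fitted on SNPs alone with one fitted on the fused features. Everything in it is an assumption made for illustration: the data are simulated, the effect sizes are invented, the simulation deliberately lets image traits and metabolites carry yield signal, and ridge regression stands in for the more elaborate models used in the cited studies. It is not the method of any of those studies.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ridge(rows, y, lam=1.0):
    """Ridge regression via the normal equations (X'X + lam*I) w = X'y.

    A constant column is prepended as the intercept and is not penalized.
    """
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    for j in range(1, p):
        xtx[j][j] += lam
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(weights, rows):
    """Apply fitted weights (intercept first) to each feature row."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in rows]


def mse(pred, truth):
    """Mean squared error between predictions and observed values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def fuse(*blocks):
    """Feature-level fusion: concatenate each line's feature blocks into one row."""
    return [[v for part in parts for v in part] for parts in zip(*blocks)]


def simulate(n, rng):
    """Hypothetical data: 4 SNP dosages, 2 image traits, 2 metabolites per line."""
    snp, img, met, y = [], [], [], []
    for _ in range(n):
        s = [float(rng.choice((0, 1, 2))) for _ in range(4)]
        i = [rng.gauss(0, 1) for _ in range(2)]
        k = [rng.gauss(0, 1) for _ in range(2)]
        # Invented effect sizes: yield depends on all three data types plus noise.
        grain_yield = 5.0 + 0.5 * sum(s) + 1.0 * i[0] + 0.8 * k[1] + rng.gauss(0, 0.3)
        snp.append(s)
        img.append(i)
        met.append(k)
        y.append(grain_yield)
    return snp, img, met, y


rng = random.Random(0)
snp, img, met, y = simulate(120, rng)
fused = fuse(snp, img, met)

split = 80  # first 80 lines for training, remaining 40 held out
w_snp = fit_ridge(snp[:split], y[:split])
w_fused = fit_ridge(fused[:split], y[:split])
mse_snp = mse(predict(w_snp, snp[split:]), y[split:])
mse_fused = mse(predict(w_fused, fused[split:]), y[split:])
print(f"held-out MSE, SNPs only: {mse_snp:.3f}")
print(f"held-out MSE, fused:     {mse_fused:.3f}")
```

Because the simulated yield is built to depend on the image and metabolite features, the fused model's lower held-out error only shows the mechanics of fusion. Whether fusion helps on real breeding data depends on how much independent signal each data type actually carries.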