
Cotton Genomics and Genetics 2025, Vol.16, No.3, 148-162 http://cropscipublisher.com/index.php/cgg

other hand, there are complex associations and redundancies among the omics layers. For example, a mutation in a key gene can trigger cascading changes in transcription and metabolism, producing highly correlated data, whereas epigenetic changes are sometimes independent of the DNA sequence and make additional contributions to traits. Simply concatenating features from different omics layers into one model may introduce noise and cause overfitting. It is therefore necessary to design dedicated methods that extract the key signals from each omics layer and establish hierarchical connections between them and phenotypes. Some studies have already provided initial evidence of the value of multi-omics fusion in cotton breeding. Zhao et al. (2024) showed that combining genomic and epigenomic data can reveal many functional variants that cannot be identified from genomic data alone, providing a new perspective for trait prediction. By constructing a multi-omics regulatory network, they identified 43 core genes for cotton fiber development that would not have been found with traditional GWAS alone. However, multi-omics integration still faces several challenges. The first is data standardization and alignment: the omics layers are measured on different scales, so normalization is required. The second is feature selection: multi-omics data are extremely high-dimensional, and it is difficult to screen out the features that are truly associated with traits. The third is model complexity: multi-omics fusion models often contain a large number of parameters and are more prone to overfitting, so more stringent regularization and cross-validation strategies need to be introduced (Guo et al., 2022). Deep learning provides a powerful tool for multi-omics fusion, because multimodal networks can automatically learn associations between different data modalities.
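The standardization and alignment step described above can be illustrated with a minimal sketch. It assumes that each omics layer is stored as a mapping from sample ID to a list of feature values. The layer names, sample IDs, and numbers are hypothetical, and z-score scaling is only one common normalization choice, not the method used in the cited studies. Each layer is restricted to the samples shared by all layers, standardized feature by feature, and only then concatenated.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and unit variance."""
    scaled_columns = []
    for col in zip(*rows):  # iterate over features instead of samples
        mu, sd = mean(col), pstdev(col)
        # A constant feature carries no signal; map it to 0 to avoid dividing by 0.
        scaled_columns.append([0.0 if sd == 0 else (v - mu) / sd for v in col])
    return [list(row) for row in zip(*scaled_columns)]  # back to samples x features


def align_and_scale(layers):
    """layers: {layer_name: {sample_id: [feature values]}} (hypothetical layout)."""
    # Alignment: keep only samples measured in every omics layer.
    shared = sorted(set.intersection(*(set(d) for d in layers.values())))
    fused = {sample: [] for sample in shared}
    for name in sorted(layers):  # fixed layer order keeps feature columns reproducible
        rows = [layers[name][sample] for sample in shared]
        for sample, scaled_row in zip(shared, zscore_columns(rows)):
            fused[sample].extend(scaled_row)
    return fused


# Toy data with deliberately different measurement scales (hypothetical values).
layers = {
    "snp": {"A": [0, 2], "B": [1, 0], "C": [2, 1], "D": [1, 1]},
    "expression": {"A": [120.0], "B": [300.0], "C": [210.0]},
    "methylation": {"A": [0.10], "B": [0.90], "C": [0.50]},
}
fused = align_and_scale(layers)
for sample, features in fused.items():
    print(sample, [round(v, 2) for v in features])
```

Sample D is dropped because it lacks expression and methylation measurements. Scaling each layer separately prevents a layer with large raw values, such as expression counts, from dominating the fused feature vector. It does not address feature selection or overfitting, which still require regularization and cross-validation.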
However, multimodal deep learning models usually lack biological interpretability, a trade-off that must be weighed. Acquiring multi-omics data is also costly. For example, whole-genome methylation sequencing of hundreds of cotton materials generates tens of terabytes of data, so economic costs and computational overhead must be considered in practical applications. Nevertheless, as sequencing and detection technologies advance, multi-omics data will become increasingly abundant and accessible. With smarter data fusion algorithms (such as neural networks and Transformers) and full use of cloud computing and high-performance computing clusters, multi-omics-driven intelligent cotton breeding is expected to become feasible. It will enable breeders to understand how superior traits form at the genomic, transcriptomic, and epigenetic levels and, on this basis, to make more targeted designs and selections.

6.3 Combining model interpretability with practical applications

Artificial intelligence models, especially deep learning models, are often regarded as "black boxes", which may hinder their adoption in breeding. Breeders want to know why the model gives a particular prediction and which genes or markers play a key role in it. If a model is difficult to explain, its predictions are unlikely to be adopted directly in breeding decisions. Improving the biological interpretability of models is therefore a problem that AI breeding must face. One approach is to analyze the model output in light of traditional genetic knowledge. For example, the highest-weighted markers in the genomic selection (GS) model can be tallied and compared with known QTLs or genes to verify whether the model captures plausible genetic signals.
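The comparison between highly weighted markers and known QTLs can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure of any cited study: the marker IDs, weights, chromosome positions, QTL names, and the `flank` tolerance are all hypothetical. The markers are ranked by absolute weight, and each top marker is checked against the known QTL intervals on the same chromosome.

```python
def top_markers(weights, k):
    """Return the k marker IDs with the largest absolute model weight."""
    return sorted(weights, key=lambda m: abs(weights[m]), reverse=True)[:k]


def markers_in_qtl(marker_ids, positions, qtl_intervals, flank=0):
    """Map each marker to the known QTL intervals it falls in (within `flank` bp)."""
    hits = {}
    for marker in marker_ids:
        chrom, pos = positions[marker]
        names = [
            q["name"]
            for q in qtl_intervals
            if q["chrom"] == chrom and q["start"] - flank <= pos <= q["end"] + flank
        ]
        if names:
            hits[marker] = names
    return hits


# Hypothetical marker weights from a fitted GS model.
weights = {"m1": 0.02, "m2": -0.41, "m3": 0.35, "m4": 0.05, "m5": -0.30}
# Hypothetical physical positions: (chromosome, base-pair coordinate).
positions = {
    "m1": ("A01", 1_200_000),
    "m2": ("A01", 5_400_000),
    "m3": ("D05", 800_000),
    "m4": ("D05", 9_000_000),
    "m5": ("A07", 2_000_000),
}
# Hypothetical previously reported QTL intervals.
known_qtl = [
    {"name": "qtl_yield_1", "chrom": "A01", "start": 5_000_000, "end": 6_000_000},
    {"name": "qtl_fiber_1", "chrom": "D05", "start": 500_000, "end": 700_000},
]

top = top_markers(weights, k=3)
hits = markers_in_qtl(top, positions, known_qtl, flank=200_000)
print(top)
print(hits)
print(f"{len(hits)}/{len(top)} top markers fall in or near a known QTL")
```

The `flank` parameter matters because a marker in linkage disequilibrium with a causal variant may lie just outside the reported interval. With `flank=0`, marker `m3` in this toy data would no longer count as a hit.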
If the markers emphasized by the model are linked to important functional genes, the credibility of the results increases (Billings et al., 2022). Another approach is to use dedicated interpretation methods, such as SHAP values and sensitivity analysis, to quantify the impact of each input marker on the prediction and thereby identify the genomic regions that the model "values". A recent study interpreted a random forest model for cotton yield prediction and found that some of the markers to which the model assigned the highest weights lay near previously reported yield QTLs. This suggests that AI models can, to a certain extent, reproduce the judgment of human experts, which strengthens breeders' trust. To facilitate practical application, the model needs to be closely integrated with the breeding process. For example, a friendly user interface can be developed so that breeders can input data and obtain predictions without programming knowledge; models can be embedded in breeding management systems to provide real-time prediction and decision support; and model uncertainty indicators can be provided to remind users to be cautious when the prediction
