
Computational Molecular Biology 2025, Vol.15, No.3, 112-121
http://bioscipublisher.com/index.php/cmb

information is basically ignored, which reduces prediction accuracy. Our aim is to build a deep learning model that incorporates both the genomic sequence and the chromatin interaction information of corn, so that it can predict gene expression levels in different tissues more accurately while also identifying key regulatory elements. We also want to test whether multi-omics integration actually improves predictive performance, and to evaluate the potential value of these regulatory elements for corn growth and breeding.

7.2 The design and implementation process of deep learning models
In this case, we developed a corn gene expression prediction model called DeepCBA. It uses a dual-pathway structure: one pathway processes sequences near the gene promoter, and the other receives distal fragments that have chromatin interactions with the gene. Each pathway extracts features through a convolutional neural network, and the two feature streams are then merged at a higher level to jointly output the expression value of the target gene. This design enables the model to capture the regulatory effects of both proximal cis-elements and long-range enhancers (Wang et al., 2024). During training, we used gene expression data from multiple corn tissues as the supervisory signal, adopted mean squared error as the loss, and combined cross-validation with regularization to prevent overfitting. After training, we evaluated the model on the test set and performed feature visualization to see which key sequence motifs the model focused on (Zeng et al., 2018).

7.3 Result analysis and practical application value
DeepCBA performs well in predicting gene expression in corn.
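To make the dual-pathway design from Section 7.2 more concrete, the following is a minimal sketch of the forward pass only, written in plain Python. It is not the DeepCBA implementation: the class name `DualPathwayModel`, the helper functions, the filter counts and widths, and the random weights are all illustrative assumptions, since the text does not specify layer sizes or hyperparameters. The sketch shows one convolutional branch for the promoter-proximal sequence, one for a distal interacting fragment, a merge of the two feature vectors, and the two quantities mentioned in the text: mean squared error as the loss and the correlation between predicted and observed expression.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 channels (unknown bases become all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_relu_maxpool(x, filters):
    """1D convolution over positions, followed by ReLU and global max pooling.

    x: list of rows with 4 channels; filters: list of kernels, each width x 4.
    Returns one pooled activation per filter.
    """
    pooled = []
    for kernel in filters:
        k = len(kernel)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(x) - k + 1):
            act = sum(kernel[i][c] * x[start + i][c]
                      for i in range(k) for c in range(4))
            best = max(best, act)
        pooled.append(best)
    return pooled


def random_filters(rng, n_filters, width):
    """Randomly initialised kernels (placeholders for learned weights)."""
    return [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(width)]
            for _ in range(n_filters)]


class DualPathwayModel:
    """Hypothetical two-branch model: proximal (promoter) and distal inputs."""

    def __init__(self, n_filters=4, width=5, seed=0):
        rng = random.Random(seed)
        self.proximal_filters = random_filters(rng, n_filters, width)
        self.distal_filters = random_filters(rng, n_filters, width)
        self.out_weights = [rng.uniform(-1, 1) for _ in range(2 * n_filters)]
        self.out_bias = 0.0

    def predict(self, promoter_seq, distal_seq):
        prox = conv_relu_maxpool(one_hot(promoter_seq), self.proximal_filters)
        dist = conv_relu_maxpool(one_hot(distal_seq), self.distal_filters)
        merged = prox + dist  # concatenate the two branch feature vectors
        return sum(w * f for w, f in zip(self.out_weights, merged)) + self.out_bias


def mse(pred, target):
    """Mean squared error, the loss named in the text."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson(xs, ys):
    """Pearson correlation between predicted and observed values."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    model = DualPathwayModel(seed=1)
    # Toy sequences, invented for illustration only.
    print(model.predict("ACGTACGTAC", "TTGACCATGA"))
```

A real model would learn the filters and output weights by minimising `mse` over many genes and tissues. The untrained weights here only demonstrate how the two branches are combined into a single expression estimate.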
Compared with a model that uses only promoter sequences, adding distal chromatin interaction information raises the prediction correlation from approximately 0.47 to 0.93, and distal regulatory factors are clearly better captured. The model also identified many sequence motifs associated with high expression. Most of these motifs are concentrated in the open chromatin regions of corn and overlap strongly with expression quantitative trait loci (eQTLs), indicating that the features it has learned agree well with real regulatory elements (Jiang et al., 2020). We also performed site-directed mutagenesis on the promoters of two corn genes, and the resulting expression changes were largely consistent with the model predictions. Overall, DeepCBA not only makes predictions more accurate but also identifies key regulatory elements, providing a new tool for functional genomics research and molecular breeding. Researchers can use this tool to screen for elements that affect yield or stress resistance and make targeted improvements through gene editing.

8 Future Outlook and Conclusions
The combination of deep learning and multi-omics has opened up a broad path for gene expression prediction. As new data types such as single-cell epigenomics and spatial transcriptomics continue to emerge, models can absorb multi-level information spanning genomics, transcriptomics, and epigenomics, and the regulatory networks they construct become more complete. Future research is likely to couple different omics data with deep models more tightly in order to reconstruct the full picture of gene regulation more accurately. Take disease research as an example. By feeding a model gene sequence variants, chromatin states, and transcriptome data together, the impact of pathogenic variants on expression can be better predicted, providing a basis for precision medicine. Crop science has similar demands.
Multi-omics models can help explain changes in gene expression under environmental stimuli and guide the breeding of better varieties. Of course, the standardization of data integration, model complexity, and computational cost are all problems that must be addressed, but technological progress is likely to make this approach the norm in functional genomics research.

The potential of predicting gene expression from genomic sequences has long attracted the attention of the medical and biological communities. In precision medicine especially, such methods can help annotate the functions of the large number of non-coding variants in humans. Many disease-related mutations do not fall in coding regions but may act by altering gene expression. Deep learning models can directly predict the impact of these variants on expression, providing clues in the search for pathogenic mechanisms and drug targets. The research approach of functional genomics is thus changing: in the past, large-scale experiments were relied on to screen regulatory elements; now, models can first predict potential regulatory sequences across the entire genome and then select
