
Computational Molecular Biology 2025, Vol.15, No.3, 112-121
http://bioscipublisher.com/index.php/cmb

Feature Review    Open Access

Deep Learning for Predicting Gene Expression from Genomic Sequences

Shiying Yu
Biotechnology Research Center, Cuixi Academy of Biotechnology, Zhuji, 311800, China
Corresponding author: shiying.yu@cuixi.org
Computational Molecular Biology, 2025, Vol.15, No.3    doi: 10.5376/cmb.2025.15.0011
Received: 03 Mar., 2025    Accepted: 14 Apr., 2025    Published: 02 May, 2025
Copyright © 2025 Yu. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Yu S.Y., 2025, Deep learning for predicting gene expression from genomic sequences, Computational Molecular Biology, 15(3): 112-121 (doi: 10.5376/cmb.2025.15.0011)

Abstract Different cell types in higher organisms share the same genomic sequence yet show distinct gene expression profiles, a difference attributed to complex gene regulatory mechanisms. Deciphering the regulatory rules of gene expression is vital for understanding disease and life processes. This review examines research progress in predicting gene expression from genomic sequences using deep learning, covering data sources and processing, model architecture design, prediction methods, performance evaluation and interpretability analysis, current challenges, and the latest advances, and illustrates them with case studies of specific species. Finally, it discusses the prospects for integrating deep learning with multi-omics data and the potential impact on precision medicine and functional genomics.

Keywords Deep learning; Genomic sequence; Gene expression; Gene regulation; Multiomics integration

1 Introduction
The genome may look uniform across cells, but that does not mean every cell behaves the same way.
For instance, different cells in the same person may carry exactly the same genome, yet their gene expression patterns can differ vastly. This is not a simple cause-and-effect relationship but the outcome of multiple layers of complex regulation. Notably, only about 2% of the human genome directly encodes proteins; the remaining 98%, made up of non-coding sequences, is often overlooked but contains crucial information that determines when and under what conditions genes are expressed (Zhang et al., 2019). To understand how diseases arise, or how subtle changes unfold during life processes, it is necessary to clarify the role these "silent" fragments play. It has been suggested that predicting gene expression patterns directly from sequence could be a key step toward cracking this "regulatory code" and could bring new breakthroughs to medical and biological research (Beer and Tavazoie, 2004).

In gene regulation, distance is not absolute. The three-dimensional folding of chromatin brings enhancers that lie tens of thousands of bases away in the linear sequence into physical proximity with promoters, so that they can take part in regulation remotely (Robson et al., 2019). The region near the transcription start site is therefore not the only one that matters: promoters usually lie right next to the transcription start site, whereas enhancers can be located very far from it. Cis-regulatory elements in the genome, such as promoters and enhancers, essentially provide binding sites for transcription factors, which control transcriptional activity. When a change in the DNA sequence disrupts the function of these elements, gene expression levels may be altered, producing different traits or even causing disease.
Therefore, clarifying the correspondence between sequence and expression has long been an unavoidable challenge in the study of gene regulation (Li et al., 2018).

Genomic datasets keep growing as high-throughput technologies generate wave after wave of information. At this scale, traditional analytical methods often struggle, and deep learning is gradually making its way into researchers' toolboxes. It can automatically extract features from complex data and is especially good at capturing nonlinear relationships that traditional methods miss. Indeed, this has already been tested in gene expression prediction, where deep neural networks were often more accurate than earlier methods (Chen et al., 2016). Just think about it. By integrating deep learning with
