Computational Molecular Biology 2024, Vol.14, No.3, 106-114 http://bioscipublisher.com/index.php/cmb 109

4.2 Identifying regulatory elements
Identifying regulatory elements within the genome is another area where machine learning has made substantial contributions. Convolutional neural networks (CNNs) have been developed to predict cell type-specific epigenetic and transcriptional profiles from DNA sequences alone. These models can identify promoters and distal regulatory elements, synthesizing their content to make effective gene expression predictions (Kelley et al., 2017). Machine learning algorithms trained on multiple genomes have improved the accuracy of gene expression predictions and facilitated the analysis of human genetic variants associated with molecular phenotypes and diseases (Kelley, 2019). These advancements highlight the potential of machine learning to enhance our understanding of gene regulation and its implications for human health.

4.3 Understanding epigenetic modifications
Epigenetic modifications play a crucial role in gene expression and are linked to various cellular processes, including differentiation, development, and tumorigenesis. Machine learning methods have been widely applied to study these modifications, providing insights into the regulatory mechanisms that rely on epigenetic changes. For example, a unified deep learning model called ZayyuNet has been proposed for identifying various epigenetic modifications, such as DNA N6-Methyladenine (6mA) and RNA N6-Methyladenosine (m6A). This model has demonstrated superior performance compared to current state-of-the-art models (Abbas et al., 2021). Deep neural networks have been utilized to interpret genomic and epigenomic data, focusing on tasks such as sequence motif identification and gene expression prediction (Talukder et al., 2020). These approaches have significantly advanced our understanding of how epigenetic modifications influence gene regulation and cellular function.
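
As a rough illustration of how sequence-based networks of the kind discussed in Sections 4.2 and 4.3 take DNA as input, the sketch below one-hot encodes a sequence and slides a single motif filter along it, which is the basic operation performed by a first-layer convolutional filter. It is a minimal example under stated assumptions and not the architecture of any cited model: the function names `one_hot` and `scan_filter`, the `tata_filter` weights, and the example sequence are all hypothetical, and a trained network would learn many such filters rather than use a hand-written one.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array.

    Bases outside A/C/G/T (for example N) are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def scan_filter(encoded, kernel):
    """Slide a (width, 4) filter along the sequence without padding.

    Each output value is the sum of element-wise products between the
    filter and one window of the sequence (a "valid" cross-correlation).
    """
    width = kernel.shape[0]
    n_positions = encoded.shape[0] - width + 1
    return np.array(
        [np.sum(encoded[i:i + width] * kernel) for i in range(n_positions)]
    )

# Hypothetical hand-written filter: +1 for each base matching "TATA".
tata_filter = one_hot("TATA")

# Hypothetical example sequence containing one exact TATA occurrence.
sequence = "GGCTATAAGC"
scores = scan_filter(one_hot(sequence), tata_filter)
best = int(np.argmax(scores))  # start position of the best-scoring window

print("scores per window:", scores)
print("best window starts at index", best, "->", sequence[best:best + 4])
```

In a full model the filter outputs would pass through non-linearities, pooling, and further layers before a prediction is made; the sketch only covers the encoding and scanning step.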
5 Integrating Multi-Omics Data with AI

5.1 Challenges in multi-omics integration
Integrating multi-omics data presents several challenges due to the inherent complexity and heterogeneity of the data. High-dimensionality, data heterogeneity, and noise are significant obstacles that need to be addressed to effectively combine data from different omics layers such as genomics, proteomics, and metabolomics (Mirza et al., 2019). The curse of dimensionality, where the number of features far exceeds the number of samples, complicates the analysis and integration process. Missing data and class imbalance further exacerbate these challenges, necessitating specialized computational approaches to manage these issues effectively. The lack of universal analysis protocols and the need for interpretability and explainability in models also pose significant hurdles (Wörheide et al., 2021).

5.2 AI models for multi-omics analysis
Machine learning and deep learning models have shown great promise in addressing the challenges of multi-omics data integration. Various models, including Bayesian models, tree-based methods, kernel methods, network-based fusion methods, and matrix factorization models, have been employed to integrate and analyze multi-omics data (Li et al., 2016). Deep learning, in particular, has gained prominence due to its ability to capture complex, non-linear relationships in large-scale datasets (Kang et al., 2021; Saha et al., 2023). Autoencoders and other deep neural networks have been used to learn cross-modality interactions and provide interpretable results in a multi-source setting (Benkirane et al., 2023). These models have been applied to tasks such as disease subtype classification, biomarker discovery, and drug response prediction, demonstrating their potential in advancing precision medicine.

5.3 Applications in personalized medicine
The integration of multi-omics data using AI models has significant implications for personalized medicine.
By combining data from various omics sources, researchers can gain a comprehensive understanding of the molecular mechanisms underlying diseases, leading to more accurate disease prediction, patient stratification, and the development of personalized treatment plans (Figure 2) (Kang et al., 2021; Reel et al., 2021). For instance, deep learning models have been used to classify tumor types and breast cancer subtypes, as well as predict survival outcomes in cancer patients. The ability to integrate and analyze multi-omics data also facilitates the discovery of new biomarkers, which can be used to monitor disease progression and response to treatment, ultimately