CMB_2025v15n3

Computational Molecular Biology 2025, Vol.15, No.3, 122-130
http://bioscipublisher.com/index.php/cmb

then mapped patient data onto these networks to identify active modules and key network nodes. Specifically, we used the Similarity Network Fusion (SNF) algorithm to fuse the sample similarity networks derived from the different omics layers. SNF first computes a separate sample similarity network from the gene expression, protein, and metabolite data, and then iteratively fuses these networks into a single comprehensive similarity network. The fused network is used for patient classification and prediction, and community detection on it identifies subgroups of patients with similar multi-omics characteristics. A review indicates that network fusion has been successful in scenarios such as drug discovery. We applied it to patient clustering, and preliminary results suggested distinct patient subtypes.

Bayesian methods can naturally integrate uncertain information from multiple sources through probabilistic graphical models. We constructed directed acyclic graphs over genes and phenotypes using Bayesian networks, with edges representing probabilistic causal relationships. By learning from patient data, we obtained candidate optimal network structures. For instance, on a genetic metabolic disease dataset, our model automatically learned the causal chain enzyme gene -> key metabolites -> corresponding phenotypes, confirming existing knowledge and suggesting new relationships (Ibrahim, 2023).

4.3 Functional annotation, feature extraction and pathway enrichment analysis

One of the key goals of multi-omics integrated analysis is to transform massive amounts of data into biological knowledge that humans can interpret. This requires functional annotation and an exploration of the biological significance of the discovered patterns.
Therefore, we incorporated functional annotation, feature extraction, and pathway/network enrichment steps into the analysis process to assist in interpreting results and making new discoveries.

When the integrated analysis yields a list of important genes or molecules, we immediately annotate and interpret their biological functions. For example, if the network algorithm identifies a group of gene nodes highly associated with a disease, we query the Gene Ontology (GO) database to annotate the known functions of these genes (such as the biological processes they are involved in and their molecular functions). If most of these genes are concentrated in a particular process (such as muscle structure development), this suggests that the disease mechanism is related to that process (Bottini et al., 2022). Similarly, when the Bayesian model selects candidate metabolites, we query databases such as HMDB to obtain their biochemical pathways and disease-associated annotations, in order to determine which ones merit in-depth verification.

Once we have a list of significantly different genes, proteins or metabolites, we often use pathway enrichment to condense it into biological themes. Specifically, we test the list of differential molecules against the KEGG pathway, Reactome pathway or GO biological process databases, using hypergeometric tests or gene set enrichment analysis (GSEA) to determine which pathways are significantly enriched for these molecules. For metabolites, we conduct metabolic pathway enrichment, for example with the algorithm in MetaboAnalyst. This analysis identified abnormalities in pathways such as "purine metabolism" and "arginine and proline metabolism". Through pathway enrichment, scattered molecular lists are elevated to the level of pathway networks, which makes their functional significance easier to understand (Paczkowska et al., 2020).
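To make the hypergeometric test mentioned above concrete, the following is a minimal sketch of over-representation analysis in Python using only the standard library. It is an illustration under stated assumptions, not the pipeline used in this study: the function names (`hypergeom_enrichment_p`, `enrich`), the gene identifiers (`G0` to `G19`) and the pathway names (`pathway_A`, `pathway_B`) are hypothetical placeholders, and real pathway memberships would come from KEGG, Reactome or GO.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: number of molecules in the background
    K: number of background molecules annotated to the pathway
    n: number of molecules in the differential list
    k: number of differential molecules that fall in the pathway
    """
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ... overlapping molecules.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def enrich(hits, pathways, background):
    """Return (pathway, overlap size, p-value) tuples sorted by p-value."""
    background = set(background)
    hits = set(hits) & background  # ignore hits outside the background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = hits & members
        p = hypergeom_enrichment_p(
            len(background), len(members), len(hits), len(overlap)
        )
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes and two placeholder pathways.
background = [f"G{i}" for i in range(20)]
pathways = {
    "pathway_A": [f"G{i}" for i in range(0, 5)],
    "pathway_B": [f"G{i}" for i in range(10, 15)],
}
hits = ["G0", "G1", "G2", "G3", "G15", "G16"]

results = enrich(hits, pathways, background)
for name, overlap, p in results:
    print(f"{name}: overlap={overlap}, p={p:.4g}")
```

In this toy example four of the six differential genes fall in `pathway_A`, so it ranks first, while `pathway_B` has no overlap and receives a p-value of 1. When many pathways are tested, the raw p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before being called significant; that step is omitted here for brevity.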
5 Case Study

5.1 Overview of DMD and available multi-omics datasets

Duchenne muscular dystrophy (DMD) is an X-linked recessive disorder caused by mutations in the DMD gene, leading to the absence of the dystrophin protein and progressive muscle degeneration. It is one of the most prevalent genetic muscle diseases in childhood. Currently, standard diagnostics rely on genetic tests and muscle biopsy. In our integrated database, we identified several DMD-relevant datasets: genomic variants from patient exomes (including copy-number deletions in DMD) and transcriptomic profiles from muscle biopsies of patients and controls. For example, we imported RNA-seq data from recent single-nucleus studies of DMD muscle tissue. Additional resources include proteomic profiles and limited metabolomic data from DMD cohorts (Dowling et al., 2024). Aggregating these datasets gives us an integrated multi-omics view of DMD.
