Computational Molecular Biology 2024, Vol.14, No.2, 84-94 http://bioscipublisher.com/index.php/cmb

interventions (Fan et al., 2021). For example, in gene regulatory networks (GRNs), causal inference aims to identify how specific genes regulate others to control cellular functions. Unlike mere correlations, which capture associations, causal inference establishes directional and mechanistic links, allowing scientists to predict the outcomes of interventions such as gene knockouts or drug treatments (Ribeiro et al., 2016). Several computational approaches, including Bayesian networks and Granger causality, have been developed to model and infer causality in high-dimensional biological systems (Ahmed et al., 2020).

3.2 Difference between causality and correlation

3.2.1 Conceptual differences

Correlation refers to a statistical relationship between two variables, indicating that they change together, but not necessarily that one causes the other. In contrast, causality establishes that one variable directly influences the other. In biological systems, two genes may be correlated due to shared regulatory mechanisms but might not have a direct cause-effect relationship. Causal inference methods, such as Mendelian randomization and directed acyclic graphs (DAGs), seek to identify these directional links (Furqan & Siyal, 2016).

3.2.2 Implications for biological research

Understanding causality is fundamental for biological research as it allows researchers to determine the effect of gene mutations, protein interactions, or external factors like drug treatments. Correlation-based methods are limited because they do not provide insights into how changes in one gene affect another. Causal inference techniques enable scientists to make predictions about biological processes, guiding interventions such as gene editing or drug discovery.
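
The distinction drawn above between correlation and causation can be illustrated with a small simulation. The sketch below is a toy model and is not taken from any of the cited studies: the genes A and B, the shared regulator R, the coefficients, and the function names are all hypothetical. It compares two scenarios that both produce a strong observational correlation between A and B, and then applies an intervention that forces A to a fixed expression level. Only in the scenario with a real A -> B edge should the mean of B be expected to shift.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, rng, a_to_b, force_a=None):
    """Simulate hypothetical genes A and B that share a regulator R.

    Assumed toy model: R drives both A and B. `a_to_b` is the strength of a
    direct A -> B edge (0.0 means no causal link). `force_a` mimics an
    intervention that clamps A to a fixed level regardless of R.
    """
    a_vals, b_vals = [], []
    for _ in range(n):
        r = rng.gauss(0.0, 1.0)  # shared regulator (the confounder)
        if force_a is None:
            a = 0.9 * r + rng.gauss(0.0, 0.3)  # A follows R
        else:
            a = force_a  # intervention overrides R
        b = 0.9 * r + a_to_b * a + rng.gauss(0.0, 0.3)
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


def correlation_and_shift(a_to_b, rng, n=5000, forced_level=2.0):
    """Return (observational corr(A, B), mean shift in B when A is forced)."""
    a_obs, b_obs = simulate(n, rng, a_to_b)
    _, b_int = simulate(n, rng, a_to_b, force_a=forced_level)
    shift = statistics.fmean(b_int) - statistics.fmean(b_obs)
    return pearson(a_obs, b_obs), shift


rng = random.Random(0)  # fixed seed so the run is repeatable
conf_corr, conf_shift = correlation_and_shift(a_to_b=0.0, rng=rng)
causal_corr, causal_shift = correlation_and_shift(a_to_b=0.5, rng=rng)

print(f"confounded only: corr={conf_corr:.2f}, shift in B={conf_shift:.2f}")
print(f"direct A -> B:   corr={causal_corr:.2f}, shift in B={causal_shift:.2f}")
```

In both scenarios the observational correlation is high, so correlation alone cannot tell them apart. The intervention is what separates them: forcing A leaves B essentially unchanged when the association comes only from the shared regulator, and shifts B when A acts on B directly.
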
For instance, inferring causality between genes in cancer pathways can lead to targeted therapies aimed at inhibiting tumor growth (Hill et al., 2016) (Figure 1).

Figure 1 Aggregate submission networks for the experimental data network inference task (SC1A) (Adopted from Hill et al., 2016)
Image caption: (a) The aggregate submission network for cell line MCF7 under HGF stimulation. Line thickness corresponds to edge weight (number of edges shown set to equal number of nodes). To determine which edges were present and not present in the aggregate prior network, we placed a threshold of 0.1 on edge weights. Green and blue nodes represent descendants of mTOR in the network shown (Figure 2b,c and supplementary Figure 2). The network was generated using Cytoscape. (b) Principal component analysis applied to edge scores for the 32 context-specific aggregate submission networks (Online Methods). The figure shows the key regulatory factors and their interactions with downstream genes or proteins, as identified through network modeling. This approach helps researchers to identify important cancer-related biomarkers, thereby supporting precision medicine.

3.2.3 Common misinterpretations

A common misconception is that a high correlation between two variables implies a causal relationship. This assumption often leads to incorrect conclusions in biological research. For example, two genes might be co-expressed, suggesting a correlation, but this could be due to a third gene influencing both. Mistaking