Computational Molecular Biology 2024, Vol.14, No.2 http://bioscipublisher.com/index.php/cmb © 2024 BioSci Publisher, registered at the publishing platform that is operated by Sophia Publishing Group, founded in British Columbia of Canada. All Rights Reserved.
BioSci Publisher is an international Open Access publishing platform that publishes scientific journals in the field of bioscience. It is registered at the publishing platform operated by Sophia Publishing Group (SPG), founded in British Columbia, Canada. Publisher: BioSci Publisher. Edited by: Editorial Team of Computational Molecular Biology. Email: edit@cmb.bioscipublisher.com Website: http://bioscipublisher.com/index.php/cmb Address: 11388 Stevenston Hwy, PO Box 96016, Richmond, V7A 5J5, British Columbia, Canada. Computational Molecular Biology (ISSN 1927-5587) is an open access, peer-reviewed journal published online by BioSci Publisher. The journal publishes the latest research articles, letters, methods, and reviews in all areas of computational molecular biology, covering new discoveries in molecular biology, from genes to genomes, made with statistical, mathematical, and computational methods, as well as new developments in computational methods and databases for molecular and genome biology. The papers published in the journal are expected to be of interest to computational scientists, biologists, and teachers, students, and researchers engaged in biology. All articles published in Computational Molecular Biology are Open Access and are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. BioSci Publisher uses the CrossCheck service to identify academic plagiarism through the plagiarism prevention tool iParadigms and to protect the original authors' copyrights.
Computational Molecular Biology 2024, Vol.14, No.2, 45-53 http://bioscipublisher.com/index.php/cmb Research Report Open Access Modeling Biological Networks: Computational Approaches to Network Dynamics Guoliang Chen, Minghua Li Biotechnology Research Center, Cuixi Academy of Biotechnology, Zhuji, 311800, Zhejiang, China Corresponding author: minghua li@cuixi.org Computational Molecular Biology, 2024, Vol.14, No.2 doi: 10.5376/cmb.2024.14.0006 Received: 28 Jan., 2024 Accepted: 11 Mar., 2024 Published: 29 Mar., 2024 Copyright © 2024 Chen and Li. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Preferred citation for this article: Chen G.L., and Li M.H., 2024, Modeling biological networks: computational approaches to network dynamics, Computational Molecular Biology, 14(2): 45-53 (doi: 10.5376/cmb.2024.14.0006) Abstract Biological networks are important tools for understanding the complexity and functionality of biological systems, and analyzing their dynamics can reveal how biological processes behave over time. However, the high complexity and diversity of biological networks pose challenges for research, requiring the development and application of advanced computational methods. This study reviews the different types of biological networks and their functional roles, and examines computational methods for network dynamics, including graph theory, agent-based modeling, and differential equations. We also cover dynamic modeling of gene regulatory networks, protein-protein interaction networks, and metabolic networks, analyzing the applications and limitations of these methods in practical biological systems. The aim is to provide a comprehensive reference for researchers in the field of biological network dynamics.
Keywords Biological networks; Network dynamics; Computational methods; Gene regulatory networks; Protein-protein interaction networks 1 Introduction Biological networks, encompassing gene regulatory networks (GRNs), protein-protein interaction networks, and metabolic pathways, are fundamental to understanding the complex interactions that govern cellular processes. These networks are integral to various biological functions, including cell differentiation, metabolism, and signal transduction (Karlebach and Shamir, 2008; Wang and Gao, 2010). The advent of high-throughput technologies and computational methods has enabled the detailed mapping and analysis of these networks, providing insights into their structure and function (Covert et al., 2004; Glass et al., 2013). The integration of experimental data with computational models has become essential for elucidating the intricate dynamics of biological systems (Mangan et al., 2016; Manipur et al., 2020). Understanding the dynamics of biological networks is crucial for several reasons. Firstly, it allows researchers to predict the behavior of these networks under different conditions, which is vital for identifying the mechanisms underlying diseases caused by dysregulated cellular processes (Liu et al., 2020; Jolly and Roy, 2022). Secondly, dynamic models can facilitate the development of biotechnological applications by providing faster and more cost-effective alternatives to experimental approaches (Liu et al., 2020). Moreover, the study of network dynamics can reveal emergent properties and interactions that are not apparent from static network analyses, thereby offering a more comprehensive understanding of biological systems (Boccaletti et al., 2006; Paulevé et al., 2020). The application of control theory and other mathematical frameworks to these networks has further enhanced our ability to analyze and manipulate their behavior (Jolly and Roy, 2022). 
This study emphasizes the methods used, the challenges encountered, and the progress made in this field. Specifically, it examines various modeling techniques, including Boolean modeling, differential equations, and data-driven methods, as well as their applications to gene regulatory networks, metabolic pathways, and other biological systems. We discuss the integration of high-throughput data with computational models and the impact of these methods on future research and biotechnology innovation. We hope to provide a valuable resource for researchers and practitioners interested in dynamic modeling of biological networks.
2 Overview of Biological Networks Biological networks are intricate systems that represent the interactions among various biological entities, such as genes, proteins, and metabolites. These networks are essential for understanding the complex relationships and dynamics within biological systems. Advances in network science and high-throughput biomedical technologies have significantly enhanced our ability to study these networks, providing deeper insights into their structure and function (Bocci et al., 2023). 2.1 Types of biological networks Biological networks can be categorized into several types based on the nature of the interactions they represent. Common types include gene regulatory networks, protein-protein interaction networks, metabolic networks, and signaling networks. Gene regulatory networks depict the interactions between genes and their regulatory elements, while protein-protein interaction networks illustrate the physical interactions between proteins. Metabolic networks map the biochemical reactions within a cell, and signaling networks represent the pathways through which cells respond to external stimuli (Koutrouli et al., 2020; Jolly and Roy, 2022). Each type of network provides a unique perspective on biological processes and helps in understanding the underlying mechanisms of cellular functions. 2.2 Structural properties of networks The structural properties of biological networks are crucial for understanding their behavior and functionality. Key properties include network topology, degree distribution, clustering coefficient, and path length. Network topology refers to the overall arrangement of nodes and edges, which can be characterized by patterns such as scale-free or small-world structures. Degree distribution describes how the number of connections per node is distributed across the network, and in biological networks it often follows a power law.
The clustering coefficient measures the tendency of nodes to form tightly knit groups, while the average path length is the mean number of steps along the shortest paths between pairs of nodes (Koutrouli et al., 2020; Paulevé et al., 2020). These properties help in identifying critical nodes and understanding the robustness and efficiency of biological networks. 2.3 Functional roles of networks in biology Biological networks play vital roles in various biological processes and functions. They are involved in cellular communication, metabolic regulation, and the coordination of complex biological responses. For instance, gene regulatory networks control gene expression patterns, which are essential for cellular differentiation and development. Protein-protein interaction networks facilitate the formation of protein complexes that carry out specific cellular functions. Metabolic networks ensure the efficient flow of metabolites through biochemical pathways, supporting cellular energy production and biosynthesis. Signaling networks enable cells to perceive and respond to environmental changes, maintaining homeostasis and facilitating adaptation (Mangan et al., 2016). Understanding these functional roles is critical for deciphering the complexities of biological systems and developing therapeutic strategies for diseases. 3 Computational Approaches to Network Dynamics 3.1 Graph-theoretical methods Graph-theoretical methods are pivotal in analyzing biological networks due to their ability to represent complex systems as interconnected nodes and edges. These methods facilitate the understanding of the structural properties and functional dynamics of biological systems. For instance, graph theory can be used to analyze molecular structures in microbiology, where cells, genes, or proteins are represented as vertices, and their interactions as edges.
This approach allows for the computation of topological indices, which can reveal significant biological activities and properties (Pavlopoulos et al., 2011; Gao et al., 2017). Additionally, graph-based methods can characterize global and local structural properties of cellular networks, detect motifs or clusters involved in common biological functions, and integrate large-scale experimental data for comprehensive network inference (Aittokallio and Schwikowski, 2006).
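The structural measures named in Sections 2.2 and 3.1 (degree distribution, clustering coefficient, and average path length) can be illustrated with a minimal sketch. The five-node network, its node names, and all function names below are assumptions made for illustration; they do not come from any cited study, and a real analysis would normally use a dedicated graph library.

```python
from collections import deque

# Hypothetical toy interaction network (undirected); node names are illustrative only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have exactly k neighbours."""
    counts = {}
    for neighbours in adj.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return counts


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


adj = build_adjacency(edges)
print(degree_distribution(adj))
print(clustering_coefficient(adj, "C"))
print(average_path_length(adj))
```

In this toy network, node C has the highest degree and bridges the A-B-C triangle to the D-E chain, which is the kind of structurally critical node these measures are meant to expose.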
3.2 Agent-based modeling 3.2.1 Principles and applications Agent-based modeling (ABM) is a flexible computational approach used to simulate the interactions of individual agents within a system, capturing the emergent behavior of complex biological networks. ABMs are particularly useful in fields ranging from molecular biology to ecology, where they can model phenomena such as cell migration, molecular dynamics, and disease spread (Hinkelmann et al., 2010; Nardini et al., 2020). These models are typically specified through protocols like the ODD protocol, which standardizes model descriptions and facilitates their analysis (Grob et al., 2019). 3.2.2 Strengths and limitations The strengths of ABM include its ability to model heterogeneous agents and capture stochastic behaviors, making it suitable for simulating real-world biological systems. However, ABMs often require extensive computational resources due to their complexity and the need for numerous simulations to explore parameter spaces. This computational demand can be mitigated by using neural networks to emulate ABMs, significantly improving efficiency while maintaining accuracy (Wang et al., 2019). Despite these advancements, challenges remain in accurately predicting model dynamics in certain parameter regimes, which can sometimes be addressed by integrating differential equation models learned from ABM simulations (Nardini et al., 2020). 3.2.3 Case studies in biological systems Several case studies highlight the application of ABM in biological systems. For example, ABMs have been used to model cell biology experiments, such as birth-death-migration processes, and epidemiological models like the susceptible-infected-recovered (SIR) model. These studies demonstrate the utility of ABM in predicting system dynamics and exploring biological phenomena.
Additionally, the integration of ABM with other computational frameworks, such as equation learning, has shown promise in enhancing the predictive power and applicability of these models in various biological contexts. 3.3 Differential equation-based approaches Differential equation-based approaches are fundamental in modeling the dynamic behavior of biological networks. These methods use mathematical equations to describe the rate of change of system variables over time, providing insights into the underlying mechanisms of biological processes. For instance, control-theoretic approaches using differential equations have been applied to drug delivery systems, while other methods have been used to infer biochemical network dynamics and predict system behavior under different conditions (Mochizuki, 2016). Additionally, multi-scale probabilistic models, such as ProbRules, combine probabilities with logical rules to represent network dynamics across different scales, complementing equation-based descriptions and offering robust predictions of gene expression and molecular interactions (Grob et al., 2019). These approaches are crucial for understanding the complex interactions within biological networks and developing effective interventions. 4 Dynamic Modeling of Gene Regulatory Networks 4.1 Boolean networks 4.1.1 Basic concepts and applications Boolean networks are a fundamental approach to modeling gene regulatory networks (GRNs) due to their simplicity and intuitive nature. They represent genes as nodes and regulatory interactions as edges, with each gene being in one of two states: active or inactive. This binary representation allows for the construction of dynamic models that can predict the behavior of genetic networks under various conditions. Boolean networks are particularly useful for understanding the overall structure and dynamics of GRNs, making them a popular choice for initial modeling efforts (Saadat and Albert, 2013; Tyson et al., 2019).
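The binary, rule-based dynamics described above can be sketched in a few lines. The three-gene circuit, its gene names, and its update rules below are hypothetical and chosen only for illustration; the sketch assumes a synchronous update scheme, which is one common choice among several.

```python
from itertools import product

# Hypothetical three-gene circuit; gene names and rules are illustrative only.
GENES = ("A", "B", "C")

RULES = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B together activate C
}


def step(state):
    """Synchronous update: every gene is recomputed from the same current state."""
    current = dict(zip(GENES, state))
    return tuple(bool(RULES[g](current)) for g in GENES)


def find_attractor(state):
    """Iterate until a state repeats and return the repeating cycle of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


def all_attractors():
    """Collect the distinct attractors reached from every possible initial state."""
    attractors = set()
    for state in product((False, True), repeat=len(GENES)):
        cycle = find_attractor(state)
        # Rotate each cycle to start at its smallest state so duplicates collapse.
        i = cycle.index(min(cycle))
        attractors.add(tuple(cycle[i:] + cycle[:i]))
    return attractors


for attractor in all_attractors():
    print(len(attractor), attractor)
```

Because this circuit contains a negative feedback loop (C represses A, which indirectly activates C), every initial state ends up in a sustained oscillation rather than a fixed point. Enumerating all states is only feasible for small networks, since the state space grows as 2^n with the number of genes.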
4.1.2 Modeling gene regulation The process of modeling gene regulation using Boolean networks involves several key steps. First, experimental data is used to infer the network structure, identifying which genes regulate which others. This is followed by the application of graph-theoretical measures to analyze the network's properties. The network is then converted into a dynamic model that can simulate the behavior of the system over time. This approach allows researchers to make predictions about gene expression patterns and identify potential targets for therapeutic intervention (Murrugarra and Aguilar, 2019). 4.1.3 Examples in genetic networks Boolean networks have been successfully applied to various genetic networks, providing insights into complex biological processes. For instance, the segment polarity gene network in Drosophila melanogaster has been modeled using Boolean networks to understand the regulatory mechanisms involved in embryonic development (Saadat and Albert, 2013). Additionally, Boolean networks have been used to study cell differentiation and functional states, highlighting their utility in capturing the dynamic behavior of GRNs. These models have also been extended to incorporate stochastic elements, allowing for the simulation of gene expression variability observed in biological systems (Murrugarra and Aguilar, 2019). 4.2 Bayesian networks Bayesian networks offer a probabilistic approach to modeling gene regulatory networks, capturing the inherent uncertainty and variability in gene expression. These models use conditional probabilities to represent the relationships between genes, allowing for the integration of diverse data types and the inference of regulatory interactions. Bayesian networks are particularly useful for identifying causal relationships and predicting the effects of perturbations in the network (Grob et al., 2019). 4.3 Stochastic models Stochastic models are essential for capturing the random nature of gene regulatory processes, which arises from the small number of molecules involved and the stochasticity of their interactions.
These models use mathematical frameworks such as the chemical master equation and the stochastic simulation algorithm (SSA) to simulate the behavior of GRNs under different conditions. Stochastic models provide a more accurate representation of gene expression dynamics, accounting for the noise and variability observed in experimental data (Liang and Han, 2012; Murrugarra and Aguilar, 2019). They are particularly useful for studying systems with significant molecular noise and for developing therapeutic strategies that target specific regulatory pathways. 5 Modeling Protein-Protein Interaction Networks 5.1 Structural and functional analysis Protein-protein interaction (PPI) networks are fundamental to understanding cellular processes and biological functions. Structural and functional analysis of these networks involves deciphering the atomic details of protein binding interfaces and their dynamic interactions within the cellular environment. Computational models, such as the multiscale framework integrating high-resolution structural information and simplified representations for long-time-scale dynamics, have proven effective in simulating these interactions and unraveling their complexities (Wang et al., 2018). Additionally, network-based modeling and coevolutionary analysis have enriched our understanding of protein dynamics and allosteric regulation, providing insights into the molecular mechanisms underlying protein functions and interactions (Liang et al., 2020). 5.2 Dynamic simulations Dynamic simulations, particularly molecular dynamics (MD) simulations, play a crucial role in studying the behavior of proteins and their interactions over time. These simulations capture the full atomic detail and temporal resolution of biomolecular processes, offering valuable insights into protein dynamics, structure-function relationships, and interaction mechanisms (Hollingsworth and Dror, 2018). 
Enhanced sampling MD approaches, combined with regular MD methods, assist in steering structure-based drug discovery by elucidating drug-protein interactions and binding mechanisms (Kalyaanamoorthy and Chen, 2014). Tools like SenseNet further analyze protein structure networks from MD simulations, predicting allosteric residues and their roles in signal transduction (Schneider and Antes, 2021). 5.3 Applications in drug discovery The application of computational approaches to PPI networks has significant implications for drug discovery. MD simulations have been widely used to investigate pathogenic mechanisms, virtual screening, and drug resistance mechanisms, providing essential information that guides the drug discovery and design process (Liu et al., 2018). Deep learning methods, such as graph neural networks (GNNs), have also emerged as powerful tools for predicting protein functions and interactions (Figure 1), facilitating in silico drug discovery and development (Muzio et al., 2020). These computational methods enable the identification of candidate disease genes or drug targets, which can be further validated experimentally, thus accelerating the drug discovery pipeline (Liang and Kelemen, 2018). Figure 1 The layers of a k-layer graph convolutional network (GCN) (Adapted from Muzio et al., 2020) Image caption: Each layer of the GCN aggregates over each node's neighborhood using the node representations from the previous layer of the network. The aggregates in each layer then pass through an activation function (in this case, ReLU) before moving on to the next layer. The network can be used to generate a variety of different outputs: to predict new edges in the input network (link prediction), to classify individual nodes in the input graph (node classification), or to classify the entire input graph (graph classification) (Adapted from Muzio et al., 2020) 6 Metabolic Network Modeling 6.1 Flux balance analysis Flux Balance Analysis (FBA) is a widely used computational method for predicting the flow of metabolites through a metabolic network. It relies on the principle of mass conservation and uses a stoichiometric matrix along with a biologically relevant objective function, such as biomass production or ATP generation, to identify optimal reaction flux distributions (Vidal-Limon et al., 2022).
FBA has been instrumental in analyzing genome-scale reconstructions of various organisms and has applications in metabolic engineering and drug target identification (Sen, 2022). However, FBA has limitations, such as its inability to predict intracellular fluxes under all environmental conditions, necessitating the development of alternative strategies (Megchelenbrink et al., 2015). 6.2 Constraint-based optimization Constraint-based optimization methods extend the capabilities of FBA by incorporating additional constraints, such as kinetic, thermodynamic, and regulatory constraints, to improve the accuracy of metabolic flux predictions (Pandey et al., 2018; Sen et al., 2022). These methods allow for a more detailed and realistic representation of metabolic networks, enabling the analysis of complex cellular behaviors and the identification of key metabolic bottlenecks. For instance, the Maximum Metabolic Flexibility (MMF) method utilizes the observation that microorganisms often favor a suboptimal growth rate to maintain metabolic flexibility, thereby improving the quantitative predictions made by FBA.
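The FBA problem described in Section 6.1 is usually stated as a linear program. The formulation below is the standard textbook form rather than one taken from the cited papers, and the symbols are assumptions introduced here: S is the m × n stoichiometric matrix (m metabolites, n reactions), v is the vector of reaction fluxes, c holds the objective weights (for example, 1 for the biomass reaction and 0 elsewhere), and the bounds encode reaction reversibility and uptake limits.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j}, \qquad j = 1,\dots,n .
\end{aligned}
```

The equality S v = 0 expresses mass conservation at steady state. The constraint-based extensions of Section 6.2 can be read as adding further restrictions on v to this program, such as thermodynamic, kinetic, or regulatory constraints, which shrinks the set of feasible flux distributions.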
6.3 Integration with omics data The integration of omics data, such as transcriptomics, proteomics, and metabolomics, into metabolic network models has significantly enhanced their predictive capabilities. High-throughput technologies have generated vast amounts of omics data, which can be used to refine and constrain metabolic models, leading to more accurate predictions of cellular phenotypes (Blazier and Papin, 2012; Wang et al., 2021). Several methods have been developed to incorporate omics data into FBA, such as the Relative Expression and Metabolomic Integrations (REMI) method, which integrates gene expression and metabolomic data with thermodynamic constraints to provide more robust and biologically relevant results (Figure 2) (Pandey et al., 2018). These integrated models are valuable for understanding the dynamic adaptation of biochemical reaction fluxes and for exploring the interplay between metabolism and regulation in various physiological states (Wang et al., 2021). Figure 2 A genome-scale flux balance analysis (FBA) model and sets of gene-expression and/or metabolomic data. In the pre-processing step, the FBA model is converted into a thermodynamic-based flux analysis (TFA) formulation, and the relative flux ratios are further assessed based on the omics data. Depending on the omics data provided, REMI takes the form of the REMI-TGex, REMI-TM, or REMI-TGexM method (third block). Examples of gene-expression and metabolomic data (second block) together with a toy model (third block) are used to illustrate the applicability of the REMI methods. The theoretical maximum consistency score (TMCS) is the number of available omics data (for metabolites, genes (reactions), or both) and the maximum consistency score (MCS) is the number of those constraints that are consistent with fluxes and could be integrated into REMI models.
The MCS is always equal to or smaller than the TMCS. 7 Challenges and Future Directions 7.1 Scalability and complexity One of the primary challenges in modeling biological networks is managing the scalability and complexity of these systems. Biological networks often involve numerous components and interactions, making it difficult to create models that are both comprehensive and computationally feasible. For instance, the integration of various omics data (proteomics, genomics, lipidomics, and metabolomics) has led to large inventories of biological entities, but understanding how these entities interact remains a significant challenge (Kholodenko et al., 2012). Additionally, traditional methods such as Boolean networks and differential equations face limitations when applied to complex signal transduction networks due to their inability to handle the spatial and temporal dynamics effectively (Lee et al., 2020). New approaches like Most Permissive Boolean Networks (MPBNs) have been proposed to reduce the complexity of dynamical analysis, enabling the modeling of genome-scale networks (Paulevé et al., 2020). 7.2 Data integration and interoperability The integration of heterogeneous data types is another major challenge. Advances in high-throughput techniques have generated vast amounts of diverse omics data, which need to be integrated to provide a holistic view of biological systems. However, the complexity, heterogeneity, and high dimensionality of these data pose significant challenges for data integration and interoperability (Lee et al., 2020). Methods for collective mining of various types of networked biological data have been proposed, but they still face limitations in dealing with heterogeneous networked data (Gligorijević and Przulj, 2015). The development of heterogeneous multi-layered networks (HMLNs) has shown promise in integrating diverse biological data, but new computational challenges arise in establishing causal genotype-phenotype associations and understanding environmental impacts on organisms (Wang et al., 2021). 7.3 Advances in computational techniques To address the challenges of scalability, complexity, and data integration, advances in computational techniques are essential. Probabilistic models like ProbRules, which combine probabilities and logical rules, have been developed to represent the dynamics of biological systems across multiple scales (Grob et al., 2019). These models have shown robustness in predicting gene expression readouts and clarifying molecular mechanisms. Additionally, non-negative matrix factorization-based approaches have been highlighted for their potential in dealing with heterogeneous data and providing accurate integrative analyses (Pham et al., 2008).
The application of machine learning methods to network biology has also been emphasized, offering new biological insights and aiding in the development of more accurate in silico representations of biological systems (Liu et al., 2020). Acknowledgments We would like to thank Ms Kim for reading the manuscript and providing valuable feedback that improved the clarity of the text. We also thank the two anonymous peer reviewers who contributed to the evaluation of this manuscript. Conflict of Interest Disclosure The authors affirm that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest. References Aittokallio T., and Schwikowski B., 2006, Graph-based methods for analysing networks in cell biology, Briefings in Bioinformatics, 7(3): 243-255. https://doi.org/10.1093/BIB/BBL022. Blazier A.S., and Papin J.A., 2012, Integration of expression data in genome-scale metabolic network reconstructions, Frontiers in Physiology, 3: 299. https://doi.org/10.3389/fphys.2012.00299. Boccaletti S., Latora V., Moreno Y., Chavez M., and Hwang D., 2006, Complex networks: structure and dynamics, Physics Reports, 424: 175-308. https://doi.org/10.1016/J.PHYSREP.2005.10.009. Bocci F., Jia D., Nie Q., Jolly M.K., and Onuchic J., 2023, Theoretical and computational tools to model multistable gene regulatory networks, Reports on Progress in Physics, 2023: 86. https://doi.org/10.1088/1361-6633/acec88. Covert M., Knight E., Reed J., Herrgård M., and Palsson B., 2004, Integrating high-throughput and computational data elucidates bacterial networks, Nature, 429: 92-96. https://doi.org/10.1038/nature02456. Gao W., Wu H., Siddiqui M., and Baig A., 2017, Study of biological networks using graph theory, Saudi Journal of Biological Sciences, 25: 1212-1219. https://doi.org/10.1016/j.sjbs.2017.11.022.
Glass K., Huttenhower C., Quackenbush J., and Yuan G., 2013, Passing messages between biological networks to refine predicted interactions, PLoS ONE, 8(5): e64832. https://doi.org/10.1371/journal.pone.0064832. Gligorijević V., and Przulj N., 2015, Methods for biological data integration: perspectives and challenges, Journal of The Royal Society Interface, 12(112): 20150571. https://doi.org/10.1098/rsif.2015.0571.
Grob A., Kracher B., Kraus J.M., Kühlwein S.D., Pfister A.S., Wiese S., Luckert K., Pötz O., Joos T., Daele D., Raedt L., Kühl M., and Kestler H., 2019, Representing dynamic biological networks with multi-scale probabilistic models, Communications Biology, 2(1): 21. https://doi.org/10.1038/s42003-018-0268-3. Hinkelmann F., Murrugarra D., Jarrah A., and Laubenbacher R., 2010, A mathematical framework for agent based models of complex biological networks, Bulletin of Mathematical Biology, 73: 1583-1602. https://doi.org/10.1007/S11538-010-9582-8. Hollingsworth S., and Dror R., 2018, Molecular dynamics simulation for all, Neuron, 99: 1129-1143. https://doi.org/10.1016/j.neuron.2018.08.011. Jolly M.K., and Roy S., 2022, Editorial: topical collection on emergent dynamics of biological networks, Journal of Biosciences, 47(4): 82. Kalyaanamoorthy S., and Chen Y., 2014, Modelling and enhanced molecular dynamics to steer structure-based drug discovery, Progress in Biophysics and Molecular Biology, 114(3): 123-136. https://doi.org/10.1016/j.pbiomolbio.2013.06.004. Karlebach G., and Shamir R., 2008, Modelling and analysis of gene regulatory networks, Nature Reviews Molecular Cell Biology, 9: 770-780. https://doi.org/10.1038/nrm2503. Kholodenko B., Yaffe M.B., and Kolch W., 2012, Computational approaches for analyzing information flow in biological networks, Science Signaling, 5(220): re1. https://doi.org/10.1126/scisignal.2002961. Koutrouli M., Karatzas E., Páez-Espino D., and Pavlopoulos G.A., 2020, A guide to conquer the biological network era using graph theory, Frontiers in Bioengineering and Biotechnology, 8: 34. https://doi.org/10.3389/fbioe.2020.00034. Lee B., Zhang S., Poleksic A., and Xie L., 2020, Heterogeneous multi-layered network model for omics data integration and analysis, Frontiers in Genetics, 10. https://doi.org/10.3389/fgene.2019.01381.
Liang J., and Han J., 2012, Stochastic boolean networks: an efficient approach to modeling gene regulatory networks, BMC Systems Biology, 6: 1-21. https://doi.org/10.1186/1752-0509-6-113. Liang Y., and Kelemen A., 2018, Dynamic modeling and network approaches for omics time course data: overview of computational approaches and applications, Briefings in Bioinformatics, 19(5): 1051-1068. https://doi.org/10.1093/bib/bbx036. Liang Z., Verkhivker G., and Hu G., 2020, Integration of network models and evolutionary analysis into high-throughput modeling of protein dynamics and allosteric regulation: theory tools and applications, Briefings in Bioinformatics, 21(3): 815-835. https://doi.org/10.1093/bib/bbz029. Liu C., Ma Y., Zhao J., Nussinov R., Zhang Y., Cheng F., and Zhang Z., 2020, Computational network biology: data models and applications, Physics Reports, 846: 1-66. https://doi.org/10.1016/j.physrep.2019.12.004. Liu X.W., Shi D.F., Zhou S.Y., Liu H.L., Liu H.X., and Yao X.J., 2018, Molecular dynamics simulations and novel drug discovery, Expert Opinion on Drug Discovery, 13(1): 23-37. https://doi.org/10.1080/17460441.2018.1403419. Mangan N., Brunton S., Proctor J., and Kutz J., 2016, Inferring biological networks by sparse identification of nonlinear dynamics, IEEE Transactions on Molecular Biological and Multi-Scale Communications, 2: 52-63. https://doi.org/10.1109/TMBMC.2016.2633265. Manipur I., Granata I., Maddalena L., and Guarracino M.R., 2020, Clustering analysis of tumor metabolic networks, BMC Bioinformatics, 21: 1-14. https://doi.org/10.1186/s12859-020-03564-9. Megchelenbrink W., Rossell S., Huynen M.A., Notebaart R.A., and Marchiori E., 2015, Estimating metabolic fluxes using a maximum network flexibility paradigm, PLoS ONE, 10(10): e0139665. https://doi.org/10.1371/journal.pone.0139665. 
Mochizuki A., 2016, Theoretical approaches for the dynamics of complex biological systems from information of networks, Proceedings of the Japan Academy, Series B Physical and Biological Sciences, 92: 255-264. https://doi.org/10.2183/pjab.92.255. Murrugarra D., and Aguilar B., 2019, Modeling the stochastic nature of gene regulation with boolean networks, Algebraic and Combinatorial Computational Biology, 2019: 147-173. https://doi.org/10.1016/B978-0-12-814066-6.00005-2. Muzio G., O’Bray L., and Borgwardt K., 2020, Biological network analysis with deep learning, Briefings in Bioinformatics, 22: 1515-1530. https://doi.org/10.1093/bib/bbaa257. Nardini J.T., Baker R.E., Simpson M.E., and Flores K.B., 2020, Learning differential equation models from stochastic agent-based model simulations, Journal of the Royal Society Interface, 18(176): 20200987. https://doi.org/10.1098/rsif.2020.0987. Pandey V., Hadadi N., and Hatzimanikatis V., 2018, Enhanced flux prediction by integrating relative expression and relative metabolite abundance into thermodynamically consistent metabolic models, PLoS Computational Biology, 15(5): e1007036. https://doi.org/10.1371/journal.pcbi.1007036.
Computational Molecular Biology 2024, Vol.14, No.2, 45-53 http://bioscipublisher.com/index.php/cmb 53 Paulevé L., Kolcák J., Chatain T., and Haar S., 2020, Reconciling qualitative abstract and scalable modeling of biological networks, Nature Communications, 11(1): 4256. https://doi.org/10.1038/s41467-020-18112-5. Pavlopoulos G.A., Secrier M., Moschopoulos C.N., Soldatos T., Kossida S., Aerts J., Schneider R., and Bagos P., 2011, Using graph theory to analyze biological networks, BioData Mining, 4: 1-27. https://doi.org/10.1186/1756-0381-4-10. Pham E., Li I., and Truong K., 2008, Computational modeling approaches for studying of synthetic biological networks, Current Bioinformatics, 3: 130-141. https://doi.org/10.2174/157489308784340667. Saadat pour A., and Albert R., 2013, Boolean modeling of biological regulatory networks: a methodology tutorial, Methods, 62(1): 3-12. https://doi.org/10.1016/j.ymeth.2012.10.012. Schneider M., and Antes I., 2021, SenseNet a tool for analysis of protein structure networks obtained from molecular dynamics simulations, PLoS ONE, 17(3): e0265194. https://doi.org/10.1371/journal.pone.0265194. Sen P., 2022, Flux balance analysis of metabolic networks for efficient engineering of microbial cell factories, Biotechnology and Genetic Engineering Reviews, 2022: 1-34. https://doi.org/10.1080/02648725.2022.2152631. Tyson J., Laomettachit T., and Kraikivski P., 2019, Modeling the dynamic behavior of biochemical regulatory networks, Journal of Theoretical Biology, 462: 514-527. https://doi.org/10.1016/j.jtbi.2018.11.034. Vidal-Limon A., Aguilar-Toalá J., and Liceaga A., 2022, Integration of molecular docking analysis and molecular dynamics simulations for studying food proteins and bioactive peptides, Journal of Agricultural and Food Chemistry, 70(4): 934-943. https://doi.org/10.1021/acs.jafc.1c06110. 
Wang Y.X.R., Li L., Li J.J., 2021, Network modeling in biology: statistical methods for gene and brain networks, Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 36(1): 89. https://doi.org/10.1214/20-sts792. Wang B., Xie Z., Chen J., and Wu Y., 2018, Integrating structural information to study the dynamics of protein-protein interactions in cells, Structure, 26(10):1414-1424. https://doi.org/10.1016/j.str.2018.07.010. Wang S., Fan K., Luo N., Cao Y., Wu F., Zhang C., Heller K., and You L., 2019, Massive computational acceleration by using neural networks to emulate mechanism-based biological models, Nature Communications, 10(1): 4354. https://doi.org/10.1038/s41467-019-12342-y. Wang X., Zhang Y., and Wen T., 2021, Progress on genome-scale metabolic models integrated with multi-omics data, Chinese Science Bulletin, 13(7): 855. https://doi.org/10.1360/tb-2020-1468. Wang Z., and Gao H., 2010, Dynamics analysis of gene regulatory networks, International Journal of Systems Science, 41: 1-4. https://doi.org/10.1080/00207720903477952.
Computational Molecular Biology 2024, Vol.14, No.2, 54-63 http://bioscipublisher.com/index.php/cmb

Research Report Open Access

Genome-Wide Prediction and Selection in Plant and Animal Breeding: A Systematic Review of Current Techniques

Xiaoya Zhang, Jianquan Li
Hainan Key Laboratory of Crop Molecular Breeding, Sanya, 572025, China
Corresponding author: jianquan.li@hitar.org
Computational Molecular Biology, 2024, Vol.14, No.2 doi: 10.5376/cmb.2024.14.0007
Received: 03 Feb., 2024 Accepted: 14 Mar., 2024 Published: 01 Apr., 2024

Copyright © 2024 Zhang and Li. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article:
Zhang X.Y., and Li J.Q., 2024, Genome-wide prediction and selection in plant and animal breeding: a systematic review of current techniques, Computational Molecular Biology, 14(2): 54-63 (doi: 10.5376/cmb.2024.14.0007)

Abstract With the advancement of genomics technology, whole-genome prediction (GWP) and genomic selection (GS) have become important tools in plant and animal breeding. Genomic selection uses whole-genome marker information to select for target traits through predictive models, improving breeding efficiency and accuracy. This study reviews the application of whole-genome prediction technology in plant and animal breeding, with a focus on its role in improving breeding efficiency. It analyzes current genomic selection models and methods and explores the potential of GS for improving important agronomic and economic traits, as well as its prospects in different fields. Research has shown that GS has greatly improved selection efficiency in multiple breeding projects, particularly in enhancing plant disease resistance and increasing crop yield.
In animal breeding, genomic selection has been widely applied to improve the reproductive traits, health, and productivity of livestock.

Keywords Genomic selection; Plant breeding; Animal breeding; Machine learning; Genotype-environment interaction

1 Introduction

Genome-wide prediction and selection (GWPS) have revolutionized the fields of plant and animal breeding by enabling the prediction of complex traits through the use of dense genomic markers. This approach involves the implementation of whole-genome regression (WGR) models, where phenotypes are regressed on thousands of markers concurrently, allowing for the accurate prediction of genetic values (Campos et al., 2013). The advent of high-throughput sequencing technologies has facilitated the capture of both additive and non-additive genetic effects, thereby enhancing the prediction of genetic gains from selection (He et al., 2023). Various statistical models, such as genomic best linear unbiased predictor (G-BLUP) and Bayesian least absolute shrinkage and selection operator (BLASSO), have been developed to address the high dimensionality and multicollinearity challenges inherent in GWPS (Lima et al., 2019a). Additionally, non-parametric methods like Delta-p have been proposed to further improve prediction accuracy (Lima et al., 2019b).

The integration of GWPS into modern breeding programs has significantly increased the efficiency and speed of genetic evaluations, leading to higher genetic gains per unit of time (Alkimim et al., 2020). This is particularly crucial for perennial species, where traditional breeding cycles are lengthy. By leveraging genomic estimated breeding values (GEBVs), breeders can identify superior genotypes early in the breeding cycle, thus accelerating the selection process (Lima et al., 2019a).
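
The whole-genome regression idea described above can be illustrated with ridge regression, one common WGR model in which all marker effects are estimated jointly and shrunk toward zero. The following is a minimal sketch under stated assumptions, not the method of any study cited here: the function names, the toy data, and the penalty value `lam` are hypothetical, and in practice the penalty is usually derived from variance components or cross-validation rather than fixed by hand.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_marker_effects(genotypes, phenotypes, lam):
    """Estimate all marker effects jointly from (Z'Z + lam*I) b = Z'y.

    Z holds column-centred genotype codes (0/1/2) and y the mean-centred
    phenotypes; lam > 0 keeps the system solvable when markers outnumber lines.
    """
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y_mean = sum(phenotypes) / n
    y = [v - y_mean for v in phenotypes]
    lhs = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(z[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve_linear_system(lhs, rhs), col_means, y_mean


def predict_gebv(genotypes, effects, col_means, y_mean):
    """Predicted value = phenotype mean + sum of centred genotype * effect."""
    return [y_mean + sum((g - m) * e for g, m, e in zip(row, col_means, effects))
            for row in genotypes]


# Toy data (hypothetical): 10 lines, 30 markers, so markers outnumber lines.
random.seed(1)
genotypes = [[random.choice([0, 1, 2]) for _ in range(30)] for _ in range(10)]
true_effects = [random.gauss(0, 1) for _ in range(30)]
phenotypes = [sum(g * a for g, a in zip(row, true_effects)) + random.gauss(0, 1)
              for row in genotypes]

effects, means, mu = ridge_marker_effects(genotypes, phenotypes, lam=10.0)
gebv = predict_gebv(genotypes, effects, means, mu)
best = max(range(len(gebv)), key=gebv.__getitem__)
print("line with highest predicted value:", best)
```

The penalty term is what makes the regression feasible when markers outnumber phenotyped lines; without it the normal equations are singular, which is the high-dimensionality problem mentioned above.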
The application of GWPS has shown promising results in various crops, including cassava, Coffea canephora, and Asian rice, demonstrating its potential to enhance breeding outcomes across diverse species (Lima et al., 2019a; Lima et al., 2019b; Alkimim et al., 2020). Moreover, the use of deep learning models in GWPS has further improved prediction accuracy for complex traits, making it a valuable tool in large-scale breeding programs (Sandhu et al., 2021).

This study provides a comprehensive overview of the current technologies and methods used in GWPS in plant and animal breeding. It summarizes the various statistical models and methods employed in GWPS, including both parametric and non-parametric approaches, and evaluates their effectiveness and efficiency in different breeding programs and species. The study also discusses the challenges
and limitations associated with GWPS, such as high dimensionality, multicollinearity, and genotype-environment interactions. Additionally, it highlights recent advancements and future directions in the field, including the integration of deep learning models and digital breeding technologies.

2 Overview of Genome-Wide Prediction Techniques

2.1 Genomic selection (GS)

Genomic selection (GS) has revolutionized the field of plant and animal breeding by enabling the rapid selection of superior genotypes and accelerating the breeding cycle. Unlike traditional marker-assisted selection, which focuses on identifying individual loci associated with traits, GS uses all marker data as predictors of performance, leading to more accurate predictions (Jannink et al., 2010; Crossa et al., 2017). This approach is particularly beneficial for complex traits controlled by many genes with small effects, which traditional methods struggle to address effectively (Meuwissen et al., 2016; Varshney et al., 2017). The integration of GS into breeding programs has shown tangible genetic gains, as evidenced by its application in maize breeding, where significant improvements have been observed (Crossa et al., 2017).

The success of GS hinges on its ability to incorporate all marker information into the prediction model, thereby avoiding biased marker effect estimates and capturing more of the variation due to small-effect quantitative trait loci (QTL). This comprehensive approach allows for the prediction of breeding values of lines in a population by analyzing their phenotypes and high-density marker scores. The accuracy of these predictions has been demonstrated in both simulation and empirical studies, with correlations between true breeding value and genomic estimated breeding value reaching levels as high as 0.85 for polygenic, low-heritability traits (Varshney et al., 2017).
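
The accuracy quoted above is a Pearson correlation between true and genomic estimated breeding values, computed on lines that were not used to train the model. The sketch below shows that calculation together with a simple truncation selection on predicted values. All line IDs and numbers are hypothetical, and true breeding values are only known in simulation; empirical studies typically correlate predictions with observed phenotypes instead.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def select_top(gebv_by_line, fraction):
    """Return the IDs of the top `fraction` of lines ranked by GEBV."""
    n_keep = max(1, round(len(gebv_by_line) * fraction))
    ranked = sorted(gebv_by_line, key=gebv_by_line.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical validation set: simulated true values and model predictions.
true_bv = {"L1": 2.1, "L2": -0.4, "L3": 1.3, "L4": -1.8, "L5": 0.2}
gebv = {"L1": 1.6, "L2": 0.1, "L3": 0.9, "L4": -1.2, "L5": -0.3}

lines = sorted(true_bv)
accuracy = pearson_correlation([true_bv[k] for k in lines],
                               [gebv[k] for k in lines])
print(f"prediction accuracy r = {accuracy:.2f}")
print("selected on markers alone:", select_top(gebv, fraction=0.4))
```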
This level of accuracy is sufficient to consider selecting for agronomic performance using marker information alone, substantially accelerating the breeding cycle and enhancing gains per unit time.

2.2 Genomic prediction models

Genomic prediction models are central to the implementation of GS: a model is fitted in a training population to estimate the effects of markers across the entire genome, and those estimates are then used to predict performance in the target population. These models are designed to capture small QTL effects that are often ignored in traditional association analysis, thereby providing a more comprehensive understanding of the genetic architecture of complex traits (Desta and Ortiz, 2014). Various genomic prediction models have been proposed, each with its strengths and limitations. For instance, the Bayesian Lasso, weighted Bayesian shrinkage regression (wBSR), and random forest (RF) are among the models that have shown promise in terms of predictive accuracy and computational efficiency (Heslot et al., 2012).

The choice of genomic prediction model can significantly impact the accuracy of predictions and the genetic gain from selection. Comparative studies have shown that while many models achieve similar levels of accuracy, they differ in their susceptibility to overfitting, computation time, and the distribution of marker effect estimates (Heslot et al., 2012). Additionally, the integration of multi-trait and multi-environment models, high-throughput phenotyping, and deep learning approaches can further enhance the accuracy and efficiency of genomic predictions (Merrick et al., 2022). These advancements highlight the importance of continuous research and optimization of genomic prediction models to maximize their potential in breeding programs.

2.3 Machine learning and artificial intelligence applications

The application of machine learning (ML) and artificial intelligence (AI) in genomic prediction represents a significant advancement in the field of breeding.
Machine learning methods, such as random forest and deep learning, have been shown to capture non-additive effects and improve the accuracy of genomic predictions. These methods can handle large datasets with complex interactions, making them well suited to genomic prediction tasks. For example, random forest has been found to be effective in capturing non-additive effects, which are often missed by traditional linear models (Heslot et al., 2012).

The integration of ML and AI into genomic prediction models offers several advantages, including the ability to analyze large and complex datasets, improve prediction accuracy, and reduce computation time. High-throughput
phenotyping and deep learning approaches can leverage the large amount of genomic and phenotypic data collected across different growing seasons and environments to increase heritability estimates, selection intensity, and selection accuracy (Merrick et al., 2022).

3 Data Requirements and Management

3.1 High-throughput genotyping

High-throughput genotyping is a cornerstone of modern plant and animal breeding programs, enabling the identification and utilization of genetic variation on a genome-wide scale. Single nucleotide polymorphisms (SNPs) are the most commonly used markers due to their abundance and the development of high-throughput genotyping technologies such as SNP arrays and whole-genome sequencing (WGS). SNP arrays, like the TaBW280K developed for wheat, allow for efficient genotyping of large populations, providing valuable data for diversity analyses and breeding programs (Rimbert et al., 2018). Similarly, genotyping-by-sequencing (GBS) has emerged as a cost-effective alternative, combining marker discovery and genotyping in a single step, which is particularly useful for species with large genomes (He et al., 2014; Gorjanc et al., 2015).

The effectiveness of genomic selection (GS) is highly dependent on the density and coverage of genetic markers. High-density SNP arrays and WGS provide comprehensive coverage of the genome, capturing a wide range of genetic variation. For instance, the TaBW280K array for wheat includes 280,226 SNPs, covering both genic and intergenic regions, which enhances the resolution of genetic mapping and the accuracy of GS models (Rimbert et al., 2018). In livestock, GBS has been shown to provide comparable accuracy to SNP arrays when a sufficient number of markers and appropriate sequencing depth are used (Gorjanc et al., 2015).
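
Because prediction depends on the number of usable markers, raw SNP calls from arrays or GBS are commonly screened before modelling. The sketch below computes two widely used per-marker summaries, call rate and minor allele frequency (MAF), and keeps markers that pass both. It is an illustration only: the genotype coding, the marker names, and the thresholds (0.9 and 0.05) are assumptions chosen for the example, not values reported by the studies cited above.

```python
def marker_summary(calls):
    """Return (call_rate, minor_allele_frequency) for one SNP.

    `calls` holds one genotype per individual, coded as the number of copies
    of the alternative allele (0, 1 or 2), with None for a missing call.
    """
    observed = [g for g in calls if g is not None]
    call_rate = len(observed) / len(calls)
    if not observed:
        return call_rate, 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return call_rate, min(alt_freq, 1.0 - alt_freq)


def filter_markers(genotypes, min_call_rate=0.9, min_maf=0.05):
    """Keep the markers that pass both thresholds (illustrative defaults)."""
    kept = []
    for marker, calls in genotypes.items():
        call_rate, maf = marker_summary(calls)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept.append(marker)
    return kept


# Hypothetical genotype calls for four SNPs scored on ten individuals.
genotypes = {
    "snp_1": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],           # informative, complete
    "snp_2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic
    "snp_3": [1, None, None, 2, None, 0, 1, 1, 0, 2],  # many missing calls
    "snp_4": [2, 2, 2, 2, 1, 2, 2, 2, 2, None],        # one missing call
}
print(filter_markers(genotypes))
```

Low-depth GBS data tend to have more missing calls than array data, so the call-rate threshold is the setting most likely to need adjustment between platforms.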
The choice between SNP arrays and WGS often depends on the specific requirements of the breeding program, including the species, genome size, and available resources (Bhat et al., 2016; Moraes et al., 2018).

Cost and efficiency are critical factors in the selection of genotyping methods. SNP arrays, while having a high initial development cost, offer a cost-effective solution for routine genotyping once established. For example, the development of species-specific SNP arrays can be expensive, but they provide high-throughput and reliable genotyping for large breeding populations (Grattapaglia et al., 2011; Moraes et al., 2018). On the other hand, GBS and other NGS-based methods offer flexibility and lower initial costs, making them suitable for species where SNP arrays are not available or economically feasible (He et al., 2014; Gorjanc et al., 2015). The continuous decline in sequencing costs is expected to further enhance the feasibility of WGS for GS in the near future (Bhat et al., 2016).

3.2 Phenotypic data collection

Accurate phenotypic data is essential for the success of GS. High-throughput phenotyping technologies are being developed to complement genotyping efforts, enabling the collection of large-scale, precise phenotypic data. These technologies include automated imaging systems, remote sensing, and various sensor-based methods that can capture complex traits in real time. The integration of high-throughput phenotyping with genotyping data is crucial for improving the accuracy of genomic predictions and achieving significant genetic gains in breeding programs (Figure 1) (Bhat et al., 2016; Wang et al., 2016).

Bhat et al. (2016) found that combining high-throughput phenotyping (HTP) with genomic estimated breeding values (GEBV) enables precise prediction of an individual’s breeding value, thereby accelerating the identification, testing, and promotion of superior genotypes.
NGS and HTP technologies significantly enhance the efficiency and accuracy of genomic selection by increasing the coverage of genotype data and the precision of phenotype data collection, speeding up the breeding process for superior varieties. The application of these technologies reduces costs, optimizes breeding resources, and provides powerful tools for crop improvement.

3.3 Data integration and management

The integration and management of large-scale genotypic and phenotypic data pose significant challenges. Effective data management systems are required to handle the vast amounts of data generated by high-throughput genotyping and phenotyping technologies. These systems must support data storage, retrieval, and analysis,