Computational Molecular Biology 2024, Vol.14, No.4 http://bioscipublisher.com/index.php/cmb © 2024 BioSci Publisher, registered at the publishing platform that is operated by Sophia Publishing Group, founded in British Columbia of Canada. All Rights Reserved. Publisher
BioSci Publisher is an international Open Access publishing platform that publishes scientific journals in the field of bioscience. It is registered at the publishing platform operated by Sophia Publishing Group (SPG), founded in British Columbia, Canada.

Publisher: BioSci Publisher
Edited by: Editorial Team of Computational Molecular Biology
Email: edit@cmb.bioscipublisher.com
Website: http://bioscipublisher.com/index.php/cmb
Address: 11388 Stevenston Hwy, PO Box 96016, Richmond, V7A 5J5, British Columbia, Canada

Computational Molecular Biology (ISSN 1927-5587) is an open access, peer-reviewed journal published online by BioSciPublisher. The journal publishes the latest research articles, letters, methods, and reviews in all areas of computational molecular biology, covering new discoveries in molecular biology, from genes to genomes, made with statistical, mathematical, and computational methods, as well as new developments in computational methods and databases in molecular and genome biology. The papers published in the journal are expected to be of interest to computational scientists, biologists, and teachers, students, and researchers engaged in biology.

All articles published in Computational Molecular Biology are Open Access and are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BioSciPublisher uses the CrossCheck service to identify academic plagiarism through the world’s leading plagiarism prevention tool, iParadigms, and to protect the original authors’ copyrights.
Computational Molecular Biology (online), 2024, Vol. 14 ISSN 1927-6648 http://hortherbpublisher.com/index.php/cmb © 2024 BioSciPublisher, registered at the publishing platform that is operated by Sophia Publishing Group, founded in British Columbia of Canada. All Rights Reserved.

Latest Content 2024, Vol. 14, No.4

【Research Insight】
Dynamic Modeling in Systems Biology: From Pathway Analysis to Whole-Cell Simulations, 134-144
Jiayao Zhou, Shudan Yan
DOI: 10.5376/cmb.2024.14.0016

The Evolving Landscape of Genomic Selection: Insights and Innovations in Quantitative Genetics, 145-154
Xiaojun Li, Shuiji Zhang
DOI: 10.5376/cmb.2024.14.0017

【Review Article】
Big Data in Genomics: Overcoming Challenges Through High-Performance Computing, 155-162
Liting Wang, Haimei Wang
DOI: 10.5376/cmb.2024.14.0018

【Research Perspective】
Biostatistical Challenges in High-Dimensional Data Analysis: Strategies and Innovations, 163-172
Jianjun Wang
DOI: 10.5376/cmb.2024.14.0019

【Research and Progress】
Bioinformatics in the Age of Big Data: Leveraging Computational Tools for Biological Discoveries, 173-181
Xiaoming Liu, Wei Zhang
DOI: 10.5376/cmb.2024.14.0020
Computational Molecular Biology 2024, Vol.14, No.4, 134-144 http://bioscipublisher.com/index.php/cmb

Research Insight, Open Access

Dynamic Modeling in Systems Biology: From Pathway Analysis to Whole-Cell Simulations
Jiayao Zhou, Shudan Yan
Institute of Life Sciences, Jiyang College of Zhejiang A&F University, Zhuji, 311800, Zhejiang, China
Corresponding author: jiayao zhou@jicat.org
Computational Molecular Biology, 2024, Vol.14, No.4 doi: 10.5376/cmb.2024.14.0016
Received: 16 May, 2024 Accepted: 22 Jun., 2024 Published: 08 Jul., 2024
Copyright © 2024 Zhou and Yan. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article: Zhou J.Y., and Yan S.D., 2024, Dynamic modeling in systems biology: from pathway analysis to whole-cell simulations, Computational Molecular Biology, 14(4): 134-144 (doi: 10.5376/cmb.2024.14.0016)

Abstract Systems biology is an important research field for understanding complex biological systems. By integrating various omics data and computational models, it reveals the interactions and dynamic behaviors of different biomolecules within the organism. Dynamic modeling, as a core tool in systems biology, helps researchers construct multi-scale models of biological systems through the analysis of metabolic pathways, signal transduction pathways, and related networks, extending from the cellular level to whole-cell simulations. This study is based on the latest research progress and explores the application of dynamic modeling in gene regulatory networks, drug discovery, personalized medicine, and synthetic biology, with a particular focus on the challenges and prospects of whole-cell simulation. Dynamic modeling helps to enhance the understanding of biological systems and provides new solutions for fields such as personalized therapy and drug development.
Future research will focus on how to address the challenges of data integration, model complexity, and computational power to drive further development in systems biology.

Keywords Systems biology; Dynamic modeling; Metabolic pathways; Whole-cell simulations; Personalized medicine

1 Introduction
Systems biology is an integrative discipline that connects molecular components across different biological scales, such as cells, tissues, and organs, to physiological functions, aiming to uncover the dynamic behavior of complex biological systems. This field combines quantitative reasoning, computational models, and high-throughput experimental techniques to understand the flow of information from genes to biological functions at various levels (Tavassoly et al., 2018). By synthesizing multidimensional data from cells and molecules, systems biology provides a crucial framework for generating hypotheses, guiding experimental design, and predicting mechanisms (Eddy et al., 2015). This approach not only deepens our understanding of biological complexity but also advances quantitative pharmacology and precision medicine (Tavassoly et al., 2018).

Dynamic modeling, a core tool in systems biology, enables researchers to simulate and analyze the temporal changes of biological systems. These models often use mathematical techniques such as ordinary differential equations to describe changes in biochemical networks, offering insights into the dynamic behavior of signaling and gene regulatory networks (Linden et al., 2022). Despite challenges with limited and noisy experimental data, dynamic modeling provides valuable perspectives on the multi-scale dynamics of biological systems (Tavassoly et al., 2018; Linden et al., 2022). The use of integrated modeling and time-series simulation allows researchers to go beyond static snapshots and capture the dynamic responses of biological systems under different conditions (Musilová and Sedlář, 2021).
These models not only contribute to basic research but also have broad applications in drug discovery, disease treatment, and synthetic biology. This study reviews existing methods of dynamic modeling and explores their applications and challenges in systems biology. We will analyze the advantages and limitations of these methods, and demonstrate the potential of dynamic modeling in interpreting complex biological systems through practical case studies. At the same time, future research directions will be proposed to further improve the predictive ability and application value of dynamic models.
2 Pathway Analysis: Foundations and Techniques
2.1 Metabolic pathway models
Metabolic pathway models are essential for understanding the biochemical processes within cells. These models often rely on high-throughput genomics and proteomics data to reconstruct genome-scale pathways. For instance, in yeast (Saccharomyces cerevisiae), bioinformatics methods have been pivotal in modeling metabolic pathways, enabling the investigation of interactions among molecules that lead to specific cellular processes (Hou et al., 2016). Dynamic modeling approaches, such as those reviewed in the context of metabolic engineering, incorporate detailed kinetic information to enhance the accuracy of phenotype predictions and optimize metabolic pathways (Kim et al., 2018). These models are crucial for applications in strain optimization and metabolic engineering, where the goal is to improve the production of desired compounds by microorganisms.

2.2 Signal transduction pathway models
Signal transduction pathways involve complex networks of molecular interactions that transmit signals from the cell surface to the interior, resulting in specific cellular responses. Modeling these pathways is challenging due to the spatial and temporal dynamics involved. Probabilistic models like ProbRules combine probabilities and logical rules to represent the dynamics of signal transduction networks across multiple scales, as demonstrated in the Wnt signaling pathway (Figure 1) (Groß et al., 2019). Additionally, reaction-diffusion systems are used to model the spatio-temporal dynamics of signaling molecules, providing insights into the compartmentalization and microdomains within cells (Getz et al., 2018).
Petri net approaches offer another method for modeling signal transduction pathways, especially when kinetic data are scarce, by using qualitative and semi-quantitative techniques to explore system dynamics (Koch and Büttner, 2023).

Signal transduction pathway models describe how molecules transmit signals from the cell surface to the interior, triggering specific cellular responses. Figure 1 illustrates the core molecules and interactions in the Wnt signaling pathway, including the complex dynamics of components such as β-catenin, Axin, and GSK3. In the absence of Wnt signals, β-catenin is phosphorylated and degraded by the destruction complex, while Wnt signaling inhibits this process, leading to β-catenin accumulation and its translocation into the nucleus to initiate gene transcription. Since these processes involve multi-scale and time-dependent dynamics, modeling such pathways requires consideration of complex feedback mechanisms and stochastic events, as demonstrated by the molecular network dynamics in the figure.

2.3 Computational methods in pathway analysis
Computational methods play a crucial role in pathway analysis by providing tools to model, simulate, and analyze biological pathways. Various approaches have been developed to address the challenges of modeling complex biological systems. For example, the mEPN framework offers a biologist-friendly pathway modeling language and a stochastic flow algorithm to simulate pathway dynamics, supported by a 3-D visualization engine (O’Hara et al., 2016). Simplifying assumptions are often necessary to make models tractable; however, these must be carefully chosen to avoid compromising the model's accuracy. An alternative to truncating pathway steps is to assume homogeneous rates of information propagation, which has been shown to produce more accurate models (Korsbo and Jönsson, 2020).
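To make the homogeneous-rate idea concrete, the following is a minimal worked sketch based on the standard linear-chain argument. It is not necessarily the exact formulation used by Korsbo and Jönsson (2020). The symbols are introduced here for illustration only: n is the number of pathway steps, r is the single shared rate, u(t) is the input signal, and x_i is the activity of step i.

```latex
% Assumption: an n-step linear pathway in which every step has the same rate r.
\frac{dx_1}{dt} = r\,\bigl(u(t) - x_1\bigr), \qquad
\frac{dx_i}{dt} = r\,\bigl(x_{i-1} - x_i\bigr), \quad i = 2,\dots,n .

% Response of the final step to a unit impulse in u (all x_i initially zero):
x_n(t) = \frac{r^{\,n}\, t^{\,n-1}}{(n-1)!}\, e^{-r t}, \qquad t \ge 0 .
```

Under this assumption the pathway output is a gamma-shaped (Erlang) response with mean delay n/r, so two parameters (n and r) stand in for n separate rate constants rather than steps being dropped from the model.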
Whole-cell models (WCMs) integrate diverse intracellular pathways using computational methods like stochastic simulation, which, despite being time-consuming, provide detailed insights into the system's dynamics (Yeom et al., 2021). These computational techniques are essential for advancing our understanding of biological pathways and their applications in systems biology.

3 Dynamic Modeling Approaches
3.1 Differential equation models
3.1.1 Ordinary differential equations (ODEs) in gene regulatory networks
Ordinary differential equations (ODEs) are a fundamental tool in modeling the dynamics of gene regulatory networks (GRNs). These models are particularly useful for understanding the interactions and regulatory mechanisms at the molecular level. ODE models can be linear or nonlinear, and they are often employed to describe the rate of change in gene expression levels over time. For instance, a single-index ODE model has been proposed to explore dynamic interactions in gene regulatory networks, demonstrating its effectiveness in fitting experimental data and identifying network structures that might be missed by linear models (Zhang et al., 2018).
Additionally, systems-biology-informed deep learning algorithms have been developed to incorporate ODEs into neural networks, enhancing the robustness of parameter inference and prediction of hidden dynamics in GRNs (Yazdani et al., 2019). These approaches underscore the versatility and power of ODEs in capturing the complex behavior of gene regulatory networks.
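As an illustration of how a nonlinear ODE model of a small gene regulatory network can be written and simulated, the sketch below integrates a two-gene mutual-repression ("toggle switch") motif with a hand-written fourth-order Runge-Kutta step. It is not taken from the cited studies. The network, the Hill-type repression term, and all parameter values (`alpha`, `delta`, `k`, `n`) are assumptions chosen only to show the mechanics.

```python
# Illustrative two-gene toggle switch; model and parameters are assumptions,
# not values from the cited studies.

def hill_repression(repressor, k=1.0, n=2):
    """Fraction of maximal synthesis remaining at a given repressor level."""
    return 1.0 / (1.0 + (repressor / k) ** n)


def toggle_rhs(state, alpha=4.0, delta=1.0):
    """Return (dx/dt, dy/dt) for two genes x and y that repress each other."""
    x, y = state
    dx = alpha * hill_repression(y) - delta * x  # synthesis minus decay
    dy = alpha * hill_repression(x) - delta * y
    return (dx, dy)


def rk4_step(f, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, t_end=50.0, dt=0.01):
    """Integrate the toggle switch from `initial` and return the trajectory."""
    state = tuple(initial)
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(toggle_rhs, state, dt)
        trajectory.append(state)
    return trajectory


# Two initial conditions on opposite sides of the x = y diagonal.
high_x = simulate((2.0, 0.5))[-1]
high_y = simulate((0.5, 2.0))[-1]
print("start with x > y ->", high_x)
print("start with y > x ->", high_y)
```

With these assumed parameters the same equations settle into two different steady states depending on the initial condition, which is the kind of nonlinear behavior a purely linear ODE model cannot reproduce.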
Figure 1 Key components and interactions in the ProbRules Wnt model (Adopted from Groß et al., 2019)
Image caption: a Without extracellular Wnt, β-catenin is phosphorylated by the destruction complex and proteasomally degraded. Extracellular Wnt inhibits the destruction complex and cytoplasmic β-catenin accumulates. Wnt-induced disheveled activates Rac (Ras-related C3 botulinum toxin substrate) which further activates JNK1 (c-Jun N-terminal kinase 1)/JNK2. Activated JNK2 allows β-catenin to translocate into the nucleus to induce in combination with LEF (lymphoid enhancer factor) transcription. In contrast, JNK1 activates GSK3-β (glycogen synthase kinase 3β). The pale orange arrow represents a so far unknown positive influence of Rac on β-catenin accumulation that was predicted by our study. The model comprises 46 molecules with 21 logical relations for the Wnt/β-catenin and 19 logical relations for Wnt/JNK branches which were represented using 69 interaction edges and 93 rules. For details please see Supplementary Information. b–d Analyses of LEF/β-catenin–DNA interaction dynamics robustness to parameter values. Global attack rate ranges from 0.06 to 0.9. Global decay rate equals to 1/3 (b), 1/2 (c), or 2/3 (d) of the respective global attack rate. Global attack rate controls the onset of transcriptional response. Global decay rate determines the overall level of the response. e Structural robustness analysis by systematic introduction of additional rules. Rules source from all 69 interactions in the ProbRules model plus a constantly active interaction, target 67 interactions (i.e., excluding the two inputs) and drive the targets towards either ‘on’ or ‘off’, resulting in 70*67*2 = 9380 augmented models.
The majority of the simulation results (89.1%) show no or a negligible effect, around 5.93% show a moderate (Type 1A) to nearly total (Type 1B) decrease at the output, around 2.76% show a moderate (Type 2A) to strong (Type 2B) increase, and around 2.2% show phases of constant activation before stimulation (Type 3A) or during the complete simulation (Type 3B) (Adopted from Groß et al., 2019)

3.1.2 Partial differential equations (PDEs) in tissue and organ models
Partial differential equations (PDEs) extend the capabilities of ODEs by incorporating spatial dimensions, making them suitable for modeling processes that vary across both time and space. PDEs are particularly valuable in tissue and organ modeling, where spatial heterogeneity plays a crucial role. For example, PDEs have been employed to simulate the dynamics of fibrosis, a process involving the proliferation and activation of fibroblasts and the deposition of extracellular matrix in tissues (Ragusa and Russo, 2016). These models help in understanding the spatial distribution of signaling molecules and cells, providing insights into the progression of diseases and the effects of potential treatments. Recent advancements in machine learning, such as physics-informed DeepONets, have further enhanced the ability to perform long-time simulations of PDE systems, offering stable and accurate predictions across a range of initial conditions (Wang and Perdikaris, 2021).

3.1.3 Applications in multi-scale modeling
Multi-scale modeling integrates processes occurring at different scales, from molecular to cellular to tissue levels, providing a comprehensive understanding of biological systems. ODEs and PDEs are often combined in multi-scale models to capture the interactions between different levels of organization.
For instance, in the context of fibrosis, ODEs are used to model the dynamics of cell proliferation and signaling pathways, while PDEs describe the spatial distribution of cells and extracellular matrix within the tissue (Ragusa and Russo, 2016). This approach allows for a more detailed and accurate representation of complex biological processes, facilitating the development of targeted therapies and interventions.

3.2 Stochastic models for biological systems
Stochastic models account for the inherent randomness and noise in biological systems, which are often not captured by deterministic models like ODEs and PDEs. These models are particularly useful for studying systems with small numbers of molecules or cells, where stochastic effects can significantly influence the system's behavior. For example, stochastic differential equations (SDEs) have been employed to model gene regulatory networks, providing insights into the variability and robustness of gene expression (Yang and Chen, 202). The use of stochastic models complements deterministic approaches, offering a more complete picture of the dynamics and variability in biological systems.

3.3 Rule-based modeling
Rule-based modeling is an approach that focuses on the interactions and rules governing the behavior of individual components in a biological system. This method is particularly useful for systems with a large number of interacting components, where traditional differential equation models become impractical. Rule-based models can capture the combinatorial complexity of molecular interactions, such as those in signaling pathways and gene regulatory networks. By specifying rules for interactions, these models can simulate the system's behavior and predict the outcomes of perturbations. This approach has been successfully applied to various biological systems, providing valuable insights into their regulatory mechanisms and potential therapeutic targets.

In summary, dynamic modeling approaches, including differential equation models, stochastic models, and rule-based modeling, offer powerful tools for understanding the complex behavior of biological systems. Each approach has its strengths and limitations, and their combined use can provide a more comprehensive understanding of the underlying biological processes.

4 Whole-Cell Simulations: Moving Toward Comprehensive Models
4.1 Challenges in whole-cell modeling
Whole-cell modeling represents a significant challenge in computational systems biology due to the complexity and scale of the task. One of the primary obstacles is the inference of parameters and the selection among competing models, which requires reliable construction methods and efficient computational techniques (Stumpf et al., 2021). The development and curation of these models are labor-intensive, and only a few comprehensive models have been developed to date (Georgouli et al., 2023). Additionally, the integration of diverse intracellular pathways through various computational methods, such as stochastic simulation, is time-consuming and computationally expensive (Yeom et al., 2021). The need for high-performance computing platforms to run these models efficiently is another critical challenge (Georgouli et al., 2023).

4.2 Integrating omics data
The integration of omics data is crucial for the development of accurate and comprehensive whole-cell models.
Omics technologies, such as genomics, transcriptomics, proteomics, and metabolomics, provide a complete readout of the molecular state of a cell at different biological scales. Genome-scale models (GEMs) have been used to interpret and integrate multi-omics data, converting biological reactions into mathematical formulations that can be modeled using optimization principles (Dahal et al., 2020). The integration of omics data with whole-cell models requires appropriate computational methods and data-sharing practices to ensure the accuracy and completeness of the models. Recent advancements in measurement technology and bioinformatics have facilitated the integration of omics data, but challenges remain in developing scalable and comprehensive models (Goldberg et al., 2017).

4.3 Case studies of whole-cell simulations
Several case studies highlight the progress and potential of whole-cell simulations. For instance, a high-performance whole-cell simulation of Escherichia coli (E. coli) was developed using modular cell biology principles and a Brownian dynamics-based parallel simulation framework (Das and Mitra, 2021). This approach involved dividing the bacterium into subcells and utilizing Hamiltonian mechanics-based equations of motion to simulate the system. The simulation demonstrated scalability and efficiency, particularly when tested on high-end CPU-GPU clusters. Another study presented a parallel implementation of the stochastic simulation algorithm (SSA) applied to a whole-cell reaction network, which aimed to speed up the simulation process and accelerate the development of comprehensive models (Yeom et al., 2021). These case studies illustrate the potential of whole-cell simulations to provide valuable insights into cellular processes and highlight the importance of computational efficiency and scalability in developing comprehensive models.
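For readers unfamiliar with the stochastic simulation algorithm mentioned above, the following minimal sketch shows its serial "direct method" on a single-species birth-death process (constant synthesis, first-order degradation). It is not the parallel whole-cell implementation of Yeom et al. (2021). The reaction system, the function names, and the rate values `k_syn` and `k_deg` are assumptions made purely for illustration.

```python
import random


def gillespie_birth_death(k_syn, k_deg, n0, t_end, seed=0):
    """Direct-method SSA for: 0 -> X (rate k_syn), X -> 0 (rate k_deg * n)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_syn = k_syn          # propensity of synthesis
        a_deg = k_deg * n      # propensity of degradation
        a_total = a_syn + a_deg
        # Waiting time to the next reaction is exponential with rate a_total.
        dt = rng.expovariate(a_total)
        if t + dt > t_end:
            break
        t += dt
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean copy number over the interval [0, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end


# Assumed illustrative rates: the deterministic steady state is k_syn / k_deg = 20.
k_syn, k_deg, t_end = 2.0, 0.1, 2000.0
times, counts = gillespie_birth_death(k_syn, k_deg, n0=0, t_end=t_end)
mean_copy = time_average(times, counts, t_end)
print("events simulated:", len(times) - 1)
print("time-averaged copy number:", mean_copy)
```

Each loop iteration simulates exactly one reaction event. The cost therefore grows with the total number of events, which is why whole-cell reaction networks motivate parallel and otherwise accelerated variants of the algorithm.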
5 Applications of Dynamic Modeling in Systems Biology
5.1 Drug discovery and therapeutic target identification
Dynamic modeling in systems biology has significantly advanced drug discovery and therapeutic target identification. By integrating high-throughput data and computational models, researchers can identify potential drug targets and understand the mechanisms of drug action. For instance, systems biology approaches have been utilized to predict novel interactions between ligands and targets, facilitating the development of multi-target drugs (Yadav and Tripathi, 2018). Molecular dynamics (MD) simulations provide detailed insights into protein-ligand interactions, which are crucial for understanding the structure-function relationship of targets and guiding the drug discovery process (Liu et al., 2018). Additionally, chemoinformatics combined with systems dynamics simulations has revolutionized drug lead optimization and personalized therapy by integrating molecular and systems-level data (Wang et al., 2016).
5.2 Personalized medicine and predictive modeling
Personalized medicine benefits greatly from dynamic modeling, which allows for the integration of high-throughput data to support individualized therapeutic decisions. Systems biology approaches can predict biomarkers and regulatory interactions, aiding in the diagnosis and treatment of diseases (Dix et al., 2016). Whole-cell models (WCMs) represent a comprehensive approach to understanding the mechanistic processes of organisms, enabling the examination of system properties and identification of knowledge gaps. These models can simulate the transport of oxygen, drugs, and growth factors, linking cancer development to microenvironmental conditions and supporting the development of personalized therapeutic strategies (Metzcar et al., 2019). The modular assembly of dynamic models also enhances the scalability and consistency of these simulations, making them more applicable to personalized medicine (Pan et al., 2021).

5.3 Synthetic biology and bioengineering applications
In synthetic biology and bioengineering, dynamic modeling is essential for predicting the behavior of engineered biological systems. Machine learning approaches combined with multi-omics data can predict metabolic pathway dynamics, significantly speeding up bioengineering efforts (Costello and Martín, 2018). This method outperforms traditional kinetic models and can be systematically applied to various products, pathways, or hosts. Advances in systems biology modeling, such as the use of bond graphs, facilitate the construction of large-scale models by ensuring consistency with physical conservation laws, thus enhancing the reusability and granularity of models (Figure 2) (Pan et al., 2021).
These models are crucial for developing synthetic biology applications, such as biofuels and medical drugs, by providing accurate predictions of biological dynamics and guiding bioengineering efforts (Costello and Martín, 2018).

Figure 2 The importance of modularity in models for systems biology (Pan et al., 2021)
Image caption: Modularity can facilitate the construction of whole-cell models by (1) providing unambiguous and flexible interfaces for submodels to communicate; (2) allowing model development and unit testing to be done on individual submodels; (3) separating the description of the model from its implementation; (4) allowing models to be iteratively updated with a record of how the equations and parameters were derived; and (5) allowing repeated motifs to be abstracted into reusable structures (Pan et al., 2021)

Figure 2 highlights the significance of modular modeling in systems biology, emphasizing that in complex biological systems, modular design can facilitate model development and validation. Through this approach, different submodules can be developed independently and connected via well-defined interfaces, ensuring scalability and physical consistency in the model. This method is particularly suitable for dynamic modeling in synthetic biology, aiding in the construction of large-scale, biophysically consistent model systems.
6 Computational Tools and Platforms
6.1 Software for pathway and dynamic modeling
The development of dynamic models in systems biology has been significantly advanced by various software tools designed to handle the complexity and scale of biological systems. One notable tool is PhysiCell, an open-source, physics-based cell simulator for 3-D multicellular systems. PhysiCell allows for the simulation of both the biochemical microenvironment and the interactions of many cells within this environment. It includes sub-models for cell cycling, apoptosis, necrosis, and motility, and is capable of handling simulations involving up to millions of cells on high-performance computing platforms (Ghaffarizadeh et al., 2017).

Another important tool is the Cell Collective platform, which provides a web-based environment for creating and simulating dynamic models of biological processes. This platform is particularly useful in educational settings, allowing students to engage with and understand complex biological systems through interactive modeling (Helikar et al., 2015).

SBML (Systems Biology Markup Language) Level 3 is another critical development, providing a standardized format for the exchange and reuse of biological models. This extensible format supports various model types, including reaction-based, constraint-based, and rule-based models, facilitating the integration and sharing of complex systems biology models across different platforms (Keating et al., 2020).

Modular assembly approaches using bond graphs have also been proposed to enhance the reusability and scalability of dynamic models. This method ensures that submodels are consistent with each other and with fundamental conservation laws, making it easier to construct large-scale models that are both accurate and detailed (Pan et al., 2021).
6.2 Cloud-based platforms for whole-cell simulations
Whole-cell models (WCMs) represent the pinnacle of systems biology modeling, integrating diverse intracellular pathways and processes. These models are computationally intensive, often requiring high-performance computing resources to simulate the vast networks of biochemical reactions involved. To address these challenges, cloud-based platforms have emerged as a viable solution. One relevant development is the parallel implementation of the stochastic simulation algorithm (SSA), which has been applied to whole-cell reaction networks. This approach significantly speeds up the simulation process, making it feasible to handle the large-scale networks typical of WCMs (Yeom et al., 2021).

The Cell Collective platform also supports cloud-based simulations, enabling high-throughput studies and the exploration of biological possibilities on a large scale. This platform's accessibility and ease of use make it a valuable tool for both research and education (Helikar et al., 2015). Additionally, the PhysiCell simulator, with its parallelized code and scalability, can be deployed on cloud-based high-performance computing platforms to simulate complex multicellular systems. This capability allows researchers to conduct large-scale simulations that would otherwise be infeasible on standard desktop workstations (Ghaffarizadeh et al., 2017).

7 Challenges and Future Directions
7.1 Data integration and model complexity
The integration of diverse biological data into comprehensive models remains a significant challenge in systems biology. Whole-cell models (WCMs), which aim to simulate the entirety of cellular processes, exemplify the complexity involved. These models require the assimilation of various types of data, including genomic, transcriptomic, proteomic, and metabolomic information, to accurately represent cellular functions (Yeom et al., 2021; Georgouli et al., 2023).
The development of such models is labor-intensive and necessitates sophisticated computational methods to manage and integrate the vast amounts of data (Georgouli et al., 2023). Additionally, ensuring consistency and compatibility among submodels, which are often developed independently, is crucial. Approaches like bond graphs, which apply physical conservation laws to model integration, have been proposed to address these issues, enhancing the modularity and reusability of models (Pan et al., 2021).
7.2 Scalability and computational limitations
Scalability and computational efficiency are critical concerns in the simulation of large-scale biological models. Whole-cell simulations, for instance, involve extensive biochemical reaction networks that are computationally expensive to simulate, particularly when using detailed stochastic methods (Yeom et al., 2021). High-performance computing (HPC) platforms and parallel computing techniques have been employed to mitigate these limitations. For example, the use of a parallel implementation of the stochastic simulation algorithm (SSA) has been shown to accelerate the simulation of whole-cell models (Yeom et al., 2021). Moreover, modular approaches that divide the cell into subunits and simulate them in parallel can significantly reduce computational time, as demonstrated in the simulation of Escherichia coli (Das and Mitra, 2021). Despite these advancements, the need for more efficient algorithms and computational frameworks remains, especially as models become more complex and detailed (Stumpf, 2021).

7.3 Future trends in systems biology modeling
The future of systems biology modeling is likely to be shaped by several emerging trends. One significant trend is the increasing use of hybrid systems modeling, which combines continuous and discrete dynamics to better capture the complexity of biological systems (Dobbe and Tomlin, 2015). This approach allows for the simulation of phenomena such as gene switching and mutations, which are not adequately represented by traditional continuous models. Another promising direction is the application of data-driven modeling techniques, such as neural networks, to learn and predict the dynamics of biological systems from experimental data (Legaard et al., 2021).
These techniques can complement traditional mechanistic models, providing a more flexible and scalable approach to modeling complex biological processes. Additionally, the continued development of high-throughput technologies and the accumulation of large-scale biological data will drive the need for more sophisticated and integrative modeling approaches (Ji et al., 2017). The integration of these trends will likely lead to more accurate and comprehensive models, facilitating new discoveries and applications in biotechnology, energy and personalized medicine (Hernandez et al., 2020; Meyer and Saez-Rodriguez, 2021; Lin, 2024).

8 Concluding Remarks
Dynamic modeling in systems biology has made significant progress, particularly in the development of whole-cell models (WCMs). These models integrate vast knowledge of cellular processes, revealing the complex interactions between mechanisms within cells. Research on WCMs has emphasized the importance of data quality for model parameterization and validation, highlighting the necessity of high-quality data to improve the accuracy of model predictions. Moreover, modular modeling approaches, such as bond graphs, have facilitated the construction of large-scale dynamic models by ensuring consistency with physical conservation laws, enhancing model reusability and scalability. Simplified assumptions continue to be key in ensuring that models remain representative and operational in complex biological systems.

The future of systems biology and whole-cell simulations holds great promise. High-performance computing platforms will provide the necessary power to simulate complex biosystems, speeding up simulations and improving analytical precision. The development of new standards and simulation tools will enhance the reproducibility of models, fostering scientific discoveries.
Integrating molecular dynamics simulations with whole-cell models will offer deeper insights into cellular processes at the atomic level, narrowing the gap between computational predictions and experimental observations. Crowdsourced initiatives like the DREAM challenges will continue to drive the field forward by providing unbiased standards for evaluating modeling methods. To further advance dynamic modeling, efforts should first focus on improving data quality and availability. High-quality experimental data are essential for accurate model parameterization and validation, and standardizing data collection and sharing processes is crucial. Additionally, scalable and modular modeling frameworks should be developed, as modular approaches simplify the construction of large-scale models while maintaining physical consistency. Improving computational tools and resources is also essential; investments in high-performance
computing infrastructure and the development of efficient algorithms will accelerate the execution of whole-cell models and enhance their capacity to handle complex systems. To foster reproducibility, new standards and protocols must be established to ensure models can be reliably replicated and validated across studies. Lastly, collaborative and crowdsourced efforts should be encouraged; supporting initiatives like the DREAM challenges will drive innovation and improve the quality of model assessments.

Acknowledgments
We are deeply grateful for Dr. Li's expertise and patience in this study. We also thank the two anonymous peer reviewers for their careful review and valuable comments on this manuscript.

Conflict of Interest Disclosure
The authors affirm that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

References
Costello Z., and Martín H.G., 2018, A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data, NPJ Systems Biology and Applications, 4(1): 1-14. https://doi.org/10.1038/s41540-018-0054-3
Dahal S., Yurkovich J.T., Xu H., Palsson B.O., and Yang L., 2020, Synthesizing systems biology knowledge from omics using genome-scale models, Proteomics, 20(17-18): 1900282. https://doi.org/10.1002/pmic.201900282
Das B., and Mitra P., 2021, High-performance whole-cell simulation exploiting modular cell biology principles, Journal of Chemical Information and Modeling, 61(3): 1481-1492. https://doi.org/10.1021/acs.jcim.0c01282
Dix A., Vlaic S., Guthke R., and Linde J., 2016, Use of systems biology to decipher host-pathogen interaction networks and predict biomarkers, Clinical Microbiology and Infection, 22(7): 600-606. https://doi.org/10.1016/j.cmi.2016.04.014
Dobbe R., and Tomlin C.J., 2015, Hybrid systems modeling for (cancer) systems biology, BioRxiv, 2015: 035022. https://doi.org/10.1101/035022
Eddy J., Funk C., and Price N., 2015, Fostering synergy between cell biology and systems biology, Trends in Cell Biology, 25(8): 440-445. https://doi.org/10.1016/j.tcb.2015.04.005
Georgouli K., Yeom J.S., Blake R.C., and Navid A., 2023, Multi-scale models of whole cells: progress and challenges, Frontiers in Cell and Developmental Biology, 11: 1260507. https://doi.org/10.3389/fcell.2023.1260507
Getz M.C., Nirody J.A., and Rangamani P., 2018, Stability analysis in spatial modeling of cell signaling, Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 10(1): e1395. https://doi.org/10.1002/wsbm.1395
Goldberg A.P., Szigeti B., Chew Y.H., Sekar J.A.P., Roth Y.D., and Karr J.R., 2017, Emerging whole-cell modeling principles and methods, Current Opinion in Biotechnology, 51: 97-102. https://doi.org/10.1016/j.copbio.2017.12.013
Groß A., Kracher B., Kraus J.M., Kühlwein S.D., Pfister A.S., Wiese S., Luckert K., Pötz O., Joos T., Daele D., Raedt L., Kühl M., and Kestler H., 2019, Representing dynamic biological networks with multi-scale probabilistic models, Communications Biology, 2(1): 21. https://doi.org/10.1038/s42003-018-0268-3
Helikar T., Cutucache C.E., Dahlquist L.M., Herek T.A., Larson J.J., and Rogers J.A., 2015, Integrating interactive computational modeling in biology curricula, PLoS Computational Biology, 11(3): e1004131. https://doi.org/10.1371/journal.pcbi.1004131
Hernandez C., Thomas-Chollier M., Naldi A., and Thieffry D., 2020, Computational verification of large logical models: application to the prediction of T cell response to checkpoint inhibitors, Frontiers in Physiology, 11: 558606. https://doi.org/10.3389/fphys.2020.558606
Hou J., Acharya L., Zhu D., and Cheng J., 2016, An overview of bioinformatics methods for modeling biological pathways in yeast, Briefings in Functional Genomics, 15(2): 95-108. https://doi.org/10.1145/3459930.3471161
Ji Z., Yan K., Li W., Hu H., and Zhu X., 2017, Mathematical and computational modeling in complex biological systems, BioMed Research International, 2017(1): 5958321. https://doi.org/10.1155/2017/5958321
Keating S.M., Waltemath D., König M., Zhang F.K., Dräger A., Chaouiya C., Bergmann F.T., Finney A., Gillespie C.S., Helikar T., Hoops S., Malik-Sheriff R., Moodie S.L., Moraru I.I., Myers C.J., Naldi A., Olivier B.G., Sahle S., Schaff J.C., Smith L.P., Swat M.J., Thieffry D., Watanabe L., Wilkinson D.J., Blinov M.L., Begley K., Faeder J., Gómez H., Hamm T., Inagaki Y., Liebermeister W., Lister A., Lucio D., Mjolsness E., Proctor C., Raman K., Rodriguez N., Shaffer C., Shapiro B., Stelling J., Swainston N., Tanimura N., Wagner J., Meier-Schellersheim M., Sauro H., Palsson B., Bolouri H., Kitano H., Funahashi A., Hermjakob H., Doyle J., and Hucka M., 2020, SBML level 3: an extensible format for the exchange and reuse of biological models, Molecular Systems Biology, 16(8): e9110. https://doi.org/10.15252/msb.20199110
Kim O.D., Rocha M., and Maia P., 2018, A review of dynamic modeling approaches and their application in computational strain optimization for metabolic engineering, Frontiers in Microbiology, 9: 1690. https://doi.org/10.3389/fmicb.2018.01690
Koch I., and Büttner B., 2023, Computational modeling of signal transduction networks kinetic parameters - Petri net approaches, American Journal of Physiology-Cell Physiology, 324(5): C1126-C1140. https://doi.org/10.1152/ajpcell.00487.2022
Korsbo N., and Jönsson H., 2020, It’s about time: analysing simplifying assumptions for modelling multi-step pathways in systems biology, PLoS Computational Biology, 16(6): e1007982. https://doi.org/10.1371/journal.pcbi.1007982
Legaard C., Schranz T., Schweiger G., Drgovna J., Falay B., Gomes C., Iosifidis A., Abkar M., and Larsen P., 2021, Constructing neural network based models for simulating dynamical systems, ACM Computing Surveys, 55: 1-34. https://doi.org/10.1145/3567591
Linden N.J., Kramer B., and Rangamani P., 2022, Bayesian parameter estimation for dynamical models in systems biology, PLoS Computational Biology, 18(10): e1010651. https://doi.org/10.1371/journal.pcbi.1010651
Lin J., 2024, Sustainable development strategy of bioenergy and global energy transformation, Journal of Energy Bioscience, 15(1): 10-19. https://doi.org/10.5376/jeb.2024.15.0002
Liu X.W., Shi D.F., Zhou S.Y., Liu H.L., Liu H.X., and Yao X.J., 2018, Molecular dynamics simulations and novel drug discovery, Expert Opinion on Drug Discovery, 13(1): 23-37. https://doi.org/10.1080/17460441.2018.1403419
Massonis G., Villaverde A.F., and Banga J.R., 2022, Improving dynamic predictions with ensembles of observable models, Bioinformatics, 39(1): btac755. https://doi.org/10.1093/bioinformatics/btac755
Metzcar J., Wang Y., Heiland R., and Macklin P., 2019, A review of cell-based computational modeling in cancer biology, JCO Clinical Cancer Informatics, 2: 1-13. https://doi.org/10.1200/CCI.18.00069
Meyer P., and Saez-Rodriguez J., 2021, Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges, Cell Systems, 12(6): 636-653. https://doi.org/10.1016/j.cels.2021.05.015
Musilová J., and Sedlář K., 2021, Tools for time-course simulation in systems biology: a brief overview, Briefings in Bioinformatics, 22(5): bbaa392. https://doi.org/10.1093/bib/bbaa392
O’Hara L., Livigni A., Theo T., Boyer B., Angus T., Wright D., Chen S., Raza S., Barnett M., Digard P., Smith L., and Freeman T., 2016, Modelling the structure and dynamics of biological pathways, PLoS Biology, 14(8): e1002530. https://doi.org/10.1371/journal.pbio.1002530
Pan M., Gawthrop P.J., Cursons J., and Crampin E.J., 2021, Modular assembly of dynamic models in systems biology, PLoS Computational Biology, 17(10): e1009513. https://doi.org/10.1101/2021.07.26.453900
Ragusa M.B., and Russo G., 2016, ODEs approaches in modeling fibrosis: comment on "Towards a unified approach in the modeling of fibrosis: a review with research perspectives" by Martine Ben Amar and Carlo Bianca, Physics of Life Reviews, 17: 61-85. https://doi.org/10.1016/j.plrev.2016.05.012
Stumpf M.P.H., 2021, Statistical and computational challenges for whole cell modelling, Current Opinion in Systems Biology, 26: 58-63. https://doi.org/10.1016/J.COISB.2021.04.005
Tavassoly I., Goldfarb J., and Iyengar R., 2018, Systems biology primer: the basic methods and approaches, Essays in Biochemistry, 62(4): 487-500. https://doi.org/10.1042/EBC20180003
Wang G., 2016, Chemoinformatics in the new era: from molecular dynamics to systems dynamics, Molecules, 21(3): 71. https://doi.org/10.3390/molecules21030071
Wang S., and Perdikaris P., 2021, Long-time integration of parametric evolution equations with physics-informed DeepONets, Journal of Computational Physics, 475: 111855. https://doi.org/10.1016/j.jcp.2022.111855
Yadav B.S., and Tripathi V., 2018, Recent advances in the system biology-based target identification and drug discovery, Current Topics in Medicinal Chemistry, 18(20): 1737-1744. https://doi.org/10.2174/1568026618666181025112344
Yang B., and Chen Y.H., 2020, Overview of gene regulatory network inference based on differential equation models, Current Protein and Peptide Science, 21(11): 1054-1059. https://doi.org/10.2174/1389203721666200213103350
Yazdani A., Lu L., Raissi M., and Karniadakis G.E., 2019, Systems biology informed deep learning for inferring parameters and hidden dynamics, PLoS Computational Biology, 16(11): e1007575. https://doi.org/10.1371/journal.pcbi.1007575
Yeom J.S., Georgouli K., Blake R., and Navid A., 2021, Towards dynamic simulation of a whole cell model, Proceedings of the 12th ACM Conference on Bioinformatics Computational Biology and Health Informatics, 2021: 1-10. https://doi.org/10.1145/3459930.3471161
Heiland R., Friedman S.H., Mumenthaler S.M., and Macklin P., 2017, PhysiCell: an open source physics-based cell simulator for 3-D multicellular systems, PLoS Computational Biology, 14(2): e1005991. https://doi.org/10.1101/088773
Zhang Q., Yu Y., Zhang J., and Liang H., 2018, Using single-index ODEs to study dynamic gene regulatory network, PLoS ONE, 13(2): e0192833. https://doi.org/10.1371/journal.pone.0192833
Computational Molecular Biology 2024, Vol.14, No.4, 145-154 http://bioscipublisher.com/index.php/cmb 145

Research Insight Open Access

The Evolving Landscape of Genomic Selection: Insights and Innovations in Quantitative Genetics
Xiaojun Li, Shuiji Zhang
Biotechnology Research Center, Cuixi Academy of Biotechnology, Zhuji, 311800, Zhejiang, China
Corresponding author: xiaojun.li@cuixi.org
Computational Molecular Biology, 2024, Vol.14, No.4 doi: 10.5376/cmb.2024.14.0017
Received: 20 May, 2024 Accepted: 30 Jun., 2024 Published: 12 Jul., 2024
Copyright © 2024 Li and Zhang. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Li X.J., and Zhang S.J., 2024, The evolving landscape of genomic selection: insights and innovations in quantitative genetics, Computational Molecular Biology, 14(4): 145-154 (doi: 10.5376/cmb.2024.14.0017)

Abstract Genomic selection (GS), as a key technology in modern breeding programs, has significantly advanced crop and livestock breeding. By integrating quantitative genetics and genomic prediction models, GS has improved the accuracy of predicting complex traits and accelerated the development of high-yield and stress-resistant varieties. This study explores the historical evolution, technological innovation, and practical applications of genomic selection in breeding. It analyzes the advantages brought by innovative technologies such as high-density genotyping and whole-genome prediction, especially their widespread application in multi-trait and multi-environment models. Although GS has great potential in modern breeding, it still faces challenges such as genotype-by-environment interaction, prediction accuracy, and data complexity.
This study aims to summarize the latest progress in GS through case analysis and to provide direction for future research, so as to promote the application of quantitative genetics and genomic selection in a wider range of fields and to support global food security and sustainable agricultural development.

Keywords Genomic selection; Quantitative genetics; Genomic prediction; Marker-assisted selection; Complex traits

1 Introduction
Genomic selection (GS) has emerged as a significant breakthrough in the field of breeding in recent years. Unlike traditional marker-assisted selection, which relies on a limited number of markers associated with specific traits, GS utilizes genome-wide marker data to predict the breeding values of individuals. By estimating the effects of all markers comprehensively, GS captures the small-effect alleles that influence complex traits, thereby improving breeding efficiency and accuracy (Meuwissen et al., 2016; Crossa et al., 2017). With advances in high-density genotyping technologies, GS has been widely applied in both plant and animal breeding, significantly accelerating genetic improvement (Heslot et al., 2015; Rice and Lipka, 2021).

GS plays a crucial role in modern breeding programs. By integrating genome-wide marker information, GS significantly increases selection accuracy, shortens breeding cycles, and enhances genetic gains per unit time. This method is particularly effective in improving quantitative traits controlled by multiple genes, especially in addressing challenges related to climate change and enhancing crop yields and livestock production (Liu et al., 2019; Merrick et al., 2022). Additionally, GS reduces the need for large-scale phenotyping, lowers breeding costs, and, through advanced statistical models and high-throughput phenotyping technologies, improves breeding efficiency (Larkin et al., 2019; Cappetta et al., 2020).

This study systematically reviews the latest developments and innovations in the field of GS.
By analyzing the development history, application models, and methods of GS, this study explores the practical effects of GS in different breeding programs and evaluates its impact on genetic gain and breeding efficiency. In addition, it identifies challenges and limitations in the implementation of GS and proposes possible solutions to address them. In the future, GS is expected to continue promoting the sustainable development of global agriculture by integrating emerging technologies and improving prediction accuracy.

2 Evolution of Genomic Selection
2.1 Historical development of GS
The concept of genomic selection (GS) was first introduced by Meuwissen et al. in 2001, marking a significant departure from traditional marker-assisted selection (MAS) methods. Prior to this, agricultural genomics primarily
focused on detecting quantitative trait loci (QTL) using experimental crosses or existing family relationships. The innovative approach proposed by Meuwissen et al. required a high density of genomic markers to ensure that every QTL affecting a relevant trait would be in linkage disequilibrium with at least one marker. This allowed for selection decisions to be based on the joint merit of all markers across the genome, rather than a few significant ones. This breakthrough laid the foundation for the rapid advancements in GS, particularly in livestock breeding, where it has led to unprecedented improvements in genetic gain per generation (Koning et al., 2016; Meuwissen et al., 2016).

2.2 Early applications in crop and livestock breeding
The initial applications of GS were predominantly in livestock breeding, driven by the high individual value of livestock and the significant reduction in generation intervals achievable through GS. Dairy cattle breeding, in particular, saw a dramatic shift from traditional progeny testing to GS, resulting in a doubling of genetic improvement per generation (Koning et al., 2016; Meuwissen et al., 2016). The success in livestock spurred interest in applying GS to crop breeding. Early applications in crops such as rice, maize, and wheat have demonstrated significant genetic gains, aided by large international efforts led by organizations such as the International Maize and Wheat Improvement Center (CIMMYT) (Crossa et al., 2017; Li and Jiong, 2022). The integration of GS in plant breeding has been further enhanced by advances in field management, heritability estimation, and the development of robust GS models that account for genotype-by-environment interactions (Burri, 2017; Xu et al., 2019).
2.3 Technological advances driving GS evolution
The evolution of GS has been propelled by several key technological advancements. The development of high-density single nucleotide polymorphism (SNP) chips around 2006 made it feasible to routinely genotype animals and plants for thousands of markers in a cost-effective manner. This was complemented by improvements in statistical modeling approaches, including the Bayesian methods (BayesA and BayesB) introduced by Meuwissen et al., which have been extensively refined over the years (Koning et al., 2016). The advent of high-throughput sequencing technologies has further revolutionized GS by enabling the use of whole-genome sequence data, which offers higher accuracy in predicting breeding values (Meuwissen et al., 2016; VanRaden, 2020). Additionally, the integration of new technologies such as hyperspectral imaging and digital breeding platforms is poised to further enhance the efficiency and accuracy of GS in both plant and animal breeding (Crossa et al., 2017; Jeon et al., 2023).

3 Quantitative Genetics and Its Integration with Genomic Selection
3.1 Basic principles of quantitative genetics
Quantitative genetics is the study of traits that are influenced by multiple genes and environmental factors. These traits, known as quantitative traits, exhibit continuous variation and are typically measured on a numerical scale. The fundamental principles of quantitative genetics involve the partitioning of phenotypic variance into genetic and environmental components, the estimation of genetic parameters such as heritability, and the prediction of breeding values. Traditional methods like Best Linear Unbiased Prediction (BLUP) have been widely used to estimate breeding values by leveraging pedigree information and phenotypic data (Koning, 2016; Li et al., 2017).
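
As a compact summary of the principles in Section 3.1, the partitioning of phenotypic variance and the mixed model underlying BLUP can be written in standard textbook notation. The symbols below are introduced here for illustration and are not taken from the cited sources; the variance decomposition assumes no genotype-environment covariance or interaction.

```latex
\begin{align}
P &= G + E, \\
\sigma_P^2 &= \sigma_G^2 + \sigma_E^2, \\
H^2 &= \frac{\sigma_G^2}{\sigma_P^2}, \qquad h^2 = \frac{\sigma_A^2}{\sigma_P^2}, \\
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}, \qquad
\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2).
\end{align}
```

Here P is the phenotypic value, G the genotypic value, and E the environmental deviation; H^2 and h^2 denote broad-sense and narrow-sense heritability, with \sigma_A^2 the additive genetic variance. In the mixed model, y is the vector of phenotypes, b the fixed effects, u the random breeding values, and A the pedigree-based relationship matrix.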
3.2 Genomic prediction models
Genomic prediction models have revolutionized the field of quantitative genetics by incorporating dense marker information to predict the genetic potential of individuals. These models can be broadly categorized into linear models, Bayesian approaches, and machine learning or non-linear models.

3.2.1 G-BLUP and other linear models
Genomic Best Linear Unbiased Prediction (G-BLUP) is a widely used linear model that extends the traditional BLUP by incorporating genomic information. G-BLUP assumes that all markers contribute equally to the genetic variance and uses a genomic relationship matrix to capture the genetic similarities between individuals (Koning, 2016; Li et al., 2017). Other linear models, such as Ridge Regression BLUP (RR-BLUP), have also been employed for genomic prediction, particularly when dealing with traits controlled by a large number of small-effect loci (Wang et al., 2015; Liu et al., 2018).
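
To make the role of the genomic relationship matrix concrete, the following is a minimal sketch of G-BLUP in Python with NumPy. It is an illustration under stated assumptions, not the implementation used in the cited studies: genotypes are assumed to be coded as 0/1/2 allele counts, the relationship matrix follows one common construction (centered genotypes scaled by twice the sum of p(1 - p) over markers), the model contains only an overall mean as a fixed effect, and the heritability `h2` is assumed to be known rather than estimated. The function names `genomic_relationship_matrix` and `gblup` are hypothetical.

```python
import numpy as np


def genomic_relationship_matrix(M):
    """Build a genomic relationship matrix from an (n x m) genotype matrix.

    M holds allele counts coded 0/1/2 (an assumption of this sketch).
    """
    p = M.mean(axis=0) / 2.0               # estimated allele frequency per marker
    Z = M - 2.0 * p                        # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))    # scaling so G is on a relationship scale
    return Z @ Z.T / denom


def gblup(y, G, h2):
    """Predict genomic breeding values for the model y = mu + u + e.

    u ~ N(0, G * sigma_g^2), e ~ N(0, I * sigma_e^2); h2 is assumed known.
    Returns the estimated mean and the predicted breeding values.
    """
    n = len(y)
    lam = (1.0 - h2) / h2                  # variance ratio sigma_e^2 / sigma_g^2
    V = G + lam * np.eye(n)                # phenotypic covariance up to a constant
    ones = np.ones(n)
    Vinv_y = np.linalg.solve(V, y)
    Vinv_1 = np.linalg.solve(V, ones)
    mu = (ones @ Vinv_y) / (ones @ Vinv_1)     # generalized least squares mean
    u_hat = G @ np.linalg.solve(V, y - mu)     # BLUP of breeding values
    return mu, u_hat


if __name__ == "__main__":
    # Simulated data, for illustration only.
    rng = np.random.default_rng(42)
    M = rng.integers(0, 3, size=(30, 200)).astype(float)
    y = rng.normal(size=30)
    G = genomic_relationship_matrix(M)
    mu, u_hat = gblup(y, G, h2=0.4)
    print("estimated mean:", mu)
    print("first five predicted breeding values:", u_hat[:5])
```

In practice the variance components would be estimated, for example by restricted maximum likelihood, rather than fixed through an assumed heritability, and additional fixed effects would enter through a design matrix instead of a single mean.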