Computational Molecular Biology 2025, Vol.15, No.4, 193-207
http://bioscipublisher.com/index.php/cmb

next moment. It is not precise, but it is very practical for large-scale regulatory networks, for example when simulating the gene regulatory relationships during development to see which stable states the system eventually converges to. These "attractors" often correspond to different cell types. Although the Boolean model ignores temporal continuity, it can help designers verify at an early stage whether the logic of the circuit works and whether the output meets expectations.

Between the Boolean model and the ordinary differential equation model there are also some "compromise" approaches. For example, multi-valued logic models allow genes to have more than two expression levels, and Petri net models describe system changes in the form of state transition diagrams. These methods are particularly useful when experimental data are insufficient or parameters are difficult to measure, because they still allow a qualitative judgment of system behavior.

Another line of research focuses on the characteristics of the network topology itself. One example is motif analysis: within a vast gene regulatory network, certain small structures (motifs) tend to recur.

4 Model Parameters and System Identification Methods

4.1 Parameter estimation and sensitivity analysis

After the model is built, the real trouble often just begins: where do the parameters come from, and how are they determined? These parameters include the transcription rate, translation rate, and protein degradation rate of genes, as well as the binding constants and Hill coefficients of regulatory interactions. Usually they are not set out of thin air but are "dug out" from experimental data.
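To make the roles of these parameters concrete, the following is a minimal Python sketch of a single gene under Hill-type repression. It is an illustrative assumption rather than a model taken from the paper: the function names, the mRNA degradation rate `d_m`, the fixed repressor level, the forward-Euler integrator, and every numeric value are hypothetical.

```python
# Hypothetical one-gene model: a repressor at a fixed level inhibits transcription.
# All names and parameter values are illustrative assumptions, not measured data.

def hill_repression(repressor, K, n):
    """Fraction of maximal transcription remaining at a given repressor level."""
    return 1.0 / (1.0 + (repressor / K) ** n)

def simulate(params, repressor, t_end=200.0, dt=0.01):
    """Forward-Euler integration of mRNA (m) and protein (p), both starting at 0."""
    m, p = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        activity = hill_repression(repressor, params["K"], params["n"])
        dm = params["k_tx"] * activity - params["d_m"] * m   # transcription minus mRNA decay
        dp = params["k_tl"] * m - params["d_p"] * p          # translation minus protein decay
        m += dt * dm
        p += dt * dp
    return m, p

params = {
    "k_tx": 2.0,   # transcription rate
    "k_tl": 5.0,   # translation rate
    "d_m": 0.5,    # mRNA degradation rate (assumed; not listed in the text)
    "d_p": 0.1,    # protein degradation rate
    "K": 10.0,     # binding (half-repression) constant
    "n": 2.0,      # Hill coefficient
}

# Repressor level equal to K, so transcription runs at half its maximum.
m_ss, p_ss = simulate(params, repressor=10.0)
```

For this simple cascade the long-time values can be checked by hand: m approaches k_tx * h / d_m and p approaches k_tl * m / d_p, where h is the Hill term. With the assumed values above that gives m close to 2 and p close to 100.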
This process resembles optimization: a set of parameters is adjusted to minimize the error between the model output and the experimental observations. The most commonly used methods include least squares fitting, maximum likelihood estimation, and the more complex Bayesian inference. For example, when studying gene oscillators, experimentally measured protein concentration time series can be used to fit the transcription rate and inhibition constant with a nonlinear least squares algorithm, so that the simulated oscillation curve matches the measured curve as closely as possible (Liu and Niranjan, 2017).

However, parameters cannot always be identified precisely. Sometimes different combinations of parameters yield almost the same result, which is called poor identifiability. In such cases, researchers usually conduct sensitivity analyses to see which parameters truly determine the behavior of the model (Cao and Grima, 2019). Local sensitivity analysis looks at a small neighborhood: it calculates the partial derivatives of the output with respect to slight perturbations of each parameter and identifies the key parameters for which a small change strongly alters the output. Global sensitivity analysis takes a broader, coarser view. It measures the impact of parameter uncertainty on fluctuations in the model output by sampling over a wide range of the parameter space, for example using Sobol indices to calculate the contribution of each parameter to the output variance. In this way, important parameters can be screened out so that subsequent experiments focus on measuring or optimizing them, while parameters with little impact can be treated in a simplified way to improve estimation efficiency.

4.2 Integration methods of experimental data and modeling data

Getting experimental data truly "integrated" into the model is a crucial step in making the model more reliable and closer to real biological systems.
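As one minimal illustration of bringing data into a model, the least squares idea from Section 4.1 can be sketched in Python. Everything here is an assumption made for illustration: the one-variable production and decay model, the function names, the synthetic "measurements", and the brute-force grid scan, which stands in for the nonlinear least squares algorithm a real study would use.

```python
import math

def protein_curve(t, k, d):
    """Protein level at time t for constant production k and first-order decay d, from 0."""
    return (k / d) * (1.0 - math.exp(-d * t))

def sum_squared_error(times, data, k, d):
    """Sum of squared differences between model output and observations."""
    return sum((protein_curve(t, k, d) - y) ** 2 for t, y in zip(times, data))

def fit(times, data, d_grid):
    """Scan candidate decay rates; for each one, solve for the best k in closed form."""
    best = None
    for d in d_grid:
        # For fixed d the model is linear in k, so the optimal k is a ratio of sums.
        basis = [(1.0 - math.exp(-d * t)) / d for t in times]
        k = sum(b * y for b, y in zip(basis, data)) / sum(b * b for b in basis)
        sse = sum_squared_error(times, data, k, d)
        if best is None or sse < best[0]:
            best = (sse, k, d)
    return best

# Synthetic, noise-free "measurements" generated from known values (assumed).
true_k, true_d = 4.0, 0.2
times = [float(t) for t in range(1, 31)]
data = [protein_curve(t, true_k, true_d) for t in times]

d_grid = [0.01 * i for i in range(1, 101)]   # candidate decay rates 0.01 to 1.00
sse, k_hat, d_hat = fit(times, data, d_grid)
```

On noise-free synthetic data like this, the fit should land on the values used to generate it.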
But this is not straightforward. The first step usually involves processing the data, because the units and scales of the quantities measured in different experiments vary. For example, fluorescence intensity, protein concentration, and transcription rate often differ greatly in magnitude. Without normalization, they cannot be matched to the model output (Wang et al., 2014). Therefore, researchers often first convert the fluorescence signal into a relative expression level and then map it to the concentration variable in the model.

However, a single type of data is often not enough. The experimental results of synthetic genetic circuits usually come from several sources: time series data, steady-state data, and sometimes distribution information at the single-cell level. Different data types are used differently. Time series can be used directly to fit dynamic processes; steady-state values can serve as constraints on the equations; and distribution data, such as differences in expression between cells, are often used to adjust the noise terms or parameter distribution assumptions in the model. A more complex integration approach is to use the Bayesian
