Computational Molecular Biology 2025, Vol.15, No.4, 208-217 http://bioscipublisher.com/index.php/cmb

Transforming sequencing data analysis into a fixed, reusable standard pipeline not only saves manpower but also makes pipeline upgrades traceable. More importantly, when multiple centers run the same pipeline, the consistency of the analysis results can be guaranteed (Jackson et al., 2021; Baykal et al., 2024).

4.2 Software containerization and environment standardization
Another major cause of inconsistent pipeline results is differences in software environments. Version differences in the same software, mismatched dependency packages, and even changes in the operating system can all introduce minor deviations into the analysis results. To avoid these problems, containerization technology is widely adopted. Docker and Singularity are the two most commonly used container tools: the former is mostly used in server or cloud environments, while the latter is specifically optimized for high-performance computing platforms. A container packages the analysis software, its dependent environment, and the system configuration into a single image file, so that the environment is guaranteed to be consistent no matter which machine runs it. In other words, as long as the container images are the same, the analysis results should also be exactly the same. Containerization also makes pipeline deployment and maintenance easy: upgrading a software version only requires replacing the image and then distributing it uniformly. Workflow systems such as Nextflow support direct invocation of container images, making standardization of the entire environment even simpler.
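The invocation pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not production code: the helper name `container_cmd` and the image reference are hypothetical, and a real pipeline would typically let a workflow manager such as Nextflow construct these commands instead.

```python
import shlex

def container_cmd(image, tool_args, workdir="/data", engine="docker"):
    """Build a command that runs one pipeline step inside a pinned
    container image, so every site executes an identical environment."""
    if engine == "docker":
        cmd = ["docker", "run", "--rm",
               "-v", f"{workdir}:{workdir}", "-w", workdir, image]
    elif engine == "singularity":
        # Singularity/Apptainer is the usual choice on HPC clusters.
        cmd = ["singularity", "exec", "--bind", workdir, image]
    else:
        raise ValueError(f"unknown container engine: {engine}")
    return cmd + list(tool_args)

# Pinning by immutable digest (rather than a mutable tag) is what makes
# the environment reproducible; this image reference is hypothetical.
image = "example.org/aligner@sha256:<digest>"
print(shlex.join(container_cmd(image, ["align", "ref.fa", "reads.fq"])))
```

The key design point is that the image reference, not the host machine, determines the software environment: distributing a new image is equivalent to upgrading every site at once.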
Some clinical institutions now package the entire NGS analysis pipeline into containers for rapid deployment of the same system across different hospitals (Kadri et al., 2022; Florek et al., 2025). With the support of containerization, the portability and reproducibility of bioinformatics pipelines have been significantly enhanced, making it easier for laboratories to meet the environment-consistency requirements stipulated by regulations.

4.3 Analysis pipeline validation and quality control
Before a bioinformatics pipeline is applied in clinical practice, strict performance validation must be conducted to confirm its ability to detect the target variants and its stability (Jennings et al., 2017). Pipeline validation typically assesses sensitivity, specificity, accuracy, and repeatability (Roy et al., 2018). For this reason, the industry recommends testing with standard reference samples and datasets with known ground truth. For example, the high-confidence variant sets for human genome reference samples (such as NA12878) provided by the Genome in a Bottle (GIAB) project are widely used to evaluate a pipeline's SNV/indel detection rate. During validation, data from the reference sample are run through the pipeline under test, and the output variants are compared against the authoritative truth set to calculate sensitivity and the false-positive rate. For the detection of somatic variants, such as those in tumors, consortia such as SEQC2 have also released standard datasets and reference variant sets that can be used to verify detection ability at different variant frequencies. In addition to using standard samples, the laboratory should design replicate tests to evaluate the repeatability of the pipeline, that is, whether multiple independent runs produce the same results (Baykal et al., 2024).
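The comparison against a truth set described above reduces to simple set arithmetic once variants are normalized to a common representation. The sketch below assumes variants can be keyed as (chromosome, position, ref, alt) tuples; real benchmarking tools additionally handle representation differences (e.g. indel normalization) that this illustration ignores.

```python
def benchmark_calls(truth, called):
    """Compare pipeline variant calls against a high-confidence truth set
    (e.g. GIAB NA12878) and report sensitivity and precision.
    Variants are represented as (chrom, pos, ref, alt) tuples."""
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # true positives: called and in the truth set
    fn = len(truth - called)   # false negatives: truth variants missed
    fp = len(called - truth)   # false positives: calls absent from truth
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}

# Toy example: 3 truth variants, 3 calls, 2 in agreement.
truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
called = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 7, "T", "C")]
print(benchmark_calls(truth, called))  # sensitivity = 2/3, precision = 2/3
```

The same function applied to replicate runs of the same sample (one run as "truth", the other as "called") gives a crude concordance measure for the repeatability testing mentioned above.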
After validation, a written report should be produced recording the pipeline version, test data, performance metrics, and so on, for regulatory review (Jennings et al., 2017). In routine operation, quality control measures should also be in place, such as including control samples with known variants in each analysis batch to monitor pipeline performance. If the control variants are not detected or the results are abnormal, problems in the analysis process can be investigated promptly (Haanpää et al., 2025). In addition, version control and change management are important parts of pipeline quality control. When a bioinformatics pipeline or the software tools within it are updated, the differences in results between the old and new versions must be evaluated to ensure the changes do not reduce detection sensitivity. Many laboratories use version control systems such as Git to manage pipeline code and build automated test pipelines: they set threshold monitoring on key metrics (such as detection rate and accuracy) and refuse to release a new version once its performance falls below the set standards (Baykal et al., 2024; Haanpää et al., 2025). Through continuous quality monitoring and version management, the performance of bioinformatics pipelines can remain stable and improve steadily with technological progress.
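The threshold-gated release logic described above can be sketched as a small check that an automated test pipeline (e.g. CI triggered by a Git push) might run. The function name and the specific metric names and threshold values are illustrative, not taken from the source; each laboratory would substitute the thresholds established in its own validation report.

```python
def release_gate(metrics, thresholds):
    """Refuse to release a new pipeline version whose key performance
    metrics fall below the thresholds fixed during validation.
    Returns (ok, list of human-readable failure reasons)."""
    failures = [f"{name}: {metrics.get(name, 0.0):.3f} < {minimum:.3f}"
                for name, minimum in thresholds.items()
                if metrics.get(name, 0.0) < minimum]
    return (len(failures) == 0, failures)

# Hypothetical thresholds from a lab's validation report.
thresholds = {"sensitivity": 0.99, "precision": 0.98}

ok, reasons = release_gate({"sensitivity": 0.995, "precision": 0.975}, thresholds)
print("release approved" if ok else f"release blocked: {reasons}")
```

Running the same gate on every candidate version turns the regression comparison between old and new versions into a mechanical, auditable step rather than a manual judgment.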