Computational Molecular Biology 2025, Vol.15, No.4, 208-217 http://bioscipublisher.com/index.php/cmb

thus winning precious time for the diagnosis of neonatal genetic diseases. Another often-overlooked advantage is scalability. Because a standardized process has a clear structure, only minor adjustments to the original pipeline are needed when developing new testing items or switching to tumor panel analysis (Roy et al., 2018). Fixed SOPs also facilitate personnel training, enabling newcomers to become productive more quickly (Whiffin et al., 2016), and supervision and inspection are more efficient because there is a unified operational basis. Overall, a growing body of evidence indicates that standardizing bioinformatics pipelines not only improves the quality and efficiency of testing but also lays the foundation for nationwide data interoperability and resource sharing; this trend has become close to an industry consensus.

6 Case Study: Construction of a Standardized Analysis Pipeline for Tumor Genomics

6.1 Research background and objectives

Tumor genomic analysis has always been complex, not only because of the sheer data volume but also because of the diverse result types and the demanding interpretation requirements. A single case often involves somatic SNVs/indels, copy-number alterations, gene fusions, tumor mutational burden (TMB), and microsatellite instability (MSI). In the past, laboratories adopted divergent analytical methods, some prioritizing sensitivity and others speed, and the results were often difficult to compare (Feng et al., 2023). We aim to unify analysis methods, output formats, and report structures by establishing a standardized bioinformatics pipeline for precision tumor diagnostics.
From the receipt of sequencing data to the interpretation of results, the entire process is automated and highly consistent, realizing an analysis system that is "repeatable, traceable, and clinically applicable". The goal of the pipeline is not only to increase the detection rate but also to ensure report stability, making the results of different batches and different analysts as consistent as possible. Ultimately, we want to see whether such a standardized pipeline can stand the test of real clinical scenarios, and to use this experience to identify directions for future optimization (Figure 1) (Nasra et al., 2024; Nguyen et al., 2025).

6.2 Process construction and implementation steps

Pipeline construction starts from the characteristics of tumor samples and adopts a dual-channel DNA/RNA analysis design. DNA sequencing is mainly matched tumor-normal whole-exome sequencing (WES), with average depths of 400× and 180×, respectively; RNA sequencing produces whole-transcriptome data. Sequencing is performed on the Illumina platform with 150 bp paired-end reads, yielding approximately 100 million read pairs per sample. Samples undergo pathological assessment before sequencing, and tumor content must exceed 20%. After DNA/RNA extraction, libraries are constructed and sequenced; once sequencing finishes, the LIMS automatically records the sample information and launches the analysis pipeline (managed by Nextflow). The first steps of the pipeline are data quality control and alignment. DNA reads are assessed with FastQC, and Trimmomatic removes adapters and low-quality fragments; reads are then aligned to the hg38 reference genome with BWA-MEM, followed by duplicate marking and base quality score recalibration.
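The DNA preprocessing stage described above (FastQC, Trimmomatic, BWA-MEM alignment to hg38, duplicate marking, and BQSR) can be sketched as a small command builder. This is a minimal sketch, not the laboratory's actual Nextflow configuration: the file names, adapter file, thread count, and tool flags are illustrative assumptions.

```python
# Sketch of the per-sample DNA preprocessing stage: FastQC -> Trimmomatic
# -> BWA-MEM (hg38) -> MarkDuplicates -> BQSR. Paths and flags are
# illustrative; real values would come from the Nextflow/LIMS configuration.

def dna_preprocess_commands(sample, fq1, fq2, ref="hg38.fa",
                            known_sites="dbsnp.vcf.gz", threads=8):
    """Return the ordered shell commands for one tumor or normal sample."""
    trimmed1 = f"{sample}_R1.trim.fq.gz"
    trimmed2 = f"{sample}_R2.trim.fq.gz"
    bam = f"{sample}.sorted.bam"
    return [
        # 1. Raw-read quality report
        f"fastqc {fq1} {fq2} -o qc/",
        # 2. Adapter removal and low-quality trimming (paired-end mode)
        f"trimmomatic PE -threads {threads} {fq1} {fq2} "
        f"{trimmed1} {sample}_R1.unpaired.fq.gz "
        f"{trimmed2} {sample}_R2.unpaired.fq.gz "
        "ILLUMINACLIP:adapters.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36",
        # 3. Alignment to hg38 with a read-group tag, coordinate-sorted output
        f"bwa mem -t {threads} -R '@RG\\tID:{sample}\\tSM:{sample}' "
        f"{ref} {trimmed1} {trimmed2} | samtools sort -o {bam} -",
        # 4. Duplicate marking
        f"gatk MarkDuplicates -I {bam} -O {sample}.dedup.bam "
        f"-M {sample}.dup_metrics.txt",
        # 5. Base quality score recalibration (BQSR), then application
        f"gatk BaseRecalibrator -I {sample}.dedup.bam -R {ref} "
        f"--known-sites {known_sites} -O {sample}.recal.table",
        f"gatk ApplyBQSR -I {sample}.dedup.bam -R {ref} "
        f"--bqsr-recal-file {sample}.recal.table -O {sample}.final.bam",
    ]
```

In a workflow manager such as Nextflow, each of these command strings would typically become the script block of one process, so that every step is logged, cached, and restartable on its own.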
RNA data are monitored via FastQC/MultiQC metrics such as rRNA ratio, Q30 ratio, and alignment rate, and are cleaned when necessary. RNA-seq reads are then aligned to the hg38 transcriptome with STAR, which supports the identification of splice junctions. All quality-control results, including sequencing depth, coverage, and Q30 distribution, are automatically summarized into the run log (Cabello-Aguilar et al., 2023; Zerdes et al., 2025). In the variant detection stage, the pipeline adopts a dual-algorithm strategy. Somatic SNVs/indels are jointly detected by GATK Mutect2 and Strelka2: Mutect2 first generates a candidate variant set, from which FilterMutectCalls removes sequencing noise and false positives, while Strelka2 is more sensitive to low-frequency variants. After taking the union of the two result sets, internal assessment showed detection sensitivity rising from approximately 85% to over 95%. CNV and LOH analysis is performed with CNVkit; structural variants (SVs) are identified by Manta from abnormal read pairs and split reads; fusion genes are detected at the RNA level with FusionCatcher, and high-confidence results are then manually rechecked.
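The dual-caller union strategy described above can be sketched as a set merge over normalized variant keys. This is a simplified sketch: representing each call as a (chrom, pos, ref, alt) tuple and tagging it with its supporting caller(s) is an assumption of this example, since a production pipeline would merge full, filtered VCF records; the genomic coordinates below are likewise hypothetical.

```python
# Sketch of the Mutect2/Strelka2 union strategy: each caller's filtered
# output is reduced to (chrom, pos, ref, alt) keys, and the two sets are
# merged while recording which caller(s) support each variant.

def union_somatic_calls(mutect2_calls, strelka2_calls):
    """Merge two callers' variant keys; return {variant: set of callers}."""
    merged = {}
    for caller, calls in (("Mutect2", mutect2_calls),
                          ("Strelka2", strelka2_calls)):
        for variant in calls:
            merged.setdefault(variant, set()).add(caller)
    return merged

# Hypothetical example: one call shared by both tools, one low-frequency
# variant seen only by Strelka2, and one seen only by Mutect2.
mutect2 = {("chr7", 55191822, "T", "G"), ("chr17", 7674220, "C", "T")}
strelka2 = {("chr7", 55191822, "T", "G"), ("chr12", 25245350, "C", "A")}
merged = union_somatic_calls(mutect2, strelka2)
```

Keeping the caller tags alongside each variant is what makes the sensitivity gain auditable: single-caller variants can be flagged for closer review while dual-caller variants pass with higher confidence.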