Computational Molecular Biology 2025, Vol.15, No.3, 131-140
http://bioscipublisher.com/index.php/cmb

to predict the IC50 value of the drug, helping doctors make a rough judgment before treatment (Clayton et al., 2020). TCGA is not the only source. Public databases such as GEO also store large amounts of transcriptome data, which can be used both to validate new markers and to train predictive models (Wang et al., 2023).

5.3 Issues regarding data standardization and repeatability
When multi-source omics data are combined, problems also arise. Because experimental conditions differ and batch effects are mixed in, analysis results often disagree. Even across different drug sensitivity databases, the measured response to the same drug can vary greatly, and this is not uncommon. To make the data more reliable, the work must start at the source: experimental procedures, data processing, and standardization should each be as uniform as possible. Even a result that looks very good still needs to be tested in an independent cohort to verify that it can be reproduced; otherwise its reliability is hard to judge (Moossavi et al., 2020). Many teams are now working on this by unifying analysis pipelines and promoting data sharing, in the hope of making drug sensitivity studies more stable and comparable (Sinke et al., 2021).

6 Case Study
6.1 Case selection: anti-cancer drug sensitivity analysis based on GDSC and TCGA
Here, we take EGFR-targeted therapy for lung cancer as an example. We start with the GDSC database, pick out the genomic markers related to EGFR-TKI sensitivity, and then check whether these markers hold up in real patients.
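The first step described here, picking out markers associated with EGFR-TKI sensitivity in cell line data, can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the cited studies: the gene name `GENE_X`, the ln(IC50) values, and the helper functions `median_shift`, `permutation_pvalue`, and `screen` are all hypothetical, and a permutation test on the median is only one of several reasonable ways to compare mutant and wild-type cell lines.

```python
import random
from statistics import median


def median_shift(ic50_mut, ic50_wt):
    """Median ln(IC50) of mutant lines minus that of wild-type lines.

    A negative value means the mutant lines are more drug-sensitive.
    """
    return median(ic50_mut) - median(ic50_wt)


def permutation_pvalue(ic50_mut, ic50_wt, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the median shift."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(median_shift(ic50_mut, ic50_wt))
    pooled = list(ic50_mut) + list(ic50_wt)
    n_mut = len(ic50_mut)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(median_shift(pooled[:n_mut], pooled[n_mut:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def screen(table):
    """Return (gene, shift, p-value) tuples sorted by ascending p-value."""
    results = []
    for gene, groups in table.items():
        shift = median_shift(groups["mut"], groups["wt"])
        p = permutation_pvalue(groups["mut"], groups["wt"])
        results.append((gene, shift, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy ln(IC50) values, NOT real GDSC measurements.
toy = {
    "EGFR": {"mut": [-2.1, -1.8, -2.5, -1.2],
             "wt": [1.9, 2.4, 1.1, 2.8, 1.6, 2.2]},
    "GENE_X": {"mut": [1.5, 2.6, 1.0, 2.3],
               "wt": [1.9, 2.4, 1.1, 2.8, 1.6, 2.2]},
}

ranked = screen(toy)
for gene, shift, p in ranked:
    print(f"{gene}: median shift = {shift:.2f}, permutation p = {p:.4f}")
```

In a real screen the candidate list would cover many genes, so the p-values would also need multiple-testing correction before any marker is carried forward to patient data.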
The specific approach is to carry the screened candidate genes into the TCGA lung adenocarcinoma data for verification, to see whether they show a trend consistent with clinical efficacy (Huang et al., 2018; Cheng et al., 2020). In this way, clues can be found at the cell line level and the reliability of the results can also be tested against patient data.

6.2 Multi-omics feature screening process and identification of key markers
The analysis gives an interesting result. Sensitizing EGFR mutations occur only in the cell lines that respond well to the drugs, while the drug-resistant group often carries driver mutations such as KRAS (Ohashi et al., 2012). In addition, drug-resistant cells exhibit distinct epithelial-mesenchymal transition (EMT) characteristics, with differences in morphology and expression patterns. Piecing together these multi-omics clues reveals a more complete picture: different mutation backgrounds and transcriptional states seem to jointly shape the differences in cells' responses to EGFR-TKIs (Yamaguchi et al., 2012).

6.3 Verification and clinical correlation analysis: taking EGFR mutation and TKI response as an example
Clinical data also confirm this point. Lung cancer patients with EGFR mutations often respond much more clearly to EGFR-TKIs than patients with wild-type EGFR, and the difference in therapeutic effect is evident (Mitsudomi et al., 2006; Li et al., 2010). For this reason, doctors now routinely conduct EGFR gene testing before formulating treatment plans for patients with advanced lung adenocarcinoma. Without clarifying the mutation status, it is often difficult even to choose the right drug.

7 Challenges and Limitations
7.1 Heterogeneity and high-dimensional feature issues of multi-omics data
Multi-omics data may seem rich in information, but there are also many problems.
The noise introduced by different platforms and batches often overshadows the truly meaningful signals, leaving the analysis muddled (Liu and Park, 2024). Moreover, when there are many features and few samples, a model can easily drift off course and overfit (Hu et al., 2022). In such cases the data must be put in order first: standardization, feature screening, and dimensionality reduction. None of these steps can be omitted. Only by suppressing redundancy and noise can the subsequent models remain stable.

7.2 Obstacles to model interpretability and clinical translatability
No matter how accurate many models are, doctors are still reluctant to trust them directly. The main reason is that the models are too complicated to understand and explain clearly. Nowadays, some research is seeking ways to make