
Bioscience Method 2024, Vol.15, No.1, 37-49 http://bioscipublisher.com/index.php/bm

2 Data Mining and Feature Selection

2.1 Data types and sources required for drug screening

Drug screening is a complex process that draws on multiple types of data. To keep screening accurate and effective, researchers need to collect and analyze data of several types from several sources.

In terms of data types, genomic and transcriptomic data describe genes and their expression, which is essential for identifying the genes and pathways associated with specific diseases. Proteomic data cover protein expression, structure, and function, and underpin the discovery of drug targets. Metabolomic data describe the metabolic processes and metabolites within organisms, providing clues to the mechanisms by which diseases arise and develop. Clinical data, including the patient's medical history, symptoms, and treatment effectiveness, are an important basis for evaluating the efficacy and safety of drugs. Drug chemistry and biological activity data describe drug structure, mechanism of action, and biological activity, and form the basis of drug screening (Figure 2).

Figure 2 Types of drug screening data

In terms of data sources, public databases are an important way for researchers to obtain biomedical data. Dogan (2018) noted that public databases such as GenBank (NCBI), UniProt, and MetaboLights store a large amount of biomedical data that researchers can access free of charge or for a fee. In addition, Burton et al. (2017) note that research institutions are also an important source of data: institutions and teams around the world have accumulated a large amount of experimental data and research results in drug development.
By collaborating with these institutions or purchasing data from them, researchers can obtain valuable data resources. Walke et al. (2023) argue that clinical trials are a key step in evaluating drug efficacy and safety, and that the data they generate are essential for drug screening and development. Long-term patient registration and follow-up can also yield valuable data on disease progression and treatment effectiveness, providing strong support for drug screening.

2.2 Application of data mining technology in drug development

In the drug discovery stage, data mining techniques are widely applied to the analysis of genomic, proteomic, and metabolomic data. By mining these large-scale biomedical datasets in depth (Yang et al., 2020), researchers can identify genes, proteins, or metabolites associated with specific diseases and thereby pinpoint potential drug targets. This accelerates drug discovery and improves the success rate of research and development.
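
To make the idea of identifying disease-associated genes concrete, the following is a minimal sketch and not a method taken from the cited studies. It assumes a small expression table with invented gene names (GENE_A, GENE_B, GENE_C) and invented values, and ranks the genes by the absolute value of Welch's t-statistic between disease and control samples. The function names welch_t and rank_genes are hypothetical. A real analysis would use far more samples and would correct for multiple testing.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expression, labels):
    """Rank genes by |t| between "disease" and "control" samples.

    expression: dict mapping gene name -> list of values, one per sample
    labels: list of "disease" / "control" strings aligned with the samples
    """
    scores = {}
    for gene, values in expression.items():
        disease = [v for v, lab in zip(values, labels) if lab == "disease"]
        control = [v for v, lab in zip(values, labels) if lab == "control"]
        scores[gene] = welch_t(disease, control)
    # Largest absolute t-statistic first: strongest group difference on top
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data, invented purely for illustration (assumption, not from the article)
labels = ["disease"] * 3 + ["control"] * 3
expression = {
    "GENE_A": [8.1, 7.9, 8.4, 5.0, 5.2, 4.9],  # clearly higher in disease
    "GENE_B": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],  # no group difference
    "GENE_C": [3.0, 3.5, 2.8, 4.1, 4.4, 3.9],  # lower in disease
}

ranking = rank_genes(expression, labels)
for gene, t in ranking:
    print(f"{gene}\tt = {t:.2f}")
```

Genes at the top of such a ranking would be candidates for follow-up as potential targets. The sign of the statistic indicates whether expression is higher or lower in the disease group.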
