BM_2024v15n1

Bioscience Method 2024, Vol.15, No.1, 37-49 http://bioscipublisher.com/index.php/bm
In the drug design phase, data mining techniques can help researchers predict the biological activity, pharmacokinetic properties, and potential side effects of candidate drugs. In the clinical trial stage, analyzing and mining large volumes of clinical data allows researchers to evaluate drug efficacy and safety, discover correlations between drugs and diseases, and predict how patients will respond to different drugs. In the drug market, data mining can help pharmaceutical companies analyze customer needs and concerns and optimize product strategies and marketing plans. Applications of data mining in drug development span many areas, including but not limited to adverse drug reaction detection, drug safety monitoring, pharmacodynamics, and drug interaction prediction. For example, Karimi et al. (2015) reviewed how data mining and related computer science techniques can be applied to different data sources (including spontaneous reporting databases, electronic health records, and the medical literature) to identify adverse drug reaction signals in the field of drug safety. Wilson et al. (2003) discussed the potential of data mining and knowledge discovery for detecting adverse drug events (ADEs) in databases and explored their application in drug surveillance systems. Harpaz et al. (2012) surveyed recent methodological innovations and data sources that support the discovery and analysis of adverse drug events, emphasizing the role of data mining in improving drug safety monitoring.
2.3 Optimizing feature selection to improve model accuracy and efficiency
In the data mining process of drug development, feature selection is a crucial step that directly affects the accuracy and efficiency of the model.
Optimizing feature selection not only improves the predictive performance of a model but also simplifies it, reduces computational cost, and accelerates drug development. First, feature selection reduces data dimensionality. Drug development typically generates large volumes of biomedical data, many of whose features may be unrelated to drug activity. Selecting the most informative features removes redundant and noisy data, simplifies the model, reduces computational complexity, and improves the model's ability to generalize. Second, optimized feature selection improves predictive accuracy and interpretability. Choosing the most representative features focuses the model on factors closely related to drug activity, which raises predictive accuracy; this matters for drug screening and drug design because it helps researchers quickly identify promising candidates. Choosing features with clear biological meaning makes the model easier to understand and interpret, which supports decision-making and communication during drug development by helping researchers and decision-makers grasp what the model's results mean. Several methods can be used to optimize feature selection. Statistical methods assess the importance of each feature by calculating its correlation with, or statistical significance for, the target variable. Machine learning algorithms such as decision trees, random forests, and support vector machines (Cai et al., 2018) can also be used: a model is trained, and features are selected according to their impact on model performance.
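The statistical (filter) approach described above can be sketched as follows: each feature is scored by the absolute Pearson correlation between its values and the measured activity, and the k highest-scoring features are kept. This is a minimal plain-Python sketch under stated assumptions; the descriptor names (`logP`, `mol_weight`, `noise`), the toy values, and the helper functions `pearson` and `select_top_k` are hypothetical and serve only as illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        return 0.0  # a constant column carries no information about the target
    return cov / (ss_x * ss_y)


def select_top_k(X, y, names, k):
    """Score each feature by |r| with the target and return the k best names."""
    columns = list(zip(*X))  # transpose: rows are compounds, columns are features
    scores = {name: abs(pearson(col, y)) for name, col in zip(names, columns)}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 4 compounds x 3 descriptors, plus measured activity.
names = ["logP", "mol_weight", "noise"]
X = [
    [1.0, 300.0, 5.0],
    [2.0, 320.0, 5.0],
    [3.0, 310.0, 5.0],
    [4.0, 350.0, 5.0],
]
y = [0.1, 0.2, 0.3, 0.4]

selected, scores = select_top_k(X, y, names, k=2)
print(selected)  # ['logP', 'mol_weight']
```

Because Pearson correlation only captures linear relationships and scores one feature at a time, a filter like this is cheap but can miss features that matter only in combination; model-based selection with random forests or similar algorithms, as mentioned above, can capture such interactions at a higher computational cost.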
3 Machine Learning Model Construction
3.1 Basic principles of machine learning in drug screening
The basic principle of machine learning in drug screening is to train a model on large amounts of data so that it automatically learns to recognize features or patterns associated with drug activity. These learned features or patterns can then be used to predict the biological activity of new compounds (Yang et al., 2019), thereby accelerating drug screening. Specifically, through iterative optimization, a machine learning algorithm learns from known drug data a mapping (a function) from a compound's characteristics (such as its chemical structure and physicochemical properties) to its biological activity. This mapping is learned from training samples and their corresponding labels (for example, active or inactive).
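As one simple illustration of learning such a mapping, the sketch below fits a logistic regression model by batch gradient descent, so the iterative optimization corresponds to repeated gradient steps on the log-loss. It assumes each compound is already encoded as a short numeric descriptor vector with a binary label (1 = active, 0 = inactive); the descriptor values, learning rate, epoch count, and function names are hypothetical, and logistic regression is chosen here for brevity rather than as the model used in the cited work.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, b, x):
    """Linear score w . x + b for one descriptor vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # d(log-loss)/d(score)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # One optimization step: move against the averaged gradient.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (predicted active) if P(active | x) >= threshold, else 0."""
    return int(sigmoid(score(w, b, x)) >= threshold)


# Hypothetical training set: two scaled descriptors per compound.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4],
           [0.8, 0.7], [0.9, 0.6], [0.7, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]  # 1 = active, 0 = inactive

w, b = train_logistic(X_train, y_train)
print(predict(w, b, [0.85, 0.8]), predict(w, b, [0.15, 0.3]))
```

The learned weights and bias play the role of the mapping described above: a new compound's descriptor vector is mapped to a probability of activity and then thresholded into an active/inactive call. A realistic screening model would use many more descriptors (for example, molecular fingerprints) and a held-out test set to check how well the mapping generalizes.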
