
Bioscience Method 2024, Vol.15, No.1, 28-36 http://bioscipublisher.com/index.php/bm

Popova et al. (2017) introduced ReLeaSE (Reinforcement Learning for Structural Evolution), a computational strategy that couples a generative model with a predictive model to produce new chemical structures with desired physical and/or biological properties. The method generates chemically feasible molecules and predicts their properties, which supports the design of chemical libraries biased toward specific physical properties such as melting point or hydrophobicity, or toward specific biological activity such as inhibition of Janus kinase 2. Yu and Welch (2021) developed MichiGAN, a neural network that combines the advantages of VAEs and GANs to sample from single-cell RNA-seq data. The method allows semantically distinct aspects of cell identity to be manipulated and predicts single-cell gene expression responses to drug treatments.

These application cases illustrate the potential and the range of applications of deep learning in predicting drug molecule activity and in drug design. These technologies enable more accurate prediction of molecular pharmacological properties and allow faster, more efficient innovation at the drug design stage. As deep learning algorithms and computational capabilities continue to advance, we can anticipate further breakthroughs in drug discovery and molecular design.

3 Challenges and Limitations

Despite the significant potential of deep learning in predicting drug molecule activity, challenges remain in data quality and availability, model interpretability, generalization ability, and computational resource demands, and these need to be addressed through continued research and technological innovation.
These challenges involve not only improvements to the data and models themselves but also a deeper understanding, evaluation, and enhancement of existing methods.

3.1 Data quality and availability

When deep learning is used to predict drug molecule activity, the quality, size, and diversity of the datasets are key factors in model performance. High-quality datasets are essential for building accurate predictive models; however, many publicly available chemical and biological datasets suffer from mislabeling, incompleteness, and infrequent updates (Jiménez-Luna et al., 2020). Data diversity is also crucial for model generalization, but broad and varied data are often difficult to acquire, especially for rare or novel compound categories (Cai et al., 2020). Researchers therefore need to invest significant effort in data cleaning and preprocessing to improve data quality, and to continually seek new data sources to increase dataset diversity.

3.2 Model interpretability

Deep learning models, especially complex neural networks, are often seen as "black boxes" because their decision-making processes are difficult to interpret (Li et al., 2021). This is particularly problematic in scientific research and clinical applications, where decisions often require clear explanations and justifications. The lack of interpretability limits the application of deep learning models in drug discovery, because researchers and clinicians need to understand the basis of a model's predictions to make informed decisions. Although techniques for improving model interpretability have emerged in recent years, this remains an active area of research that needs further exploration and innovation.

3.3 Generalization ability

The generalization ability of a deep learning model, that is, its capability to make accurate predictions on unseen data, is a critical criterion for evaluating its performance.
In predicting drug molecule activity, models need to accurately predict the activity of molecules across different chemical spaces and biological environments. However, because chemical space is vast and complex, models may perform well on training sets but poorly on new, unseen molecules (Liu et al., 2019). Enhancing model generalization requires training with high-quality and diverse datasets, as well as employing advanced model architectures and regularization techniques to prevent overfitting.
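
One way to probe the gap between training performance and performance on unseen molecules is to hold out whole groups of related molecules rather than random rows. The following is a minimal sketch, assuming each molecule already carries a group label (for example a scaffold or chemical-series label). The function name `group_holdout_split`, the labels, and the toy data are hypothetical illustrations and are not taken from the studies cited above.

```python
from collections import defaultdict


def group_holdout_split(records, group_of, test_fraction=0.2):
    """Split records so that no group appears in both train and test.

    records: list of items (here, hypothetical molecule records).
    group_of: function mapping a record to its group key, for example a
        precomputed scaffold or chemical-series label (an assumption of
        this sketch; the review does not prescribe a grouping).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_of(rec)].append(rec)

    # Fill the training set with the largest groups first; any group that
    # no longer fits within the training budget is held out for testing.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))

    n_train_target = len(records) - int(round(test_fraction * len(records)))
    train, test = [], []
    for _, members in ordered:
        if len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy data: (molecule id, series label, measured activity). All values are
# invented for illustration only.
molecules = [
    ("m1", "series_A", 1), ("m2", "series_A", 1), ("m3", "series_A", 0),
    ("m4", "series_A", 1), ("m5", "series_B", 0), ("m6", "series_B", 0),
    ("m7", "series_B", 1), ("m8", "series_C", 1), ("m9", "series_C", 0),
    ("m10", "series_D", 1),
]

train_set, test_set = group_holdout_split(molecules, group_of=lambda m: m[1])
train_groups = {m[1] for m in train_set}
test_groups = {m[1] for m in test_set}
```

A model scored on `test_set` is evaluated only on series it never saw during training, which tends to give a more conservative estimate of generalization than a random split of the same data.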
