Bioscience Method 2024, Vol.15, No.1, 28-36
http://bioscipublisher.com/index.php/bm
31

million unlabeled molecules. This intrinsic learning of chemical logic enables the extraction of predictive representations from specific molecular sequences. To validate the proposed method, ten benchmark and thirty-eight virtual screening datasets were considered. Extensive validation showed that the method performed exceptionally well, confirming that self-supervised learning can extract useful information from large-scale unlabeled datasets. Self-supervised frameworks can efficiently use large volumes of non-annotated data to compensate for the lack of labeled data, which is especially valuable when labeled data are scarce. One self-supervised framework designed for molecular property prediction improves the performance of graph neural networks by employing multiple pretext tasks at different molecular scales (atoms, fragments, and whole molecules).

Each of these deep learning methods has its own advantages and plays a role in a different aspect of predicting drug molecule activity. By integrating these techniques, researchers can understand and predict the biological activity of molecules from several perspectives, which provides powerful tools for the discovery and development of new drugs.

2 Application Cases for Predicting Drug Molecule Activity

2.1 Prediction of molecular properties

Deep learning technologies, especially Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs), have been used to predict the physicochemical properties of drug molecules, such as solubility, lipophilicity, and molecular weight. These properties are crucial for assessing the pharmacological, toxicological, and pharmacokinetic characteristics of drug molecules.
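To make the idea of a graph neural network predicting a molecular property more concrete, the following is a minimal sketch of neighbor-sum message passing followed by sum pooling and a linear output layer. It is not the implementation of any framework discussed in this review. The atom features, the weights, the bias, and the function names (`neighbors`, `message_passing_step`, `predict_property`) are hypothetical choices made only for illustration, and a real model would learn its parameters from labeled data.

```python
# Illustrative sketch only: toy features and made-up weights, not a trained model.

# Heavy-atom graph of ethanol (C-C-O): list entries are atoms, pairs are bonds.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]

# Hypothetical 2-dimensional atom features: [is_carbon, is_oxygen].
FEATURES = {"C": [1.0, 0.0], "O": [0.0, 1.0]}


def neighbors(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_passing_step(states, adj):
    """Update each atom state by adding the sum of its neighbors' states."""
    new_states = []
    for i, state in enumerate(states):
        message = [sum(states[j][k] for j in adj[i]) for k in range(len(state))]
        new_states.append([s + m for s, m in zip(state, message)])
    return new_states


def predict_property(atoms, bonds, weights, bias, rounds=2):
    """Run message passing, sum-pool the atom states, apply a linear head."""
    adj = neighbors(len(atoms), bonds)
    states = [list(FEATURES[a]) for a in atoms]
    for _ in range(rounds):
        states = message_passing_step(states, adj)
    # Sum pooling turns per-atom states into one molecule-level vector.
    pooled = [sum(s[k] for s in states) for k in range(len(states[0]))]
    return sum(w * p for w, p in zip(weights, pooled)) + bias


# Made-up weights standing in for parameters a real model would learn.
score = predict_property(ATOMS, BONDS, weights=[0.5, -1.0], bias=0.1)
print(score)
```

After two rounds, each atom's state reflects atoms up to two bonds away, which is how such models capture local chemical environments before the pooled vector is mapped to a property value. Published architectures replace the plain neighbor sum with learned, and sometimes attention-weighted, update functions.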
The ImageMol framework, for instance, uses self-supervised learning to improve the accuracy of predicting such molecular properties, demonstrating the potential of deep learning in this field.

Tang et al. (2020) established a Self-Attention Message Passing Neural Network (SAMPN) based on the graph neural network framework. By operating directly on chemical graphs, SAMPN improves prediction accuracy for molecular properties such as lipophilicity and solubility. Its attention mechanism also displays each atom's contribution to the predicted property, helping researchers visualize the relationship between molecular structure and properties (Tang et al., 2020).

Zeng et al. (2022) introduced ImageMol, a self-supervised pre-training deep learning framework that extracts chemical representations from unlabeled drug samples to predict the molecular targets of candidate compounds. The framework has shown high performance in evaluating molecular properties such as drug metabolism, brain permeability, and toxicity.

Wang et al. (2023) proposed MolIG, a novel multimodal molecular pre-training framework that predicts molecular properties from images and graph structures. Through self-supervised tasks, MolIG integrates the advantages of molecular graphs and images, capturing key structural features and high-level semantic information to improve property prediction.

2.2 Prediction of drug-target interactions

Deep learning models, particularly GNNs, have been applied to predict the interactions between molecules and specific biological targets, which is vital for identifying new drug candidates and understanding their mechanisms of action.
By learning from extensive drug-target interaction data, these models can predict the binding affinity of previously uncharacterized molecules for their targets, thereby accelerating drug screening and optimization. AtomNet, a structure-based deep convolutional neural network, was developed to predict the biological activity of small molecules for drug discovery applications. By applying the concept of convolution, AtomNet successfully predicted new active molecules for targets that previously lacked known modulators. Moreover, AtomNet has outperformed traditional docking methods on multiple benchmark datasets, achieving an AUC of over 0.9 for 57.8% of targets on the DUDE
