
Bioscience Method 2024, Vol.15, No.1, 28-36 http://bioscipublisher.com/index.php/bm

generators for automated drug design. With transfer learning, an initial model is first trained on a large, generic molecular dataset to learn the general syntax of SMILES and is then fine-tuned on a smaller set of molecules, which makes molecule generation more effective. This approach reduces the need for extensive post-screening and minimizes selection bias in drug candidate identification.

Kayama et al. (2021) used Recurrent Neural Networks (RNNs) to predict the success of PCR amplification for specific primer sets and DNA templates. Their results indicate that RNNs can learn the relationships between primer and template sequences and use them to predict reaction outcomes. Although the study does not address the synthesis of drug molecules directly, it suggests that RNNs could also be used to predict synthetic pathways, which could accelerate the drug development process.

1.3 Graph neural networks (GNN)
Graph Neural Networks (GNNs) have gained significant attention in drug discovery in recent years. Unlike CNNs and RNNs, GNNs are designed specifically to process graph data, which makes them particularly suitable for molecular structure analysis: a molecule is naturally represented as a graph, with atoms as nodes and chemical bonds as edges. A GNN captures interactions between nodes by repeatedly updating each node's state with information from its neighbors, so the model learns both the interactions between atoms and the overall structure of the molecule. This approach has shown high accuracy in predicting molecular activity, especially when the three-dimensional structure of molecules is taken into account.

Wieder et al. (2021) introduced a new GNN architecture, the Directed Edge Graph Isomorphism Network (D-GIN), which is composed of two different sub-architectures and improves the accuracy of lipophilicity and solubility predictions. They argue that combining models that emphasize different key aspects can make graph neural networks more insightful while enhancing their predictive ability.

Xiong et al. (2021) discussed the integration of artificial intelligence technologies, especially GNNs, into de novo drug design, whose goal is to create new chemical entities with desired biological activity and pharmacokinetic properties. They reviewed the applications of GNNs from three main perspectives: molecular scoring, molecule generation and optimization, and synthesis planning. The study also pointed out that data-driven methods have rapidly gained popularity in drug design in recent years, with GNNs receiving wide attention because they process graph-structured data effectively.

Low et al. (2022) proposed a GNN for predicting the Gibbs free energy of solvation (ΔGsolv). In addition to encoding typical atom-level and bond-level features, the model incorporates chemically intuitive, solvent-related parameters, such as semi-empirical partial atomic charges and solvent dielectric constants. Visualizing the learned model makes it possible to examine which interactions enhance or reduce solubility.

1.4 Self-supervised learning
Self-supervised learning is a machine learning technique that learns data representations without externally annotated data. In drug discovery, it is used to learn effective molecular representations from unlabeled molecular data.
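
A minimal sketch of one common self-supervised pretext task, hiding part of a molecule and asking a model to recover it, is given below. It is an illustration only and is not taken from any of the studies cited here; the `[MASK]` token, the simplified SMILES tokenizer, and the function names `tokenize_smiles` and `make_masked_example` are assumptions made for this example.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"  # hypothetical placeholder token, not part of SMILES


def tokenize_smiles(smiles: str) -> List[str]:
    """Rough SMILES tokenizer (an assumption for this sketch).

    Keeps bracket atoms such as [NH4+] and the two-letter halogens
    Cl and Br together; every other character is its own token.
    Raises ValueError if a bracket atom is never closed.
    """
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i] == "[":
            j = smiles.index("]", i)  # position of the closing bracket
            tokens.append(smiles[i:j + 1])
            i = j + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def make_masked_example(
    smiles: str, rng: random.Random
) -> Tuple[List[str], int, str]:
    """Hide one randomly chosen token and return (masked tokens, position, target).

    The target comes from the molecule itself, so no external label is needed.
    """
    tokens = tokenize_smiles(smiles)
    pos = rng.randrange(len(tokens))
    target = tokens[pos]
    masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
    return masked, pos, target


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    masked, pos, target = make_masked_example("CC(=O)Cl", rng)
    print(masked, pos, target)
```

Training pairs built this way need no annotation, because the hidden token serves as the label. A model trained to fill in the masked position could then be fine-tuned on a smaller labeled dataset.
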
Within a self-supervised framework, a model learns general molecular representations by predicting internal features of the molecules themselves, such as a hidden part of the molecule or one of its chemical properties. This allows researchers to exploit the vast amount of unannotated compound data and reduces the dependence on expensive or hard-to-obtain labeled datasets. In this way, self-supervised learning improves the generalization capability of models, enabling them to better predict the activity of new molecules. Chen et al. (2021) developed a self-supervised learning method that pre-trained models from over seven hundred
