Computational Molecular Biology 2024, Vol.14, No.3, 106-114
http://bioscipublisher.com/index.php/cmb
107

output. This approach is widely used in genomics for tasks such as variant calling, gene expression prediction, and classification of genomic sequences. For instance, supervised learning techniques have been applied to annotate sequence elements and to analyze epigenetic, proteomic, and metabolomic data (Libbrecht and Noble, 2015). These methods are most effective when a large amount of labeled data is available, because the model can then learn the mapping from inputs to outputs accurately.

2.2 Unsupervised learning
Unsupervised learning techniques identify patterns and structures in data without requiring labeled outputs. In genomics, these methods are often employed for clustering and dimensionality reduction. Clustering approaches, such as hierarchical, centroid-based, and density-based methods, help reveal the natural structure of genomic data, such as gene expression profiles and cellular processes (Figure 1) (Karim et al., 2020). Unsupervised learning is crucial for exploratory data analysis, where the goal is to uncover hidden patterns and relationships within the data.

This image illustrates the use of a convolutional autoencoder for unsupervised learning to perform clustering analysis on microscope images. After image processing, clustering algorithms such as K-means group the points in the learned feature space. This approach helps uncover hidden patterns and relationships within the data, such as distinct clusters of gene expression or differences between cell types. To enhance clustering performance, the network jointly optimizes the reconstruction loss and the Cluster Assignment Hardening (CAH) loss, refining the clustering results by continuously adjusting the network parameters.
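The joint objective described above can be sketched in a few lines. This is a minimal illustration, not the implementation of Karim et al. (2020). It assumes the common formulation of cluster assignment hardening, in which soft assignments come from a Student's t kernel and the loss is the KL divergence between a sharpened target distribution and those assignments. The function names, the weight `gamma`, and the mean-squared-error reconstruction term are assumptions made for this example.

```python
import math


def soft_assignments(embeddings, centroids, alpha=1.0):
    """Soft cluster assignments q[i][j] from a Student's t kernel."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])  # each row sums to 1
    return q


def target_distribution(q):
    """Sharpened targets p[i][j], proportional to q[i][j]^2 / f[j]."""
    n_clusters = len(q[0])
    # f[j] is the soft frequency of cluster j across all samples.
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def cah_loss(q, p):
    """KL(P || Q), summed over samples and clusters."""
    loss = 0.0
    for q_row, p_row in zip(q, p):
        for q_ij, p_ij in zip(q_row, p_row):
            if p_ij > 0.0:
                loss += p_ij * math.log(p_ij / q_ij)
    return loss


def reconstruction_loss(inputs, reconstructions):
    """Mean squared error between inputs and autoencoder outputs."""
    count = sum(len(row) for row in inputs)
    squared = sum(
        (a - b) ** 2
        for x, x_hat in zip(inputs, reconstructions)
        for a, b in zip(x, x_hat)
    )
    return squared / count


def joint_loss(inputs, reconstructions, q, p, gamma=0.1):
    """Reconstruction loss plus gamma-weighted CAH loss (gamma is assumed)."""
    return reconstruction_loss(inputs, reconstructions) + gamma * cah_loss(q, p)


# Toy latent embeddings forming two groups, with one centroid per group.
embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
centroids = [[0.0, 0.5], [5.0, 5.5]]
q = soft_assignments(embeddings, centroids)
p = target_distribution(q)
print(cah_loss(q, p))
```

In a training loop, the targets `p` would typically be held fixed for a number of steps while the encoder weights and centroids are updated to pull `q` toward them, which is what "hardens" the assignments.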
This application of unsupervised learning in genomics is particularly suited to exploratory data analysis. Clustering helps expose the intrinsic structure of genomic data, thereby revealing hidden patterns in gene function and cellular processes.

Figure 1 Schematic representation of a VAE used for clustering GE data (Adopted from Karim et al., 2020)

2.3 Deep learning and neural networks
Deep learning, a subset of machine learning, has transformed the analysis of genomic data by using multilayered artificial neural networks (ANNs) to model complex patterns. Deep learning techniques, including convolutional neural networks (CNNs) and deep neural networks (DNNs), have shown remarkable success in various genomic applications. These methods are particularly adept at handling high-dimensional data and have been used to predict the structure and function of genomic elements, such as promoters and enhancers (Li, 2018; Liu et al., 2020; Schmidt and Hildebrandt, 2020). Deep learning models have also been applied to next-generation sequencing (NGS) data for tasks such as variant calling, metagenomic classification, and genomic feature detection.

Despite this success, interpretability remains a challenge for deep learning models. Efforts are under way to develop methods for interpreting the predictions of DNNs, with the aim of better understanding the underlying molecular and cellular mechanisms (Talukder et al., 2020).
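To make the CNN idea concrete, the sketch below shows what a single convolutional filter does to a DNA sequence: the sequence is one-hot encoded, and the filter slides along it, producing a high activation wherever the local window resembles the pattern the filter encodes. This is an illustration only, not a method from the works cited above. The function names are hypothetical, and the filter weights are set by hand from a motif string, whereas a trained CNN would learn them from data.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values in A, C, G, T order.

    Characters outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_kernel(motif, match=1.0, mismatch=-1.0):
    """Build a hand-set filter that rewards matches to `motif` (assumed weights)."""
    return [[match if base == b else mismatch for b in BASES] for base in motif]


def conv1d_relu(encoded, kernel, bias=0.0):
    """Slide the filter along the sequence (valid padding) and apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            row = encoded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        activations.append(max(0.0, total))  # ReLU keeps only positive evidence
    return activations


def global_max_pool(activations):
    """Summarize the whole sequence by its strongest filter response."""
    return max(activations)


sequence = "GGCTATAAGC"
kernel = motif_kernel("TATA")
activations = conv1d_relu(one_hot(sequence), kernel, bias=-3.0)
best_position = activations.index(global_max_pool(activations))
print(best_position, activations)
```

With the bias chosen here, only a perfect four-base match yields a positive activation, so the filter behaves like a strict motif detector. In a real model, many such filters are learned jointly and their pooled responses feed later layers that predict labels such as promoter or enhancer status.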