Computational Molecular Biology 2024, Vol.14, No.3, 106-114 http://bioscipublisher.com/index.php/cmb
3 Applications of Machine Learning in Genomic Data Analysis
3.1 Sequence alignment and assembly
3.1.1 Traditional methods vs. AI-enhanced techniques
Traditional tools for sequence alignment and downstream variant analysis, such as the Burrows-Wheeler Aligner (BWA) and the Genome Analysis Toolkit (GATK), have been foundational in genomic research. However, these tools often struggle with the volume and complexity of next-generation sequencing data. AI-enhanced techniques, such as those implemented in Findmap and Findvar, offer significant improvements. Findmap, for instance, incorporates known variant locations during alignment, which improves both speed and accuracy compared with traditional aligners such as BWA and SNAP. Similarly, GotCloud uses machine learning for efficient variant calling and genotyping, automating several steps and reducing computational resource requirements (Jun et al., 2015).
3.1.2 Accuracy and efficiency improvements
AI-enhanced techniques have demonstrated substantial improvements in both accuracy and efficiency. For example, Findmap correctly mapped 92.9% of reads, compared with 90.5% for BWA and 92.6% for SNAP. Findvar called single nucleotide variants with 99.8% accuracy, although accuracy was lower for insertions (79%) and deletions (67%), and it outperformed traditional tools such as SAMtools in some respects. A graph-based genome reference implementation also improves read-mapping sensitivity and variant-calling accuracy, increasing recall by 0.5% without compromising specificity (Rakocevic et al., 2019).
3.1.3 Case studies in genomic assembly
Several case studies highlight the practical applications of AI in genomic assembly. For instance, the 1000 Bull Genomes Project used Findmap and Findvar to process large datasets efficiently, substantially reducing wall-clock times compared with traditional methods (VanRaden et al., 2019).
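The idea behind Findmap described in Section 3.1.1, using known variant locations during alignment, can be illustrated with a deliberately simplified sketch. This is not Findmap's actual algorithm: it assumes ungapped alignment, a hypothetical +1/-1 match/mismatch scheme, and a hypothetical `known_variants` dictionary mapping reference positions to known alternate alleles. A read base that matches a known alternate allele is scored as a match, so reads carrying common variants are not penalized.

```python
"""Hypothetical sketch of variant-aware read placement.

Not Findmap's algorithm: ungapped alignment only, brute-force search,
and illustrative match/mismatch scores.
"""
from typing import Dict, Set, Tuple

MATCH, MISMATCH = 1, -1  # hypothetical scores chosen for illustration


def score_at(read: str, ref: str, start: int,
             known_variants: Dict[int, Set[str]]) -> int:
    """Score the read placed at ref[start:], without gaps.

    A read base counts as a match if it equals the reference base or
    any known alternate allele recorded at that reference position.
    """
    score = 0
    for offset, base in enumerate(read):
        pos = start + offset
        if base == ref[pos] or base in known_variants.get(pos, set()):
            score += MATCH
        else:
            score += MISMATCH
    return score


def best_placement(read: str, ref: str,
                   known_variants: Dict[int, Set[str]]) -> Tuple[int, int]:
    """Return (start, score) of the best ungapped placement of the read."""
    if len(read) > len(ref):
        raise ValueError("read is longer than the reference")
    best_start, best_score = 0, score_at(read, ref, 0, known_variants)
    for start in range(1, len(ref) - len(read) + 1):
        s = score_at(read, ref, start, known_variants)
        if s > best_score:  # strict comparison: ties keep the leftmost start
            best_start, best_score = start, s
    return best_start, best_score


if __name__ == "__main__":
    ref = "ACGTACGTTA"
    read = "GTTC"
    # Without variant information, starts 2 and 6 tie with one mismatch each.
    print(best_placement(read, ref, {}))          # (2, 2)
    # A known A>C variant at position 9 makes start 6 a perfect match.
    print(best_placement(read, ref, {9: {"C"}}))  # (6, 4)
```

In this toy case, knowing the variant site resolves an otherwise ambiguous placement; a production aligner would also need indexing, gapped alignment, and mapping-quality estimates.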
GotCloud provides another example: in the 1000 Genomes Project and the NHLBI Exome Sequencing Project, it effectively filtered false positives and detected true variants with high power (Jun et al., 2015). These case studies underscore the potential of AI-enhanced techniques to handle large-scale genomic data more effectively.
3.2 Variant calling and mutation analysis
Machine learning has substantially improved variant calling and mutation analysis by increasing the accuracy and efficiency of genetic variant detection. Tools such as GotCloud and Findvar use machine learning to automate variant calling, filter artifacts, and refine genotypes, thereby improving the reliability of genomic data analysis (Zou et al., 2018). Deep learning methods have also been applied to predict the effects of genetic variants on gene expression, further advancing our understanding of genomic variation.
3.3 Gene expression profiling
Gene expression profiling has benefited significantly from machine learning, particularly deep learning. The Enformer model, for example, predicts gene expression from DNA sequence with high accuracy by integrating long-range interactions in the genome. It outperformed previous methods in predicting the effects of noncoding variants on gene expression, demonstrating the potential of deep learning to improve our understanding of gene regulation (Avsec et al., 2021). Deep learning frameworks have also been applied to other aspects of gene expression analysis, including the identification of sequence motifs and promoter-enhancer interactions (Talukder et al., 2020; Routhier and Mozziconacci, 2022).
4 Machine Learning in Functional Genomics
4.1 Predicting gene function
Machine learning has become an indispensable tool for predicting gene function, leveraging vast amounts of genomic data to uncover insights that traditional methods might miss.
For instance, machine learning algorithms have been used to integrate heterogeneous data and detect patterns that are not easily discernible with rule-based approaches. This has been particularly useful in plant genomics, where predicting gene function and organismal phenotypes remains a significant challenge (Leung et al., 2016). Deep learning models have also shown promise in predicting the structure and function of genomic elements such as promoters and enhancers, which regulate gene expression levels (Liu et al., 2020).
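Deep learning models of promoters and enhancers commonly read DNA as a one-hot matrix, and the filters in their first convolutional layer often behave like learned motif detectors. The sketch below shows that single step in plain Python: one filter slid along a one-hot sequence, followed by a ReLU activation. The filter `tat_filter`, its weights, and the `bias` parameter are hypothetical illustrations, not parameters from Liu et al. (2020) or any other cited model.

```python
"""Hypothetical sketch: one-hot DNA encoding and a single convolutional
filter scanned along the sequence, as in the first layer of a CNN.
Filter weights here are hand-set for illustration, not learned."""
from typing import List, Sequence

ALPHABET = "ACGT"  # column order of the one-hot encoding


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as a len(seq) x 4 matrix; N and other symbols become all-zero rows."""
    return [[1 if base == a else 0 for a in ALPHABET] for base in seq.upper()]


def scan(seq: str, pwm: Sequence[Sequence[float]], bias: float = 0.0) -> List[float]:
    """Slide a width x 4 filter along the sequence (no padding), then apply ReLU.

    This is a 1-D cross-correlation, which is what deep learning
    libraries compute in a 'convolution' layer.
    """
    x = one_hot(seq)
    width = len(pwm)
    scores = []
    for start in range(len(x) - width + 1):
        s = bias
        for i in range(width):
            for j in range(len(ALPHABET)):
                s += pwm[i][j] * x[start + i][j]
        scores.append(max(0.0, s))  # ReLU activation
    return scores


if __name__ == "__main__":
    # Hypothetical filter that prefers the trinucleotide "TAT"; columns are A, C, G, T.
    tat_filter = [
        [-1, -1, -1, 1],  # position 1: T
        [1, -1, -1, -1],  # position 2: A
        [-1, -1, -1, 1],  # position 3: T
    ]
    print(scan("GCTATAAG", tat_filter))  # [0.0, 0.0, 3.0, 0.0, 1.0, 0.0]
```

The strongest response is at position 2, where "TAT" occurs exactly. In a trained network, many such filters run in parallel and their weights are learned from data; hand-set weights are used here only so the output is easy to check.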
