Computational Molecular Biology 2024, Vol.14, No.4, 155-162 http://bioscipublisher.com/index.php/cmb

alignment and sorting of high-throughput sequencing data with significant speed-ups, ensuring reproducibility and scalability (Jarlier et al., 2020). Additionally, the integration of distributed computing with GPU-based devices has shown promising results in drug discovery applications, providing cost-effective and scalable solutions (Merelli et al., 2014).

3.1.2 Cluster and supercomputing applications
Cluster and supercomputing applications are pivotal in genomics for tasks that require immense computational power. These systems utilize a network of interconnected computers to perform complex calculations at high speed. For example, multicore clusters and supercomputers have been shown to improve the efficiency of distance matrix computation and sequence alignment, tasks that are fundamental to multiple sequence alignment and systems biology (Yelick et al., 2020). Moreover, high-level parallel programming patterns, such as the master-worker FastFlow pattern, have been shown to enhance the performance of widely used alignment tools like Bowtie2 and BWA (Merelli et al., 2014).

3.2 Accelerating sequence alignment and assembly
Sequence alignment and assembly are critical steps in genomic analysis that benefit greatly from HPC. The development of specialized algorithms and hardware accelerators has led to significant improvements in these processes. For instance, algorithm-architecture co-design has been proposed to accelerate genome analysis, integrating multiple steps of the analysis pipeline to reduce data movement and energy consumption (Mutlu and Firtina, 2023). Additionally, parallel computing models for genome sequence alignment and preprocessing have been shown to enhance computing efficiency and scalability (Zou et al., 2021).
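The master-worker pattern mentioned in Section 3.1.2 can be illustrated with a minimal Python sketch: a master process farms independent reads out to a pool of workers and merges the partial results. This is an illustration of the pattern only, not the FastFlow implementation; the k-mer counting kernel stands in for a real alignment task such as a Bowtie2 or BWA invocation.

```python
from multiprocessing import Pool
from collections import Counter

def count_kmers(read, k=3):
    # Worker task: count k-mers in one read. In a real pipeline this
    # slot would hold an alignment kernel (e.g., a BWA call per chunk).
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

def master_worker(reads, workers=4):
    # Master: distribute independent reads across a process pool,
    # then merge the partial Counters -- the master-worker pattern.
    with Pool(workers) as pool:
        partials = pool.map(count_kmers, reads)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTTTT", "GGGACGTA"]
    print(master_worker(reads)["ACG"])  # prints 4
```

Because the reads are processed independently, the same structure scales from a multicore node to a cluster by swapping the process pool for an MPI or job-scheduler backend.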
3.3 Role of GPUs and FPGAs in genomics
Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs) play a crucial role in accelerating genomic computations. These hardware accelerators are designed to handle parallel tasks efficiently, making them ideal for the computationally intensive tasks in genomics. GPUs have been successfully used to accelerate various genomic applications, such as the exploration of perturbed conditions in biological systems and the simulation of reaction-diffusion systems (Xu, 2020). The use of GPUs in deterministic systems biology simulators has achieved remarkable speed-ups, demonstrating their potential in large-scale genomic simulations. Similarly, FPGAs have been employed to enhance the performance of genome analysis pipelines, providing fast and accurate results with lower power consumption (Ward et al., 2013).

4 Overcoming Data Analysis Bottlenecks
The rapid advancement in genomics has led to an unprecedented increase in the volume and complexity of data generated. This surge has created significant bottlenecks in data analysis, necessitating the development of advanced tools and methodologies to manage and interpret large-scale genomic datasets efficiently. High-performance computing (HPC) has emerged as a critical solution to these challenges, enabling the processing of vast amounts of data in a timely and scalable manner.

4.1 Tools and pipelines for large-scale genomic analysis
The development of specialized tools and pipelines is essential for the efficient analysis of large-scale genomic data. Tools like DISSECT have been designed to leverage distributed-memory parallel computational architectures, significantly reducing the time required for complex genomic analyses. For instance, DISSECT can analyze simulated traits from 470,000 individuals in approximately four hours using 8,400 processor cores, achieving high prediction accuracies (Canela‐Xandri et al., 2015).
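To put the quoted DISSECT figures in perspective, a back-of-envelope calculation (a sketch based only on the numbers above, not taken from the paper) converts the run into total compute consumed and a per-individual cost:

```python
# Scaling arithmetic for the DISSECT example quoted above:
# 470,000 individuals analysed in ~4 hours on 8,400 processor cores.
individuals = 470_000
cores = 8_400
wall_hours = 4

core_hours = cores * wall_hours            # total compute consumed
per_individual = core_hours / individuals  # core-hours per individual

print(core_hours)                 # 33600 core-hours
print(round(per_individual, 3))   # roughly 0.07 core-hours each
```

The point of such estimates is capacity planning: given a fixed core-hour budget on a shared cluster, they indicate how cohort size trades off against wall-clock time.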
Similarly, platforms like DolphinNext offer a modular approach to building and deploying complex workflows (Figure 1), ensuring flexibility, portability, and reproducibility in high-throughput data processing (Yukselen et al., 2019; Yukselen et al., 2020). These tools address the need for scalable and efficient data-processing frameworks, which are crucial for handling the growing volume of genomic data.
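The modular-workflow idea can be sketched in plain Python: each step consumes the previous step's output, so steps can be swapped, reordered, or reused. This is an analogy only; DolphinNext itself builds workflows on top of Nextflow, and the step names below (trim, align, sort) are illustrative placeholders, not its real API.

```python
def trim(reads):
    # Placeholder QC step: drop reads shorter than 4 bases.
    return [r for r in reads if len(r) >= 4]

def align(reads):
    # Placeholder alignment step: tag each read with a mock position.
    return [(r, i * 10) for i, r in enumerate(reads)]

def sort_by_pos(hits):
    # Placeholder sort step, as a real pipeline would sort a BAM.
    return sorted(hits, key=lambda h: h[1])

def run_pipeline(data, steps):
    # Chain the steps: each one consumes the previous output, which is
    # what makes the workflow modular and its stages interchangeable.
    for step in steps:
        data = step(data)
    return data

if __name__ == "__main__":
    out = run_pipeline(["ACGT", "AC", "TTTTA"], [trim, align, sort_by_pos])
    print(out)  # [('ACGT', 0), ('TTTTA', 10)]
```

Workflow engines add what this sketch omits: caching of intermediate results, per-step containerization, and dispatch of each step to a cluster scheduler, which is where the portability and reproducibility benefits come from.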