
Computational Molecular Biology 2025, Vol.15, No.3, 151-159 http://bioscipublisher.com/index.php/cmb

7.2 Process design and optimization technology

The workflow is built on the AVAH framework, a heterogeneous parallel architecture in which both the CPU and the GPU take part in the computation. Rather than chaining all the steps into a single serial line, we designed the workflow around asynchronous scheduling, allowing tasks such as alignment, sorting, and variant calling to run independently. Tasks are dispatched through queues to avoid wasting resources on waiting, and mutual-exclusion control is added during concurrent execution to prevent I/O conflicts when multiple nodes access the same file at once. Before entering the pipeline, the data is split at the chromosome or region level, with each node responsible for one shard; this sharding strategy significantly improves cluster utilization. The GPU handles the compute-intensive tasks, such as GVCF compression and deep learning prediction, while the Slurm scheduler handles resource allocation, automatically detecting idle nodes and assigning tasks to them. Although the system looks complex, it runs much more smoothly than a traditional serial pipeline, with markedly reduced waiting time and I/O contention.

7.3 Performance results, lessons learned and future improvement directions

The results show a clear acceleration from the GPU version: the speed increases by approximately 3.67 times in the CloudLab environment, and by as much as 5.05 times on Fabric. The largest gains appear in the alignment and initial variant-calling stages, but not every part benefits equally. In CloudLab, storage read and write speeds are relatively slow, so an I/O bottleneck remains; on Fabric, after switching to faster NVMe SSDs, this problem almost disappeared.
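The queue-plus-lock scheduling pattern described above can be sketched in a few lines. This is a minimal illustration, not AVAH's actual API: the shard names and the `align_and_call` stub are placeholders standing in for the real alignment, sorting, and variant-calling stages, and the thread pool stands in for cluster nodes.

```python
import threading
import queue

# Chromosome-level shards; each worker pulls the next task from the queue,
# so no worker sits idle waiting for a fixed serial order.
SHARDS = [f"chr{i}" for i in range(1, 23)]

task_queue = queue.Queue()
io_lock = threading.Lock()   # mutual exclusion: one writer to shared state at a time
results = []

def align_and_call(shard):
    """Placeholder for the per-shard alignment -> sorting -> variant-calling stages."""
    return f"{shard}.gvcf"

def worker():
    while True:
        try:
            shard = task_queue.get_nowait()
        except queue.Empty:
            return                     # queue drained: this worker is done
        out = align_and_call(shard)
        with io_lock:                  # prevent concurrent writes to the shared output
            results.append(out)
        task_queue.task_done()

for shard in SHARDS:
    task_queue.put(shard)

# Four workers stand in for four cluster nodes processing shards concurrently.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all 22 shards processed
```

In the real system the workers are Slurm-allocated nodes rather than threads, and the lock corresponds to file-level mutual exclusion, but the load-balancing effect of pull-based queues is the same: fast shards free their node immediately for the next task.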
Running multiple GPUs simultaneously sustains a high utilization rate, further validating the parallel pipeline design. The overall experience shows that while raw computing speed matters, the balance between computation and I/O matters even more. Asynchronous scheduling makes resource allocation more reasonable, and hybrid GPU acceleration significantly improves the throughput of the deep learning stages. Next, we are considering AI-driven automatic scheduling to make task allocation smarter, along with dynamic cluster scaling so that the number of nodes adjusts to load changes. As long-read sequencing technology becomes more widespread, further optimizing its performance on heterogeneous clusters will be the next problem to solve.

8 Future Development Directions and Conclusions

Artificial intelligence is gradually reshaping the role of high-performance computing in bioinformatics. In the past, attention focused on algorithm-level improvements, such as making alignment faster and models more stable. The situation is now somewhat different: deep learning is entering the variant-detection process. Tools like DeepVariant have shown that a trained neural network can indeed substantially improve detection accuracy. Next, AI may not remain confined to a single model but permeate the entire analysis pipeline, taking an end-to-end approach from input to output and leveraging GPU clusters for faster inference. In fact, AI is not only capable of "detection" but is also beginning to "manage systems": some teams already use reinforcement learning to dynamically adjust task scheduling and cloud resource allocation, keeping the cluster efficient under varying loads.
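The observation that not all stages benefit equally is exactly what Amdahl's law predicts: overall speedup S = 1 / ((1 - f) + f/s), where f is the fraction of runtime that is accelerated and s the per-stage speedup. The sketch below inverts this to estimate what fraction of the pipeline the reported overall speedups imply; the per-stage factor s = 10 is an assumed illustrative value, not a figure reported here.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of total runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

def accelerated_fraction(overall, s):
    """Invert Amdahl's law: the fraction f implied by an observed overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)

# Assuming (illustratively) that GPU-accelerated stages run 10x faster,
# the observed overall speedups bound the accelerated fraction of the pipeline:
f_cloudlab = accelerated_fraction(3.67, 10)  # CloudLab, I/O bottleneck present
f_fabric = accelerated_fraction(5.05, 10)    # Fabric, NVMe storage

print(f"CloudLab implies f ~ {f_cloudlab:.2f}")
print(f"Fabric implies f ~ {f_fabric:.2f}")
```

The gap between the two implied fractions is consistent with the text's diagnosis: on CloudLab, slow storage keeps a larger share of the runtime outside the accelerated portion, capping the overall gain regardless of GPU throughput.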
Although these practices are not yet mature, the trend is already clear: the combination of AI and HPC is making NGS analysis more automated, smarter, and less labor-intensive. However, the growth in computing power brings new problems of its own. Scalability and energy consumption have become realities that cannot be ignored. The scale of sequencing keeps expanding, and petabyte-level data volumes are too much for many systems to handle. The problem is not only computing power; storage, networking, and cooling have all become new bottlenecks. As clusters grow larger, energy consumption soars, and power efficiency and heat dissipation become unavoidable issues. How to complete more computation without increasing power consumption is one of the current research focuses. Data management, meanwhile, remains an old problem that slows progress: compression, transmission, and access can all become bottlenecks. In a cloud HPC environment the situation is more complex still, as security and privacy considerations make system design more challenging. To address these issues, researchers are exploring new directions: scalable file systems, heterogeneous hardware integration (such as FPGAs and ASICs), and low-power architectures that balance performance and energy efficiency. But achieving a truly "green HPC" may still take more time.
