reproducing or fine-tuning such models (Wang et al., 2025). In addition, at the inference stage, prediction over extremely long genomic sequences is constrained by GPU memory and runtime. Improving the computational efficiency of these models and lowering the hardware barrier is therefore an urgent prerequisite for the large-scale application of pre-trained models in bioinformatics.

7 Conclusion
Pre-trained language models have achieved several significant breakthroughs in the life sciences. From protein structure prediction to variant effect analysis, these models have demonstrated that deep biological patterns can be decoded from sequence data alone, without hand-crafted feature engineering. They compensate for the weaknesses of traditional methods in modeling global context and have raised the performance of many biological sequence analysis tasks to a new level.

To consolidate the role of pre-trained models, unified evaluation benchmarks and open data are needed. Objective comparisons on public datasets can expose model shortcomings and drive continuous improvement, while sharing large-scale, high-quality sequencing and experimental data will help train more robust models. Sustained benchmark evaluation and dataset construction will keep progress in this field steady and reliable.

Looking ahead, the integration of artificial intelligence and molecular biology will become ever closer. Large pre-trained models are expected to become everyday tools in biological research: from designing new enzymes and drugs to monitoring pathogen evolution in real time, these models will take part in every stage of scientific discovery. With algorithms and experiments advancing in concert, we will enter a new era of molecular biology empowered by AI, revealing the laws of life at a deeper level and accelerating innovation.

Acknowledgments
The author extends sincere thanks to two anonymous peer reviewers for their invaluable feedback on the manuscript.

Conflict of Interest Disclosure
The author affirms that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

References
Chia S.E., and Lee N.K., 2022, Comparisons of DNA sequence representation methods for deep learning modelling, In: 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), IEEE, pp.1-6.
https://doi.org/10.1109/IICAIET55139.2022.9936754
Chu S.K.S., Narang K., and Siegel J.B., 2024, Protein stability prediction by fine-tuning a protein language model on a mega-scale dataset, PLOS Computational Biology, 20(7): e1012248.
https://doi.org/10.1371/journal.pcbi.1012248
Esmaeeli M., Bauzá A., and Acharya A., 2023, Structural predictions of protein-DNA binding with MELD-DNA, Nucleic Acids Research, 51(4): 1625-1636.
https://doi.org/10.1093/nar/gkad013
Fang G., Zeng F., Li X., and Yao L., 2021, Word2vec based deep learning network for DNA N4-methylcytosine sites identification, Procedia Computer Science, 187: 270-277.
https://doi.org/10.1016/j.procs.2021.04.062
Gupta Y.M., Kirana S.N., and Homchan S., 2024, Representing DNA for machine learning algorithms: a primer, Biochemistry and Molecular Biology Education, 53(2): 142-146.
https://doi.org/10.1002/bmb.21870
He W., Zhou H., Zuo Y., Bai Y., and Guo F., 2024, MuSE: a deep learning model based on multi-feature fusion for super-enhancer prediction, Computational Biology and Chemistry, 113: 108282.
https://doi.org/10.1016/j.compbiolchem.2024.108282
Helaly M., Rady S., and Aref M., 2020, Deep learning for taxonomic classification of biological bacterial sequences, In: Machine learning and big data analytics paradigms: analysis, applications and challenges, Springer International Publishing, pp.393-413.
https://doi.org/10.1007/978-3-030-59338-4_20
Kalyan K.S., Rajasekharan A., and Sangeetha S., 2021, AMMUS: A survey of transformer-based pretrained models in natural language processing, arXiv Preprint, 2021: 103982.
https://doi.org/10.1016/j.jbi.2021.103982