Computational Molecular Biology 2025, Vol.15, No.5, 218-226
http://bioscipublisher.com/index.php/cmb

Review Article Open Access

Machine Learning Approaches in Predicting Protein-Protein Interactions in Pathogenic Bacteria

Xing Zhao, Ming Li, Congbiao You

Tropical Microbial Resources Research Center, Hainan Institute of Tropical Agricultural Resources, Sanya, 572025, Hainan, China

Corresponding author: congbiao.you@hitar.org

Computational Molecular Biology, 2025, Vol.15, No.5 doi: 10.5376/cmb.2025.15.0021

Received: 03 Jul., 2025 Accepted: 11 Aug., 2025 Published: 05 Sep., 2025

Copyright © 2025 Zhao et al. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article: Zhao X., Li M., and You C.B., 2025, Machine learning approaches in predicting protein-protein interactions in pathogenic bacteria, Computational Molecular Biology, 15(5): 218-226 (doi: 10.5376/cmb.2025.15.0021)

Abstract
The protein-protein interaction (PPI) networks of pathogenic bacteria play a significant role in bacterial pathogenesis and in the development of drug resistance. This makes them a key entry point for systems biology and for new drug research and development. However, traditional experimental methods for detecting PPIs, such as yeast two-hybrid and co-immunoprecipitation, suffer from high cost, long turnaround times and limited coverage, and their results are prone to noise. In recent years, the rise of machine learning, especially deep learning, has brought major advances to PPI research. Its powerful nonlinear modeling and automatic feature extraction capabilities have removed the bottleneck of manual feature engineering.
This paper reviews recent progress in applying machine learning to predict protein-protein interactions in pathogenic bacteria. It focuses on how supervised, unsupervised and deep learning methods overcome the limitations of traditional approaches and improve prediction performance. We also discuss how data preprocessing and feature engineering strategies affect model performance, and we summarize how machine learning models are constructed and evaluated. We then review their applications in revealing antibiotic resistance mechanisms, screening vaccine targets and characterizing cross-species interactions, among other areas. Through a case study of deep learning prediction in a Salmonella protein-protein interaction network, we verify the effectiveness and biological significance of deep learning models, and we outline current challenges and future directions.

Keywords Pathogenic bacteria; Protein-protein interactions; Machine learning; Deep learning; Graph neural network

1 Introduction

Pathogenic bacteria rely on a coordinated system of protein-protein interactions to infect their hosts. These PPIs determine virulence, metabolic regulation and the ability to evade host immunity. The value of studying the interaction network lies less in the roles of individual proteins than in revealing how they act together across the entire pathogenic system. The PPI networks of pathogens such as Salmonella and Mycobacterium tuberculosis often have a "scale-free" and "small-world" structure, in which a few hub proteins carry out key functions. Disrupting these hubs can affect the entire system (Humphreys et al., 2024). PPI analysis can therefore both reveal general biological principles and provide new targets for the design of antibacterial drugs and vaccines.

Traditionally, protein interactions have mainly been verified experimentally, using methods such as yeast two-hybrid, tandem affinity purification coupled with mass spectrometry (TAP-MS) or protein chips.
However, in pathogenic bacteria these methods suffer from high false-positive rates, poor detection of membrane proteins and limited throughput (Ding and Kihara, 2018). Mapping a complete interactome is often costly and time-consuming, which makes it difficult to respond quickly to newly emerging pathogens. As a result, computational prediction has gradually replaced experimental screening as the mainstream approach. The rise of machine learning has fundamentally transformed how this research is conducted. Early methods relied on hand-crafted features, such as amino acid composition and domain co-occurrence, fed into classifiers such as support vector machines (SVMs) or random forests. These methods were accurate but limited by the human expertise built into their features. Deep learning, by contrast, can learn features directly from sequences. The PIPR model achieves sequence-level prediction using a residual recurrent convolutional neural network. DPPI raises the AUC above 0.8 by combining position-specific scoring matrices (PSSMs) with a convolutional neural network (CNN). These achievements show that, with the aid of transfer learning or pre-trained models, cross-species prediction remains achievable even when data are scarce. Today, machine learning enables researchers to integrate sequence, structure and functional