Computational Molecular Biology 2025, Vol.15, No.5, 218-226
http://bioscipublisher.com/index.php/cmb

class. Even if localization or functional differences are taken into account when constructing negative samples, it is still difficult to avoid treating unknown true positives as negatives, which introduces label noise. Weak supervision, generative models, or sample weighting are possible remedies. Another issue is data incompleteness. Interaction data for most pathogenic bacteria are limited, so models are prone to overfitting. Cross-species transfer learning can exploit data from model bacteria, but species differences can still introduce errors. Experimental verification also lags behind: high-throughput verification techniques still struggle to keep pace with prediction. Furthermore, data sources differ in reliability, so a standardized confidence scoring system is needed. These problems are difficult to solve in the short term, but they have promoted algorithmic innovation and experimental collaboration.

Figure 2 Protein-protein interactions characterization learning (Adopted from Muzio et al., 2020)

Cross-species generalization and interpretability are new challenges. Models often fail to transfer between bacteria because most of the patterns they capture are species-specific. Joint training or the introduction of species factors can improve generalization, while large pre-trained models (such as ProtBert) can learn more general features. On the other hand, the "black box" nature of deep models makes their results hard to interpret. Visualizing attention weights or introducing concept vectors can help link predictions with biological features. Explainable structures such as graph rule networks are also under exploration. Future models will also need to handle larger-scale "host-pathogen-microbiota" maps, where algorithmic efficiency will become a bottleneck.
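One way to make the species-specificity problem visible is to evaluate on a species the model never saw during training. The following is only a minimal sketch of such a leave-one-species-out split, not a method taken from the reviewed studies. The record layout `(species, protein_a, protein_b, label)`, the function name `leave_one_species_out`, and the toy records with placeholder protein identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def leave_one_species_out(records):
    """Yield (held_out_species, train, test) splits.

    `records` is a list of (species, protein_a, protein_b, label) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    # Group interaction records by the species they come from.
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec[0]].append(rec)

    # Hold out each species in turn; train on all remaining species.
    for held_out in sorted(by_species):
        test = by_species[held_out]
        train = [
            rec
            for species, recs in by_species.items()
            if species != held_out
            for rec in recs
        ]
        yield held_out, train, test


# Toy records with placeholder protein IDs and made-up labels
# (illustrative only, not real interaction data).
records = [
    ("E. coli", "P1", "P2", 1),
    ("E. coli", "P1", "P3", 0),
    ("S. enterica", "Q1", "Q2", 1),
    ("M. tuberculosis", "R1", "R2", 0),
    ("M. tuberculosis", "R3", "R4", 1),
]

splits = list(leave_one_species_out(records))
for held_out, train, test in splits:
    print(held_out, "train:", len(train), "test:", len(test))
```

A model that scores well under a random split but poorly on every held-out species is likely relying on species-specific patterns, which is the situation in which joint training or explicit species factors would be worth trying.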
To enhance generalization and transparency, both computational and experimental improvements are still required.

There are two main future directions: multi-omics integration and more intelligent AI methods. Integrating transcriptome, metabolome, and single-cell data can reveal the spatiotemporal dynamics of interactions, and dynamic graph models are being explored for this purpose. Combining cross-species and host omics data will bring predictions closer to real ecological conditions. In terms of algorithms, new AI approaches such as GANs, diffusion models and
