
Computational Molecular Biology 2025, Vol.15, No.5, 218-226 http://bioscipublisher.com/index.php/cmb

graph-based approaches. A Siamese-structured CNN processed the sequences to extract local and long-range features, and GraphSAGE then integrated information from the known interaction network. The concatenated outputs of both modules were passed through a fully connected layer to predict the interaction probability (Zhong et al., 2022). Training used a 1:1 ratio of positive to negative samples, with 5-fold cross-validation for parameter optimization. Dropout and L2 regularization were added to limit overfitting, and the loss function was weighted to penalize false negatives more heavily. The model achieved an AUC of 0.92, outperforming the CNN-only (0.85) and SVM (≈0.75) models, and an F1-score of 0.84. Grad-CAM visualization revealed high attention weights around known binding motifs such as the arginine-rich region of EF-Tu, in line with experimental observations (Zhao et al., 2023). Further domain-focused attention analysis confirmed that high-confidence interactions often occur within conserved structural regions (Charih et al., 2025). Overall, this CNN+GNN hybrid framework effectively captures the protein interaction characteristics of Salmonella and demonstrates strong generalization capacity.

7.2 Model prediction results and experimental verification

The predicted network contained approximately 8,000 high-confidence interactions. Combined with known data, the full network comprised about 1,200 nodes and 8,500 edges and displayed a typical scale-free topology (Figure 2) (Muzio et al., 2020). Core hubs included ribosomal subunits and RNA polymerase components, consistent with their essential metabolic functions. Module analysis revealed three main clusters: a flagellar assembly module, a Type III secretion system (T3SS) module, and a core metabolic module, interconnected by a few regulatory proteins (Yang et al., 2020).
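The hub and module observations above can be made concrete with a minimal sketch. It assumes the predictions are available as scored protein pairs and that each protein already carries a module label. The placeholder protein names, scores, module labels, and the 0.8 confidence threshold are all hypothetical and are not taken from the study. The sketch filters edges by score, ranks proteins by degree to surface hubs, and lists the edges that link different modules.

```python
from collections import defaultdict

# Hypothetical toy predictions: (protein_a, protein_b, predicted probability).
# Names and scores are placeholders, not results from the study.
PREDICTED_EDGES = [
    ("met1", "met2", 0.97),
    ("met1", "met3", 0.95),
    ("met1", "met4", 0.91),
    ("fla1", "fla2", 0.93),
    ("fla2", "fla3", 0.88),
    ("t3s1", "t3s2", 0.94),
    ("t3s1", "t3s3", 0.90),
    ("t3s1", "met1", 0.86),  # links the T3SS and metabolic modules
    ("fla1", "t3s2", 0.83),  # links the flagellar and T3SS modules
    ("fla3", "met2", 0.41),  # low confidence, filtered out below
]

# Hypothetical module assignment for each placeholder protein.
MODULE = {
    "met1": "metabolic", "met2": "metabolic",
    "met3": "metabolic", "met4": "metabolic",
    "fla1": "flagellar", "fla2": "flagellar", "fla3": "flagellar",
    "t3s1": "T3SS", "t3s2": "T3SS", "t3s3": "T3SS",
}


def build_graph(edges, threshold=0.8):
    """Keep edges scoring at or above the threshold; return adjacency sets."""
    graph = defaultdict(set)
    for a, b, score in edges:
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def top_hubs(graph, k=3):
    """Return the k proteins with the highest degree (ties broken by name)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]


def cross_module_edges(graph, module):
    """Return the sorted list of edges whose endpoints lie in different modules."""
    edges = set()
    for node, neighbours in graph.items():
        for nb in neighbours:
            if module[node] != module[nb]:
                edges.add(tuple(sorted((node, nb))))
    return sorted(edges)


graph = build_graph(PREDICTED_EDGES)
hubs = top_hubs(graph, k=2)
bridges = cross_module_edges(graph, MODULE)
```

On this toy input the two highest-degree proteins are `met1` and `t3s1`, and the two cross-module edges are the kind of link one would inspect when looking for proteins that connect modules.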
For instance, HilA may bridge the T3SS and metabolic pathways, suggesting coordination between virulence and metabolism. About 60% of the predicted interactions were novel. From a network perspective, the coupling between the flagellar and T3SS modules indicates that motility and invasion in Salmonella are co-regulated. Meanwhile, plasmid-encoded proteins form largely independent submodules, supporting the notion that virulence factors often operate autonomously. Altogether, the CNN+GNN model not only recovered known interactions but also uncovered biologically meaningful new links that were experimentally verified, offering novel insights into the organization of pathogenic systems.

7.3 Implications of the results for the study of the pathogenic mechanism of Salmonella

These findings shed light on the pathogenic mechanism of Salmonella. Virulence is not an isolated function but part of a dynamic interaction network in which motility, secretion, and metabolism are intertwined. The observed coupling between the flagellar and T3SS modules indicates that Salmonella balances energy expenditure and infection efficiency through coordinated protein interactions. The model also helped assign potential functions to previously uncharacterized proteins; for instance, protein X may regulate drug resistance by modulating TopoI activity (Charih et al., 2025). Such predictions accelerate the functional annotation of hypothetical bacterial genes. Moreover, the identified interactions themselves could serve as therapeutic targets: disrupting the SpiC-FlhB or TopoI-X interaction could attenuate virulence or enhance antibiotic susceptibility. Methodologically, the CNN+GNN framework is generalizable and can be extended to other pathogens, offering a computational means of completing interactomes for species that lack experimental data.
With further experimental validation, such integrative models are poised to become vital tools in pathogenic systems biology, bridging computational prediction and empirical verification for a holistic understanding of bacterial infection mechanisms (Pancino et al., 2024).

8 Challenges and Future Prospects

PPI prediction for pathogenic bacteria is still limited by the available data. The biggest problem is sample imbalance: there are few real interactions and many non-interactions, and the model is prone to bias towards the negative
