
Computational Molecular Biology 2025, Vol.15, No.4, 171-182
http://bioscipublisher.com/index.php/cmb

relationships. During training, the model also needs to be fed negative examples ("fake data"), such as randomly paired proteins, so that it learns to distinguish true interactions from false ones. After training, proteins of the same type cluster together automatically, and those with similar functions also lie closer to one another, which indicates that both structure and semantics have been embedded. These vectors can then be reused, for example to find similar genes, predict new interactions, and conduct classification analysis. We also try to make the model more transparent by combining path or rule information so that its predictions are more interpretable (Hu et al., 2024). Ultimately, this vectorized representation transforms the knowledge graph from a mere stack of information into a knowledge network that the model can truly understand.

Figure 2 Ontology overview (Adopted from Alocci et al., 2015)
Image caption: Overview of the ontology developed for translating glycan structures into RDF/semantic triples. The figure shows all the predicates and entities used for defining glycan structures in the RDF triple store (Adopted from Alocci et al., 2015)

4.2 Network topology analysis
The molecular interaction knowledge graph is essentially a vast network. Simply looking at its nodes and edges reveals little; one has to find ways to read out the structural rules within it. We start with topological analysis and first calculate the importance of each node, that is, which nodes are more "critical" in the network. Some proteins are highly connected and have high degree centrality, often serving as hub molecules. Others have few connections but occupy special positions, linking different modules like bridges. Once such "bottleneck" proteins malfunction, the entire pathway may be affected.
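The distinction between hubs and bridge-like bottlenecks can be illustrated with a minimal sketch. It assumes an unweighted, undirected interaction network and uses only the Python standard library. Degree centrality captures how highly linked a node is, while betweenness centrality (computed here with Brandes' algorithm, which is our choice for illustration; the text does not name a specific method) captures how many shortest paths run through it. The node names `HUB1`, `HUB2`, `BRIDGE`, and `p1` to `p6` are hypothetical placeholders, not real proteins or data from this study.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                    # accumulate dependencies, farthest nodes first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: two modules, each built around a hub,
# joined only through a single low-degree bridge node.
edges = [
    ("HUB1", "p1"), ("HUB1", "p2"), ("HUB1", "p3"),
    ("HUB2", "p4"), ("HUB2", "p5"), ("HUB2", "p6"),
    ("HUB1", "BRIDGE"), ("BRIDGE", "HUB2"),
]
adj = build_adjacency(edges)
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)

for node in sorted(adj, key=betweenness.get, reverse=True):
    print(f"{node:7s} degree={degree[node]:.3f} betweenness={betweenness[node]:.1f}")
```

In this toy network the bridge node has only two links, yet every shortest path between the two modules passes through it, so its betweenness is far higher than its degree alone would suggest. For real networks a maintained graph library would normally be preferable to a hand-written routine.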
In addition to individual nodes, we also examine the group structure, using community detection algorithms to divide the network into compact clusters. The results are quite interesting. For instance, some modules are composed entirely of immune-related proteins, while others focus on cell cycle regulation. Path analysis is more like looking for clues in a graph: two seemingly unrelated molecules can sometimes be connected in just two or three steps, and this might be a new regulatory
