
Computational Molecular Biology 2025, Vol.15, No.4, 171-182 http://bioscipublisher.com/index.php/cmb

existing drugs. In education, it is also a useful teaching tool, letting students observe directly how signals are transmitted layer by layer. Data engineers can also use it as a graph analysis platform to extract features, compute network metrics, and even connect it to AI systems for question answering. Overall, this atlas can be searched, browsed, and even "reasoned over". It is likely to become an indispensable resource in research, clinical practice, and teaching (Lu et al., 2025).

6.3 Scalability and interoperability

To keep this knowledge graph system usable over the long term, we built as much flexibility as possible into its design. The data layer is the part most prone to change: new molecule types and new interaction types can appear at any time, so the schema is deliberately not fixed. Adding content only requires assigning nodes a new label and importing the new data. The architecture itself does not need to change. The ETL pipeline can also be rerun repeatedly, for example whenever BioGRID publishes an update, with only the differences imported incrementally. If the data volume keeps growing, the system can be migrated to Neo4j Enterprise Edition or to a distributed deployment without changing a single line of front-end code.

Functionally, we follow a modular approach: new features can be used as soon as they are installed, much like plugins. When we previously added a relationship prediction module, only a few lines of interface code were needed to get it running. Adding path algorithms or time-series analysis later should therefore not be difficult.

Interoperability has also been taken into account. The system supports APIs and a SPARQL endpoint, so external platforms can retrieve data directly. Each node in the interface links to an external database; for example, clicking a protein opens its UniProt entry.
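The incremental ETL update described above can be pictured as a set difference between two releases of an interaction dataset. The sketch below is a minimal illustration under stated assumptions, not the system's actual pipeline. The Edge record and the normalize and diff_snapshots helpers are hypothetical. The sketch also assumes that every interaction is undirected. In practice, the edges in to_add would then be written to Neo4j (for example with a Cypher MERGE), and the edges in to_remove would be retired rather than hard-deleted; that step is not shown.

```python
from collections.abc import Iterable
from typing import NamedTuple


class Edge(NamedTuple):
    """One interaction record; field names are illustrative, not the system's schema."""
    source: str
    target: str
    kind: str


def normalize(edge: Edge) -> Edge:
    # Assumes undirected interactions (e.g. physical PPIs): ordering the endpoints
    # makes A-B and B-A the same key, so a swapped pair is not counted as a change.
    a, b = sorted((edge.source, edge.target))
    return Edge(a, b, edge.kind)


def diff_snapshots(old: Iterable[Edge], new: Iterable[Edge]) -> tuple[set[Edge], set[Edge]]:
    """Return (to_add, to_remove) between two releases of an interaction dataset."""
    old_set = {normalize(e) for e in old}
    new_set = {normalize(e) for e in new}
    # Only the differences need to touch the graph database.
    return new_set - old_set, old_set - new_set


if __name__ == "__main__":
    previous = [Edge("TP53", "MDM2", "physical"), Edge("BRCA1", "BARD1", "physical")]
    current = [Edge("MDM2", "TP53", "physical"), Edge("EGFR", "GRB2", "physical")]
    to_add, to_remove = diff_snapshots(previous, current)
    print(sorted(to_add))     # the EGFR-GRB2 edge is new in this release
    print(sorted(to_remove))  # the BRCA1-BARD1 edge is absent from this release
```

Because endpoint order is normalized, an edge reported as MDM2-TP53 in one release and TP53-MDM2 in the next is not mistaken for a change. Directed interaction types, such as phosphorylation, would need to skip that normalization step.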
We also tested importing the data into Cytoscape and linking it directly to NCBI for comparison, and both went smoothly. On the security side, hooks have been left in place so that OAuth authentication or permission management can be added later. Overall, the system is flexible, scalable, and able to interoperate with other platforms (Stear et al., 2023; Glen et al., 2025).

7 Challenges and Prospects

7.1 Data heterogeneity and dynamic update issues

Although molecular interaction knowledge graphs have broad prospects, they also face many problems. First, the data are messy: they come from many sources in widely varying formats. These range from structured records to large volumes of literature text and partially processed material. Different experiments and databases follow their own standards, so reliability varies considerably. Building the graph therefore often requires manual deduplication and name unification. In PPI data, for instance, the same protein may be recorded under several aliases, and the same name may also occur in more than one species. Multi-omics integration is even harder, because conflicts appear as soon as datasets are merged. Reducing this manual patching will require smarter algorithms for automatic alignment (Xiao et al., 2023).

The second issue is keeping the graph up to date. Biological research advances at an astonishing pace. New discoveries are made every day, and relationships that were accepted yesterday may be overturned today. If the graph is not updated promptly, it falls out of step with current knowledge. This is especially serious for drug- and disease-related relationships, where outdated information that is not retracted will bias model analyses. Solving this requires not only adding new data but also retiring old data while keeping a version history. One option is to attach a timestamp, a confidence level, or even an expiration date to each relationship.
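The idea of attaching a timestamp, a confidence level, and an expiration date to each relationship can be sketched as below. In this sketch, outdated edges are retired rather than deleted. It is a hedged illustration only. The Relation fields, the is_active and supersede helpers, and the DrugX/TargetY names are all hypothetical and are not part of the described system.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """A versioned edge; all field names are hypothetical."""
    source: str
    target: str
    kind: str
    confidence: float                # evidence score, assumed to lie in [0, 1]
    valid_from: date                 # when this version entered the graph
    expires: Optional[date] = None   # optional expiry/review date
    retired: Optional[date] = None   # set when superseded; the record is kept


def is_active(rel: Relation, on: date, min_confidence: float = 0.0) -> bool:
    """True if the relation should be used in analyses run on the given date."""
    if on < rel.valid_from:
        return False
    if rel.retired is not None and on >= rel.retired:
        return False
    if rel.expires is not None and on >= rel.expires:
        return False
    return rel.confidence >= min_confidence


def supersede(history: list[Relation], old: Relation, new: Relation) -> list[Relation]:
    """Retire `old` as of new.valid_from and append `new`, keeping both versions."""
    retired_old = replace(old, retired=new.valid_from)
    return [retired_old if r == old else r for r in history] + [new]


if __name__ == "__main__":
    v1 = Relation("DrugX", "TargetY", "inhibits", 0.6, date(2021, 1, 1))
    v2 = Relation("DrugX", "TargetY", "inhibits", 0.2, date(2024, 6, 1))
    history = supersede([v1], v1, v2)
    print([r.confidence for r in history if is_active(r, date(2025, 1, 1))])  # [0.2]
    print([r.confidence for r in history if is_active(r, date(2023, 1, 1))])  # [0.6]
```

A superseded edge is stamped with a retirement date instead of being deleted. As a result, analyses can be rerun against the graph as it stood on any past date, while current queries see only the newer version.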
Automatic text mining could also be added so that the system discovers on its own which knowledge needs to be updated (Hoyt et al., 2019). Over time, redundant data must also be cleared out to keep the graph from becoming bloated. Ultimately, for this system to stay "alive", it must adapt to change and evolve on its own. Only then can it become a truly living repository of biomedical knowledge.

7.2 Model interpretability and knowledge uncertainty

The "black box" problem of models is particularly prominent in biomedicine. Algorithms can produce results but cannot clearly explain "why", which makes it difficult for researchers and clinicians to be fully convinced. Suppose, for instance, that a model claims proteins A and B interact but gives no reason, not even a simple statement such as "they are both involved in the same pathway". The conclusion then seems empty. For this reason, many studies have begun trying to make models "speak": find paths, draw subgraphs, and add

