Computational Molecular Biology 2025, Vol.15, No.6, 291-298 http://bioscipublisher.com/index.php/cmb 292

2 Genomic Data Acquisition in Animal Disease Surveillance

2.1 Whole-genome sequencing (WGS) of pathogens and real-time tracking platforms

In animal disease surveillance, many workflows start with whole-genome sequencing (WGS), because it provides detailed genetic information that can be used to identify the source of infection, detect changes in pathogen populations, and trace transmission chains. Traditional culture-based sequencing has been in use for many years, but it has persistent drawbacks: it is time-consuming, it imposes high biosafety requirements, and it struggles to yield sufficiently pure DNA when a sample contains a mixed infection or contamination. In recent years, culture-free long-read sequencing technologies have changed this. For instance, Oxford Nanopore adaptive sampling allows tissue samples to be sequenced directly without culture, which is faster while keeping data quality within an acceptable range. Current real-time tracking platforms connect WGS data to automated bioinformatics pipelines, enabling rapid typing, clustering, and phylogenetic analysis and making it easier to identify outbreaks early (Bautista et al., 2023; Ghielmetti et al., 2023; Knijn et al., 2023).

2.2 Application of metagenomic and microbiome data in outbreak early warning

For outbreak early warning, metagenomic sequencing is often used to profile the complete microbial community in animals. It can detect known pathogens and, at the same time, capture novel viruses that would otherwise go unnoticed. Previous studies have used this approach to identify long-term latent viral infections in wild animals and to find bacterial species associated with respiratory diseases, indicating that its value in early warning is not merely theoretical.
By leveraging publicly available sequencing data, metagenomics can also identify emerging viruses with zoonotic potential, providing leads for proactive monitoring. When metagenomic data from different sources are aggregated, researchers can observe multiple infections and shifts in the ecological structure of pathogens more clearly, and this information is particularly important for epidemic risk assessment (Kawasaki et al., 2021; Prentice et al., 2024).

2.3 Sample sources and standardized data processing pipelines

In practice, the samples used for genomic monitoring are more varied than might be expected: they include not only culture isolates but also tissue biopsies, environmental samples, and public sequence repositories. Problems often arise at this stage, such as very low DNA content, severe contamination, or mixed infections, any of which can degrade sequencing results. Optimizing the sample processing procedure and the sequencing strategy is therefore particularly important. To make data from different laboratories comparable, standardized pipelines usually incorporate whole genome amplification, strict quality control, and computational analysis. In addition, if institutions agree on monitoring and diagnostic procedures and follow the FAIR data principles, cross-departmental sharing and integration become easier, which supports the advancement of the "One Health" framework for animal diseases (Stärk et al., 2019; Pinto et al., 2024; Struelens et al., 2024).

3 AI Frameworks for Outbreak Prediction

3.1 Supervised learning and time-series modeling (e.g., LSTM, random forest)

Because epidemic data vary over time, researchers frequently employ time-series models such as long short-term memory (LSTM) networks.
LSTM is good at capturing dependencies within sequences and adapts relatively well to complex nonlinear changes, so it performs well in predicting infection trends. Random forests have a different strength: by combining many decision trees they gain stability and are less sensitive to the noise and heterogeneous variables common in monitoring data. Both types of methods rely on diverse training data, such as case counts, demographic data, and various environmental factors, which enables a model to detect signs of an outbreak at an early stage of an epidemic. However, algorithm performance varies between studies, depending on the data characteristics and the target task (Ardabili et al., 2020; Alwakeel, 2025; Gao et al., 2025).

3.2 Deep learning and graph neural networks for transmission pathway modeling

When simulating transmission paths, some teams prefer to use graph structures to represent hosts, pathogens, and