
Computational Molecular Biology 2025, Vol.15, No.6, 291-298
http://bioscipublisher.com/index.php/cmb

Figure 2 Test dataset from 2021, labeled positive and negative, used to test the model (left). Predicted AIV outbreaks based on the same 2021 test data (right), binned into true positives (actual outbreaks predicted as positive), true negatives (actual non-outbreaks predicted as negative), false positives (non-outbreaks predicted as positive), and false negatives (outbreaks predicted as negative) (Adopted from Opata et al., 2025)

6 Challenges and Limitations
6.1 Data imbalance and cross-regional data sharing barriers
Before model performance is even considered, many studies have pointed out a more fundamental problem: the available outbreak data are not balanced. AI models rely on historical samples, but these samples are often concentrated in a few regions or specific disease types and are limited in quantity, so the models perform poorly in areas that lack data. Where data have been accumulated, legal, institutional, or technical restrictions can make cross-regional sharing difficult, which in turn makes it harder to build more comprehensive datasets. As a result, multi-source genomic, environmental, and epidemiological information is difficult to integrate in practice, and the adaptability and predictive ability of the models are constrained (Ezanno et al., 2021; Keshavamurthy et al., 2022).
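The effect of imbalance on evaluation can be illustrated with the four bins named in Figure 2. The following is a minimal sketch, not the method of any cited study: the function names, the 5-versus-95 class split, and the "always predict negative" model are all hypothetical and chosen only for illustration. It shows how a model that never predicts an outbreak can still report high accuracy on imbalanced test data while detecting no outbreaks at all.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = outbreak)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, recall, specificity, and balanced accuracy."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    # Guard against division by zero when a class is absent.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical imbalanced test set: 5 outbreaks among 100 records.
y_true = [1] * 5 + [0] * 95
# Hypothetical degenerate model that always predicts "no outbreak".
y_pred = [0] * 100

metrics = summarize(y_true, y_pred)
print(metrics)
```

In this made-up case accuracy is 0.95 while recall is 0.0 and balanced accuracy is 0.5, which is why class-aware metrics are usually reported alongside accuracy when positive samples are scarce.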
