Calibration criteria assess whether the disease probabilities predicted by a model are consistent with the actually observed incidence; only well-calibrated models can provide reliable risk estimates for clinical practice (Mohsen et al., 2023). However, compared with discrimination metrics, calibration is reported far less often in the literature, with only some studies providing calibration plots, Brier scores, or calibration slopes and intercepts (Germaine et al., 2025). When calibration is not evaluated, even a model with high discrimination may produce misleading risk estimates. Discrimination and calibration results should therefore be reported together so that a model's usability in clinical practice can be judged comprehensively.

4.2 Model development and validation strategies
To ensure that AI diabetes prediction models are both reliable and usable, internal validation methods such as K-fold cross-validation and bootstrapping are commonly employed. By repeatedly splitting the data into training and test sets, these methods allow model performance to be judged while reducing overfitting (Dutta et al., 2022; Linkon et al., 2024; Khokhar et al., 2025). Such practices reveal more accurately how a model performs on new data and are a basic requirement of model development. However, internal validation alone cannot prove that a model is applicable to different populations (Mohsen et al., 2023). External validation refers to testing a model on independent datasets from other regions or institutions. Although it is carried out far less frequently, it is the key step in evaluating practical application value; currently only a few studies have completed such validation, reflecting an obvious gap in this field (Mohsen et al., 2023). During model construction, prediction performance can be further improved through sound feature selection, hyperparameter tuning, and ensemble learning (Dutta et al., 2022; Abnoosian et al., 2023). Meanwhile, transparent reporting of the sample-size rationale, the handling of missing data, and the validation process improves the reproducibility of the model and the credibility of the study (Moghaddam et al., 2024).

4.3 Technical factors: sample size, class imbalance, overfitting
Technical issues such as insufficient sample size, biased class distributions, and overfitting directly affect the stability and reliability of AI models for diabetes screening. When complex algorithms process high-dimensional feature data, a sufficiently large sample is a key condition for avoiding overfitting and obtaining reliable results. Studies with small samples often exaggerate model performance, yet such models cannot be used widely in practice (Moghaddam et al., 2024). Simulation experiments suggest that, to ensure stable model performance, each candidate predictor requires approximately 200 outcome events to support it (Mohsen et al., 2023; Wang, 2025). Class imbalance is a common problem in diabetes datasets: the number of non-diabetic samples far exceeds the number of diabetic samples, which can lead a model to favor the majority class and reduce its ability to recognize true positive cases (Elseddawy et al., 2022; Khokhar et al., 2025). To alleviate this problem, oversampling methods such as SMOTE (synthetic minority over-sampling) and random oversampling, as well as cost-sensitive learning strategies, are often adopted to improve predictive performance on the minority class (El-Bashbishy and El-Bakry, 2024; Talari et al., 2024; Jang, 2025; Malik and Tepe, 2025). Overfitting is more common when the sample size is small or the class imbalance is severe; it can be alleviated to some extent through regularization, cross-validation, and external validation, helping the model maintain reasonable predictive ability on new data (Dutta et al., 2022; Linkon et al., 2024). The sketches below illustrate these mitigation and validation strategies.
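As a minimal illustration of the imbalance-handling and overfitting-control strategies above, the following Python sketch combines SMOTE oversampling with a regularized classifier inside stratified cross-validation, and contrasts it with a cost-sensitive (class-weighted) alternative. The synthetic dataset, class proportions, and hyperparameters are illustrative assumptions, not values from any cited study; the open-source `imbalanced-learn` package is assumed to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline keeps SMOTE inside the CV folds

# Synthetic stand-in for a diabetes dataset: roughly 5% diabetic (positive) cases.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# SMOTE resamples only the training folds; L2 regularization (C) limits overfitting.
smote_pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("clf", LogisticRegression(C=1.0, max_iter=1000)),
])
auc_smote = cross_val_score(smote_pipeline, X, y, scoring="roc_auc", cv=cv)
print(f"SMOTE pipeline AUC:  {auc_smote.mean():.3f} +/- {auc_smote.std():.3f}")

# Cost-sensitive alternative: reweight the minority class instead of resampling.
weighted = LogisticRegression(class_weight="balanced", C=1.0, max_iter=1000)
auc_weighted = cross_val_score(weighted, X, y, scoring="roc_auc", cv=cv)
print(f"Class-weighted AUC:  {auc_weighted.mean():.3f} +/- {auc_weighted.std():.3f}")
```

Keeping SMOTE inside the pipeline matters: oversampling before splitting would leak synthetic copies of test patients into the training folds and inflate the apparent performance, which is precisely the kind of optimistic bias internal validation is meant to catch.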
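Bootstrapping, the other internal validation method mentioned in Section 4.2, can be sketched in the same setting. Continuing with `X` and `y` from the example above, each replicate trains on patients sampled with replacement and evaluates on the out-of-bag patients; the number of replicates and the choice of model are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
aucs = []
n = len(y)
for _ in range(200):  # 200 bootstrap replicates
    idx = rng.integers(0, n, size=n)        # sample patients with replacement
    oob = np.setdiff1d(np.arange(n), idx)   # out-of-bag patients serve as the test set
    model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    aucs.append(roc_auc_score(y[oob], model.predict_proba(X[oob])[:, 1]))

print(f"Bootstrap out-of-bag AUC: {np.mean(aucs):.3f} "
      f"(95% interval {np.percentile(aucs, 2.5):.3f}-{np.percentile(aucs, 97.5):.3f})")
```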
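Discrimination alone is not sufficient; the calibration indicators discussed at the start of this section (Brier score, calibration slope and intercept) can be computed from a model's out-of-sample predicted probabilities. A minimal sketch, assuming scikit-learn >= 1.2 and arrays `y_true` and `y_prob` taken from a held-out set:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

def calibration_report(y_true, y_prob, eps=1e-8):
    """Brier score plus calibration slope/intercept via logistic recalibration.

    A slope near 1 and an intercept near 0 indicate good calibration.
    """
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    brier = brier_score_loss(y_true, y_prob)
    # Regress the observed outcome on the logit of the predicted risk;
    # penalty=None (scikit-learn >= 1.2) avoids shrinking the slope estimate.
    logit = np.log(np.clip(y_prob, eps, 1 - eps) / np.clip(1 - y_prob, eps, 1 - eps))
    recal = LogisticRegression(penalty=None).fit(logit.reshape(-1, 1), y_true)
    slope, intercept = recal.coef_[0][0], recal.intercept_[0]
    # Binned observed vs. predicted frequencies, ready for a calibration plot.
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
    return brier, slope, intercept, frac_pos, mean_pred
```

Plotting `mean_pred` against `frac_pos` with a diagonal reference line yields the calibration graph that only a minority of the cited studies report.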
5 Clinical Interpretability and Practical Implementability
5.1 The interpretability of the model and the acceptance of medical staff
A prerequisite for AI prediction tools to be used in early diabetes screening is that clinicians can clearly understand the model. Doctors need to know not only "what the result is" but also "which information led to this result". Many models therefore apply methods such as SHAP, LIME, and attention mechanisms to surface the key information behind a prediction and help doctors make judgments (Khokhar et al., 2025; Kiran et al., 2025). Coupled with interactive visualization tools, doctors can see the importance of each piece of information and simulate different "what-if" scenarios, making the model's results easier to understand and use (Hasan et al., 2024); both ideas are sketched below.
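As a hedged illustration of SHAP-based explanation, the sketch below trains a gradient-boosting classifier on synthetic data and produces a global feature-importance summary plus a local explanation for a single patient. The feature names are hypothetical stand-ins for clinical variables, and the open-source `shap` package is assumed; the exact shapes of returned arrays can vary across `shap` versions.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical clinical feature names, for illustration only.
feature_names = ["glucose", "bmi", "age", "hba1c", "blood_pressure", "insulin"]

X, y = make_classification(n_samples=1500, n_features=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # per-patient, per-feature contributions (log-odds)

# Global view: which features drive predicted diabetes risk across the test cohort.
shap.summary_plot(shap_values, X_test, feature_names=feature_names)

# Local view: why one individual patient received their particular risk score.
shap.force_plot(explainer.expected_value, shap_values[0], X_test[0],
                feature_names=feature_names, matplotlib=True)
```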
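The "what-if" simulation mentioned above can be approximated by perturbing a single input and re-scoring the patient. Continuing with `model`, `X_test`, and the hypothetical feature names from the previous sketch:

```python
patient = X_test[0].copy()
# Shift the first (hypothetical "glucose") feature and watch the predicted risk respond.
for delta in (-1.0, 0.0, 1.0):
    modified = patient.copy()
    modified[0] += delta
    risk = model.predict_proba(modified.reshape(1, -1))[0, 1]
    print(f"glucose shift {delta:+.1f} -> predicted risk {risk:.3f}")
```

In a real interface this loop would sit behind an interactive slider, letting clinicians explore hypothetical scenarios for a specific patient rather than reading a static importance ranking.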