International Journal of Clinical Case Reports, 2025, Vol.15, No.6, 293-302 http://medscipublisher.com/index.php/ijccr 296

insulin levels, glycated hemoglobin, exercise status, and dietary habits). Some studies also incorporate genetic markers, medical images, or wearable-device data to make the risk assessment more comprehensive (Dutta et al., 2022; Khokhar et al., 2025). Careful screening and cleaning of these features removes uninformative, interfering variables and can significantly improve a model's predictive ability (Kaliappan et al., 2024; Talari et al., 2024). Before modeling, features are generally brought into a uniform, comparable range through steps such as normalization, standardization, and variable transformation to facilitate model training (Abnoosian et al., 2023; Linkon et al., 2024). Feature-selection methods such as the chi-square test, Fisher's score, recursive feature elimination, and random-forest importance, together with explainable AI tools such as SHAP and LIME, help identify the key features and increase clinicians' trust in the model. Multiple studies suggest that refined feature engineering can significantly improve both the accuracy and the range of application of a model (Sneha and Gangil, 2019; Kaliappan et al., 2024; Khokhar et al., 2025).

3.3 Data quality and missing value handling methods

Real clinical data often contain missing values, outliers, and contradictory records; if these problems are not addressed, model performance suffers. Data cleaning is therefore a fundamental step in building predictive models (Abnoosian et al., 2023; Patro et al., 2023). Common approaches to missing data include mean or median imputation, KNN imputation, and other model-based imputation methods; some studies also consult clinical experts to ensure that the imputed results are consistent with clinical reality (Altamimi et al., 2024; Xu et al., 2025).
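The preprocessing and feature-screening steps described above can be sketched as follows. This is a minimal illustration on synthetic data (the feature names and all values are invented, not taken from any cited study): median imputation of missing values, min-max scaling, and feature screening by chi-square scores and random-forest importances.

```python
# Illustrative sketch, assuming scikit-learn; data and feature names are synthetic.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = pd.DataFrame({
    "glucose": rng.normal(110, 25, 200),
    "hba1c":   rng.normal(5.8, 0.9, 200),
    "bmi":     rng.normal(27, 4, 200),
    "age":     rng.integers(25, 75, 200).astype(float),
})
# Synthetic outcome driven mainly by glucose and HbA1c
y = (X["glucose"] + 10 * X["hba1c"] + rng.normal(0, 20, 200) > 170).astype(int)
X.iloc[rng.choice(200, 20, replace=False), 0] = np.nan  # simulate missing glucose

# Median imputation (sklearn's KNNImputer is a drop-in alternative)
X_imp = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(X),
                     columns=X.columns)

# Min-max scaling to [0, 1]; chi2 requires non-negative inputs
X_scaled = MinMaxScaler().fit_transform(X_imp)

# Chi-square scores and random-forest importances for feature screening
chi_scores = SelectKBest(chi2, k="all").fit(X_scaled, y).scores_
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_imp, y)
for name, c, imp in zip(X.columns, chi_scores, rf.feature_importances_):
    print(f"{name:8s} chi2={c:.3f} rf_importance={imp:.3f}")
```

In practice, the screened feature list would then be reviewed against clinical knowledge (and SHAP/LIME explanations) rather than applied mechanically.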
In addition to imputation, data preprocessing includes removing duplicate samples, encoding categorical variables, rescaling feature values, and improving consistency through outlier detection. These operations make the model behave more stably during both training and prediction (Linkon et al., 2024). In practice, models trained on high-quality data have proved more reliable and better suited to early clinical screening for diabetes than models trained on unprocessed raw data (Patro et al., 2023; Altamimi et al., 2024).

4 Model Accuracy Evaluation Indicators and Key Influencing Factors

4.1 Evaluation indicators for model discrimination and calibration

Artificial intelligence models for early diabetes screening typically measure the ability to distinguish individuals with and without diabetes by the area under the receiver operating characteristic curve (AUC-ROC), which reflects how well the model separates positive and negative cases across different decision thresholds (Figure 2) (Mohsen et al., 2023). Other commonly used discrimination metrics include accuracy, sensitivity (recall), specificity, precision, and the F1 score; especially when the samples are imbalanced, these metrics reveal model performance from different perspectives (Linkon et al., 2024; Moghaddam et al., 2024). The F1 score, the harmonic mean of precision and recall, is particularly important in application scenarios where both false negatives and false positives are costly (Dutta et al., 2022; Elseddawy et al., 2022).

Figure 2 Limitations in AI-based T2DM risk prediction models (Adopted from Mohsen et al., 2022)
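The discrimination metrics named above can be computed directly with scikit-learn. A minimal sketch on a small hand-made example (the labels and predicted probabilities are invented for illustration):

```python
# Minimal sketch of discrimination metrics, assuming scikit-learn;
# y_true and y_score are invented toy values.
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true  = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]  # model probabilities
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]     # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))   # (tp+tn)/total -> 0.75
print("sensitivity:", recall_score(y_true, y_pred))     # tp/(tp+fn)    -> 0.75
print("specificity:", tn / (tn + fp))                   # tn/(tn+fp)    -> 0.75
print("precision  :", precision_score(y_true, y_pred))  # tp/(tp+fp)    -> 0.75
print("F1 score   :", f1_score(y_true, y_pred))         # harmonic mean -> 0.75
print("AUC-ROC    :", roc_auc_score(y_true, y_score))   # threshold-free -> 0.9375
```

Note that AUC-ROC uses the continuous scores rather than the thresholded predictions, which is why it summarizes discrimination across all possible thresholds at once.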