
International Journal of Clinical Case Reports, 2025, Vol.15, No.5, 209-218
http://medscipublisher.com/index.php/ijccr

2 Theoretical Basis and Literature Review

2.1 The main theoretical framework and development of chronic disease prediction

Chronic disease prediction models have been shaped by two forces: data-driven methods and theoretical guidance. Early work mostly used statistical methods such as logistic regression and Cox proportional hazards models to estimate disease risk from clinical and demographic data (Riley et al., 2016). With the growth of big data, machine learning and deep learning have gradually become mainstream. They can handle high-dimensional information from multiple sources and can achieve higher accuracy and better scalability (Ngiam and Khor, 2019). At the same time, theoretical frameworks such as social-ecological and multifactorial models were introduced to guide data selection and integration, so that results better reflect the factors influencing health at the individual and group levels. Recent research has proposed combining theory with data, that is, integrating top-down theoretical guidance with bottom-up data mining. This approach can help identify new risk factors and build multi-level prediction models that consider genetic, clinical, behavioral and environmental influences simultaneously (Prosperi et al., 2018). Combining such frameworks helps to produce more stable and scalable models that support clinical practice and public health.

2.2 Common data types: clinical indicators, genomic data, lifestyle data

Models for predicting chronic diseases increasingly combine information from different sources. Clinical indicators, such as laboratory tests, medical images and electronic health records, are the basis of most models and provide health information that is collected systematically and routinely (Tse et al., 2023).
Genomic data, such as gene sequencing and multi-omics information, can help identify genetic risks and molecular mechanisms associated with diseases (Snyder and Zhou, 2019). Lifestyle data, such as exercise, diet and environmental exposure, are now available through wearable devices, health apps and self-reports, enabling dynamic, long-term tracking of behaviors related to chronic diseases (Snyder and Zhou, 2019). Combining these different types of data can enhance the predictive ability of a model and support individualized risk assessment and intervention (Alonso et al., 2017).

2.3 Research progress and limitations of big data-driven health prediction at home and abroad

Health prediction based on big data has made significant progress worldwide. Large-scale information such as electronic medical records, medical images and multi-omics data is now often used to build more accurate models, which supports early disease detection, risk stratification and personalized treatment (Snyder and Zhou, 2019; Nascimento et al., 2021). In many application scenarios, artificial intelligence and machine learning methods perform better than traditional statistical methods, with higher accuracy, specificity and scalability (Ngiam and Khor, 2019). These advances have also driven the use of real-time risk estimation and prediction tools in clinical settings (Tse et al., 2023).

Many difficult problems remain to be solved. Many studies are limited by uneven data quality, incomplete information or non-standardized records, which makes it difficult to combine data from different sources effectively. Methodological issues such as overfitting, lack of external validation, and limited applicability of models to different populations are also very common (Riley et al., 2016; Nascimento et al., 2021).
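The overfitting and external-validation concerns above can be made concrete with a small sketch. It compares a model's discrimination (AUC, the concordance statistic) on the cohort used to develop it with its discrimination on a separate cohort. Everything here is an assumption for illustration: the function name `auc`, the variable names, and the toy risk scores are hypothetical and do not come from any study cited in this review.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance statistic: the probability that a randomly chosen case
    receives a higher predicted risk than a randomly chosen non-case.
    Tied scores count as half-concordant."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks on the development cohort (toy data).
dev_outcomes = [0, 0, 0, 1, 1, 1]
dev_risks = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]

# Hypothetical predicted risks from the same model on an external cohort.
ext_outcomes = [0, 0, 0, 1, 1, 1]
ext_risks = [0.2, 0.6, 0.4, 0.5, 0.3, 0.9]

dev_auc = auc(dev_outcomes, dev_risks)
ext_auc = auc(ext_outcomes, ext_risks)

# A large drop from development to external performance suggests the
# model was overfitted or does not transfer to the new population.
performance_gap = dev_auc - ext_auc
print(f"development AUC={dev_auc:.2f}, external AUC={ext_auc:.2f}, "
      f"gap={performance_gap:.2f}")
```

In practice the two cohorts would come from different sites or time periods, and calibration would usually be checked alongside discrimination. The pairwise loop is written for readability and is not efficient for large cohorts.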
In addition, data privacy and security, as well as possible biases in prediction results, have attracted increasing attention (Prosperi et al., 2018; Ngiam and Khor, 2019). Only by addressing these challenges can the potential of big data in chronic disease prediction be fully realized and fairer health outcomes be promoted.

3 Big Data Processing and Feature Engineering

3.1 Data collection and integration: fusion of multi-source heterogeneous data

In the context of big data, integrating data from multiple sources is an important foundation for the prediction of chronic diseases. The data may come from electronic medical records, sensors, genetic information and patient self-reports, and their formats and structures all differ. Combining them effectively can depict the health status of patients more comprehensively, reveal complex relationships and improve
