Associate Professor, The Ohio State University, Columbus, Ohio, United States
Background: Screening for unmet health-related social needs (HRSNs) is a critical step towards addressing these nonmedical health risk factors. However, most providers lack the tools and resources to screen large patient populations for unmet HRSNs. Thus, we explored the application of Machine Learning (ML) techniques to identify patients with unmet HRSNs.
Objectives: Our primary objective was to identify patients with unmet HRSNs from electronic health record (EHR) data using ML techniques.
Methods: We identified a sample of patients who had been screened for at least one HRSN in the OCHIN EHR data (2016-2022), the largest EHR dataset from community-based healthcare organizations, covering ~6 million patients. The date on which a patient was administered an HRSN questionnaire served as the index date; patients who responded positively to questions about experiencing any of the following HRSNs were classified as positive: housing instability, inadequate housing quality, relationship safety, social isolation, transportation challenges, utility needs, and food insecurity. The following non-modifiable sociodemographic factors, measured on the index date, were included as potential predictors: age, sex, race, marital status, sexual orientation, preferred language, and migration/seasonal status. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) were used to explain the importance of each feature. The area under the receiver operating characteristic curve (AUROC) was the primary measure of model performance.
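As an illustration of the primary performance measure, the following is a minimal sketch of how an AUROC and a 95% confidence interval could be computed from observed labels and predicted risk scores. The abstract does not state how its confidence intervals were obtained; the percentile bootstrap used here is an assumption, and the function names `auroc` and `bootstrap_ci` are hypothetical.

```python
import random


def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) formulation; ties get average ranks."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (assumed method, not from the abstract)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[k] for k in idx]
        # Skip resamples that contain only one class; AUROC is undefined there.
        if 0 < sum(yt) < n:
            stats.append(auroc(yt, [y_score[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data only: 1 = screened positive for an unmet HRSN, 0 = negative.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(labels, scores))
    print(bootstrap_ci(labels, scores, n_boot=200))
```

An AUROC of 0.5 corresponds to chance-level ranking of positive over negative patients and 1.0 to perfect ranking, which is the scale on which the model results are reported (as percentages).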
Results: Of the 745,975 patients who were screened for at least one of the seven HRSNs, 26.8% had at least one unmet HRSN. Food insecurity (23.8%) was the most prevalent unmet HRSN, followed by utility needs (19.7%), housing instability (18.4%), social isolation (17.3%), transportation barriers (16.2%), poor housing quality (7.9%), and relationship safety (6.5%). For the overall HRSN prediction model, the LightGBM algorithm (AUROC, 64.5%; 95% CI: 64.2, 64.7) performed slightly better than random forest (RF; AUROC, 63.4%; 95% CI: 63.0, 63.7) and XGBoost (AUROC, 60.1%; 95% CI: 59.8, 60.3). Model performance was similar when predicting individual HRSNs.
Conclusion: This study highlights the potential of using EHR data to develop ML prediction models to screen for HRSNs at the point of care. Despite this promise, the predictive performance of these models must improve before they are adopted in real-world clinical settings.