Development of Machine Learning Algorithms for Identifying Patients With Limited Health Literacy

Dylan Koole*, Oscar Shen, Amanda Lans, Tom M. de Groot, J. J. Verlaan, J. H. Schwab

*Corresponding author for this work

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

Rationale: Limited health literacy (HL) leads to poor health outcomes, psychological stress, and misutilization of medical resources. Although interventions aimed at improving HL may be effective, identifying patients at risk of limited HL in the clinical workflow is challenging. Machine learning (ML) algorithms based on readily available data would allow healthcare professionals to incorporate HL screening without administering in-person HL screening tools.

Aims and Objectives: Develop ML algorithms to identify spine patients at risk for limited HL.

Methods: Between December 2021 and February 2023, consecutive English-speaking patients over the age of 18 who were new to an urban academic outpatient spine clinic were approached for participation in a cross-sectional survey study. HL was assessed using the Newest Vital Sign, and scores were categorized as limited (0–3) or adequate (4–6) HL. Additional patient characteristics were extracted from a sociodemographic survey and electronic health records. Feature selection was then performed by random forest algorithms with recursive feature selection, and five ML models (stochastic gradient boosting, random forest, Bayes point machine, elastic-net penalized logistic regression, support vector machine) were developed to predict limited HL.

Results: Seven hundred and fifty-three patients were included for model development, of whom 259 (34.4%) had limited HL. Variables identified for predicting limited HL were age, Area Deprivation Index-national, Social Vulnerability Index, insurance category, Body Mass Index, race, college education, and employment status. The Elastic-Net Penalized Logistic Regression algorithm achieved the best performance with a c-statistic of 0.766, calibration slope/intercept of 1.044/−0.037, and Brier score of 0.179.

Conclusion: Elastic-Net Penalized Logistic Regression had the best performance when compared with other ML algorithms, with a c-statistic of 0.766, calibration slope/intercept of 1.044/−0.037, and a Brier score of 0.179. Over one-third of patients presenting to an outpatient spine center were found to have limited HL. While this algorithm is far from being used in clinical practice, ML algorithms offer a potential opportunity for identifying patients at risk for limited HL without administering in-person HL assessments. This could possibly enable screening and early intervention to mitigate the potential negative consequences of limited HL without taxing the existing clinical workflow.
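As an illustration of the quantities reported in the abstract, the following minimal Python sketch shows how the Newest Vital Sign cutoff, the c-statistic, and the Brier score can be computed. It is not the authors' code: the function names and the six-patient toy data are hypothetical and chosen only to make the definitions concrete. Calibration slope and intercept require fitting a logistic recalibration model and are not shown.

```python
from itertools import product


def nvs_category(score: int) -> str:
    """Dichotomize a Newest Vital Sign score: 0-3 is limited, 4-6 is adequate."""
    if not 0 <= score <= 6:
        raise ValueError("NVS scores range from 0 to 6")
    return "limited" if score <= 3 else "adequate"


def c_statistic(y_true, y_prob):
    """Share of (positive, negative) pairs in which the positive case
    receives the higher predicted probability; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return concordant / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data, NOT taken from the study.
nvs_scores = [1, 5, 3, 6, 2, 4]
y_true = [1 if nvs_category(s) == "limited" else 0 for s in nvs_scores]
y_prob = [0.8, 0.3, 0.6, 0.1, 0.4, 0.5]  # predicted probability of limited HL

print("c-statistic:", round(c_statistic(y_true, y_prob), 3))
print("Brier score:", round(brier_score(y_true, y_prob), 3))

# Prevalence reported in the abstract: 259 of 753 patients.
print("limited HL prevalence (%):", round(100 * 259 / 753, 1))
```

A c-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination. Lower Brier scores indicate better-calibrated and more accurate probabilities.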

Original language: English
Article number: e14248
Journal: Journal of Evaluation in Clinical Practice
Volume: 31
Issue number: 1
Publication status: Published - Feb 2025

Keywords

  • health literacy
  • machine learning
  • orthopaedic surgery
  • social determinants of health
  • spine
