Abstract: Using Machine Learning Methods to Examine Healthcare Utilization Patterns of Elderly and Middle-Aged Adults in the United States (Society for Prevention Research 24th Annual Meeting)

609 Using Machine Learning Methods to Examine Healthcare Utilization Patterns of Elderly and Middle-Aged Adults in the United States

Friday, June 3, 2016
Regency B (Hyatt Regency San Francisco)
Cilia Zayas, MHA, Graduate Research Assistant, University of Florida, Gainesville, FL
Mildred Maldonado-Molina, PhD, Associate Professor, University of Florida, Gainesville, FL
Jian Bian, PhD, Assistant Professor, University of Florida, Gainesville, FL
Introduction: Elderly patients, aged 65 or older, make up 13.5% of the U.S. population but account for 45.2% of the top 10% of healthcare utilizers. Middle-aged Americans, aged 45 to 64, account for another 37.0% of that group. Given this population's high demand for healthcare services, it is important to identify high users of healthcare systems and potentially ineffective utilization patterns, highlighting where targeted interventions could improve care delivery.

We present a machine learning (ML) method that uses the Medical Expenditure Panel Survey (MEPS) dataset to cluster patients into groups with similar utilization profiles, where a profile records how many times a patient used each type of healthcare service. Doing so lets us identify dominant utilization patterns and assess the general characteristics of high utilizers.

Methods: Our research sample included 12,652 elderly and middle-aged adult respondents in the 2013 MEPS. The MEPS is a nationally representative survey used to collect comprehensive data on healthcare utilization and expenditures in the United States. We first used a Random Forest (RF) regression model to predict expenditures from patients' healthcare utilization profiles. A utilization profile consists of the number of office-based, outpatient, emergency room, and inpatient visits, the number of home care days, and the number of prescription medications. As a by-product of their construction, RF models yield a similarity measure between samples. We then applied Hierarchical Agglomerative Clustering (HAC) to these similarity measures to identify clusters of utilization profiles.
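The two-step idea in the Methods paragraph (RF-derived similarity, then HAC) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: similarity is taken to be the common RF proximity (the fraction of trees in which two samples land in the same leaf), the linkage is average linkage, and the leaf assignments are a small hypothetical table rather than output from a forest trained on MEPS. In practice such a table could come from a fitted forest, for example via scikit-learn's `RandomForestRegressor.apply`. The helper names `rf_proximity` and `hac_average` are illustrative only.

```python
from itertools import combinations


def rf_proximity(leaves):
    """Proximity matrix from leaf assignments.

    leaves[i][t] is the id of the leaf that sample i reaches in tree t.
    Proximity of two samples = fraction of trees where they share a leaf
    (an assumed definition; the abstract does not specify one).
    """
    n, n_trees = len(leaves), len(leaves[0])
    prox = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def hac_average(dist, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical leaf ids for 6 samples across 4 trees (not MEPS data).
toy_leaves = [
    [1, 4, 2, 7],
    [1, 4, 2, 6],
    [1, 4, 3, 7],
    [5, 9, 8, 3],
    [5, 9, 8, 2],
    [5, 8, 8, 3],
]

prox = rf_proximity(toy_leaves)
dist = [[1.0 - p for p in row] for row in prox]  # turn similarity into distance
clusters = hac_average(dist, k=2)
print(clusters)
```

On this toy table the first three samples never share a leaf with the last three, so cutting the hierarchy at two clusters separates them. The pairwise loops are quadratic or worse and are meant only to show the mechanics; a sample of 12,652 respondents would call for an optimized library routine.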

Results: Evaluated following ML best practice, the learned RF regression model exhibits good performance for both sub-populations, i.e., elderly (R²=0.532, NRMSE=1.24) and middle-aged (R²=0.411, NRMSE=2.07) adults, and for the combined population (R²=0.478, NRMSE=1.66). As expected, the defined healthcare utilization profile is a stronger predictor of expenditures for the elderly than for the middle-aged population. The derived RF variable importance measures indicate that the numbers of inpatient visits, physician visits, and prescription medications (in that order) are strong predictors. When each variable's predictive performance is examined individually (i.e., an RF model is trained on that variable alone), the number of emergency room visits is strongly correlated with expenditures for the elderly population. Further, the clusters (k=10) derived using HAC on the RF similarity measures provide a meaningful segmentation.
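For readers unfamiliar with the two reported metrics, the sketch below computes the coefficient of determination and a normalized root-mean-square error on a handful of made-up values. The abstract does not say how the RMSE was normalized; dividing by the mean of the observed values is an assumption made here, and range or standard deviation are other common choices. The function names and the numbers are hypothetical and are not drawn from MEPS.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the mean observed value (assumed normalization)."""
    n = len(y_true)
    rmse = sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (sum(y_true) / n)


# Hypothetical observed and predicted expenditures (illustrative only).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(round(r_squared(observed, predicted), 3))
print(round(nrmse(observed, predicted), 3))
```

Under the mean-normalization assumption, an NRMSE above 1 would indicate that the typical prediction error exceeds the mean observed value.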

Conclusions: We present a novel method, leveraging RF regression and HAC, for healthcare utilization analysis, with promising results. The learned clusters can be used to understand the utilization patterns of high utilizers, supporting a learning health system and, in turn, better health policy making and practice.