Using electronic medical record (EMR) data, we developed a point-of-care, machine-learning-based model that predicts whether an individual patient will drop out of care.
Methods: We developed a scalable machine learning system that uses raw patient medical data to compute risk scores for dropping out of care. For all patients who received HIV care at the University of Chicago from 2008 to 2015, we built features from a variety of EMR variables, including insurance, appointment attendance, diagnoses, social history, medications, and laboratory tests. These features are complex, time-based aggregations of the underlying EMR data, yielding a total of 1,295 predictive variables. Because interventions are resource-intensive and costly, we selected a model whose predictive performance is tuned to match our capacity for intervention.
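The abstract does not specify how the time-based aggregations were computed. As a minimal sketch, assuming appointment records of the hypothetical form `(patient_id, appointment_date, attended)`, trailing-window counts of scheduled and missed appointments could be built as follows. The window lengths and feature names are illustrative assumptions, not the authors' feature set.

```python
from datetime import date, timedelta


def window_features(appointments, patient_id, as_of, windows_days=(90, 180, 365)):
    """Aggregate one patient's appointment history over trailing time windows.

    `appointments` is a list of hypothetical (patient_id, date, attended) tuples.
    Only appointments strictly before `as_of` are used, so no future
    information leaks into a prediction made on `as_of`.
    """
    history = [(d, attended) for pid, d, attended in appointments
               if pid == patient_id and d < as_of]
    features = {}
    for days in windows_days:
        start = as_of - timedelta(days=days)
        in_window = [attended for d, attended in history if d >= start]
        n = len(in_window)
        missed = sum(1 for attended in in_window if not attended)
        features[f"appts_{days}d"] = n
        features[f"missed_{days}d"] = missed
        # Fraction missed; defined as 0.0 when the window has no appointments.
        features[f"missed_frac_{days}d"] = missed / n if n else 0.0
    return features


if __name__ == "__main__":
    appts = [
        ("p1", date(2015, 1, 10), True),
        ("p1", date(2015, 4, 2), False),
        ("p1", date(2015, 5, 20), False),
        ("p2", date(2015, 5, 1), True),
    ]
    print(window_features(appts, "p1", as_of=date(2015, 6, 1)))
```

Restricting the history to dates before `as_of` is the key design choice: a feature computed for a visit should only use information available at that point of care.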
The system evaluates a broad range of machine learning methods and hyperparameters, including decision trees, gradient boosted decision trees, and random forests. Models were compared against a random baseline and clinically relevant expert rules. Given the diversity of the patient population, we also audited our system for inherent bias.
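A search of this kind typically enumerates every combination of model family and hyperparameter setting so that each candidate can be scored the same way. The sketch below only expands such a grid. The three families come from the text; the hyperparameter names (borrowed from scikit-learn's estimator arguments) and their values are hypothetical, since the actual grid is not reported.

```python
from itertools import product

# Hypothetical grid: model families from the text, illustrative values only.
MODEL_GRID = {
    "decision_tree": {"max_depth": [3, 5, 10], "min_samples_leaf": [1, 10]},
    "random_forest": {"n_estimators": [100, 500], "max_depth": [5, None]},
    "gradient_boosting": {"n_estimators": [100, 500], "learning_rate": [0.01, 0.1]},
}


def expand_grid(grid):
    """Yield (model_name, params) for every hyperparameter combination."""
    for model_name, space in grid.items():
        keys = list(space)
        for values in product(*(space[k] for k in keys)):
            yield model_name, dict(zip(keys, values))


if __name__ == "__main__":
    configs = list(expand_grid(MODEL_GRID))
    # 3*2 + 2*2 + 2*2 = 14 candidate configurations
    print(len(configs))
```

Each configuration would then be trained and evaluated on the same held-out data, alongside the random baseline and the expert rules.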
Results: A total of 721 patients received HIV care at our institution over the study period, with approximately 1,500 appointments per year. Of these appointments, 10% were classified as out of care. A random forest model had the highest positive predictive value.
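Tuning the model to intervention capacity amounts to measuring positive predictive value (precision) among only the k highest-risk visits, where k is the number of patients staff can actually reach. A random baseline that picks k visits uniformly has an expected precision equal to the base rate, roughly 10% here. A minimal sketch, assuming binary labels (1 = out of care), hypothetical model scores, and a hypothetical capacity k:

```python
import random


def precision_at_k(scores, labels, k):
    """Positive predictive value among the k highest-scoring visits."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def random_baseline_precision(labels, k, trials=1000, seed=0):
    """Average precision when k visits are chosen uniformly at random.

    In expectation this equals the base rate, sum(labels) / len(labels).
    """
    rng = random.Random(seed)
    return sum(sum(rng.sample(labels, k)) / k for _ in range(trials)) / trials


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90                       # 10% of visits out of care
    scores = [0.9] * 6 + [0.3] * 4 + [0.8] * 4 + [0.1] * 86  # hypothetical scores
    print(precision_at_k(scores, labels, k=10))        # 0.6
    print(random_baseline_precision(labels, k=10))     # close to 0.1
```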
Our system is significantly more accurate than the expert heuristics used today, correctly identifying 10-60% more visits with at-risk patients. The most important features in the model are the history of previous appointments, laboratory test results (both viral load and CD4 counts), and diagnoses. Our machine learning model is also superior to the expert rules because it provides individual-level predictions rather than the coarse group-level predictions currently in use. Further, our bias audit showed that the model has significantly less bias than the expert rules.
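The abstract does not state which bias metric was audited. One common choice, shown here purely as an assumption, compares false negative rates (true drop-outs the system failed to flag) across patient groups, expressed as a ratio to a reference group so that 1.0 means parity. The group labels and data below are hypothetical.

```python
from collections import defaultdict


def group_fnr(labels, flagged, groups):
    """False negative rate per group: share of true drop-outs not flagged."""
    positives = defaultdict(int)
    missed = defaultdict(int)
    for y, f, g in zip(labels, flagged, groups):
        if y:
            positives[g] += 1
            if not f:
                missed[g] += 1
    return {g: missed[g] / positives[g] for g in positives}


def fnr_disparity(labels, flagged, groups, reference):
    """Ratio of each group's FNR to the reference group's FNR (1.0 = parity)."""
    fnr = group_fnr(labels, flagged, groups)
    ref = fnr[reference]
    if ref == 0:
        raise ValueError("reference group has zero FNR; ratio is undefined")
    return {g: v / ref for g, v in fnr.items()}


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 1, 1, 1, 1]          # all true drop-outs
    flagged = [1, 1, 1, 0, 1, 0, 0, 1]         # hypothetical model flags
    groups = ["A"] * 4 + ["B"] * 4             # hypothetical patient groups
    print(fnr_disparity(labels, flagged, groups, reference="A"))
```

Running the same audit on the model's flags and on the expert rules' flags gives a like-for-like comparison of how unevenly each misses at-risk patients.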
Conclusion: Using machine learning methods, we built a predictive model for retention in care that was significantly more accurate and less biased than the expert rules currently in use. To our knowledge, this is the first time a machine learning system has been applied to the problem of retention in care. This model can be implemented to guide retention interventions in real time.