Measures of fidelity to a program curriculum can range from fewer than 10 items to hundreds for a single session. Pragmatic approaches with fewer items typically lack the sensitivity to detect variability. To maintain sensitivity while creating more feasible measures, the first study illustrates the use of machine learning methods to analyze behavioral observation ratings of fidelity to the curriculum and identify the most parsimonious set of items with predictive validity.
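One way such item reduction could be approached, sketched below under assumed conditions rather than the study's actual pipeline, is penalized regression: an L1 penalty shrinks uninformative observation items to zero, leaving a shorter candidate item set whose predictive validity can then be evaluated. The data here (`items`, `y`) are hypothetical placeholders.

```python
# Illustrative sketch (assumed data, not the study's analysis): select a
# parsimonious subset of fidelity items that retains predictive validity.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_sessions, n_items = 200, 60
items = rng.integers(0, 5, size=(n_sessions, n_items)).astype(float)  # 0-4 ratings
y = items[:, :5].mean(axis=1) + rng.normal(scale=0.5, size=n_sessions)  # toy outcome

# L1-penalized regression with cross-validated penalty strength.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(items, y)
coefs = model.named_steps["lassocv"].coef_
selected = np.flatnonzero(coefs != 0)
print(f"Retained {selected.size} of {n_items} items:", selected)
```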
Although the use of provider self-report has been controversial, it is both pragmatic and potentially useful for quality improvement. In the second study, concordance across raters (independent observer and provider self-report) and predictive validity for quality of delivery are assessed at the item level to determine whether there are items that providers can self-report accurately.
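Item-level concordance of this kind is often summarized with an agreement statistic computed item by item. The sketch below uses weighted Cohen's kappa on hypothetical observer and provider ratings; it is an illustration of the general approach, not the study's analysis.

```python
# Illustrative sketch (assumed data): item-level agreement between
# independent-observer and provider self-report ratings, using weighted
# Cohen's kappa to flag items providers may self-report reliably.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
n_sessions, n_items = 150, 12
observer = rng.integers(0, 3, size=(n_sessions, n_items))  # 0-2 ratings
noise = rng.integers(0, 3, size=(n_sessions, n_items))
# Providers agree with observers ~70% of the time in this toy example.
provider = np.where(rng.random((n_sessions, n_items)) < 0.7, observer, noise)

for item in range(n_items):
    kappa = cohen_kappa_score(observer[:, item], provider[:, item], weights="quadratic")
    print(f"item {item:2d}: weighted kappa = {kappa:.2f}")
```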
Automated coding methods that use machine learning may ultimately make implementation monitoring truly feasible at a population level, provided research can demonstrate their predictive validity. The third study tests the predictive validity of automated methods for assessing quality of delivery from session transcripts.
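A minimal baseline for such transcript-based automated coding, sketched below with invented example transcripts and dichotomized quality labels (assumptions, not the study's model or data), is to predict observer-rated quality from text features and check predictive validity with cross-validation.

```python
# Illustrative sketch (assumed data and model): predict observer-rated
# quality of delivery from session transcripts with TF-IDF features and
# logistic regression, evaluated by cross-validated AUC.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical transcripts and quality ratings (1 = high quality).
transcripts = [
    "facilitator reviewed the home practice and modeled the refusal skill",
    "session ran short and skipped the role play activity",
    "group discussed goals and practiced the communication exercise",
    "facilitator lectured from slides with little participant interaction",
] * 25
quality = [1, 0, 1, 0] * 25

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(pipeline, transcripts, quality, cv=5, scoring="roc_auc")
print("cross-validated AUC:", scores.mean().round(2))
```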
The discussant will critique the findings of these studies and discuss the implications for measuring implementation in trials and community settings.