Abstract: Regression Analysis of Coarsely Grouped Counts and Frequencies Using the Generalized Linear Model (Society for Prevention Research 21st Annual Meeting)

423 Regression Analysis of Coarsely Grouped Counts and Frequencies Using the Generalized Linear Model

Schedule:
Thursday, May 30, 2013
Pacific D-O (Hyatt Regency San Francisco)
* noted as presenting author
Stefany Coxe, PhD, Assistant Professor, Florida International University, Miami, FL
Leona S. Aiken, PhD, Professor, Arizona State University, Tempe, AZ
Stephen G. West, PhD, Professor, Arizona State University, Tempe, AZ
How many days per week do you exercise for 30 minutes or more? Never? Once or twice? Every other day? Most days? Every day? Coarsely grouped counts such as these are commonly used in the behavioral sciences. When these variables are used as outcomes, they often violate the assumptions of both linear regression and models designed for categorical outcomes; there is no model designed specifically for grouped count outcomes. Many analysis approaches also ignore the unequal spacing between categories, whereas scoring each category as the mean of its count range captures the unequal spacing common in grouped count outcomes. The purpose of this study was to compare the statistical performance of three common regression models (linear regression, Poisson regression, and ordinal logistic regression) that can be used when the outcome is a grouped count.
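As a rough illustration of scoring categories by the mean of their count range, the sketch below maps the exercise categories to midpoints. The count ranges assigned to each category (for example, "Once or twice" as 1 to 2 days) are assumptions made for this example; the abstract does not specify them.

```python
# Hypothetical count ranges (days per week) for each response category.
# These ranges are assumptions for illustration, not taken from the study.
CATEGORY_RANGES = {
    "Never": (0, 0),
    "Once or twice": (1, 2),
    "Every other day": (3, 4),
    "Most days": (5, 6),
    "Every day": (7, 7),
}


def midpoint_score(category):
    """Score a category as the mean of its (assumed) count range."""
    low, high = CATEGORY_RANGES[category]
    return (low + high) / 2


def equal_interval_score(category):
    """Score a category by its rank (0, 1, 2, ...), ignoring range widths."""
    return list(CATEGORY_RANGES).index(category)


midpoints = [midpoint_score(c) for c in CATEGORY_RANGES]
ranks = [equal_interval_score(c) for c in CATEGORY_RANGES]

# Distance between adjacent category scores under each scheme.
midpoint_spacings = [b - a for a, b in zip(midpoints, midpoints[1:])]
rank_spacings = [b - a for a, b in zip(ranks, ranks[1:])]

print("midpoint scores:", midpoints)
print("midpoint spacings:", midpoint_spacings)
print("rank spacings:", rank_spacings)
```

Under these assumed ranges the midpoint scores are not equally spaced, while rank scores always are, which is the distinction the paragraph above draws.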

METHOD

A simulation study was used to determine the power, type I error rate, and confidence interval (CI) coverage for these models. Mean structure, variance structure, effect size, predictor type, and sample size were crossed in a factorial design. Mean structure reflected either a linear or an exponential relationship between the predictor and the outcome. Because the distribution of the underlying count is unobserved, several variance structures were evaluated, including homoscedastic, monotonically increasing, and increasing then decreasing variance. Zero, small, medium, and large effect sizes and sample sizes of 100, 200, 500, and 1000 were examined. A single predictor (either continuous or binary) was used to predict the grouped count outcome.
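One cell of such a design might be generated along the following lines. This is a minimal sketch, not the authors' data-generating procedure: the Poisson distribution for the underlying count, the intercepts, the cut points, and the function name `simulate_grouped_counts` are all assumptions made for illustration. A Poisson count has variance equal to its mean, so this sketch covers only a variance that increases with the mean, not the other variance structures described above.

```python
import numpy as np

# Hypothetical cut points giving categories 0 | 1-2 | 3-4 | 5-6 | 7 or more.
CUT_POINTS = np.array([1, 3, 5, 7])


def simulate_grouped_counts(n, slope, mean_structure="exponential", seed=0):
    """Draw a continuous predictor, an underlying count, and its grouped version.

    The intercepts (1.0 and 3.0) are arbitrary illustrative values.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    if mean_structure == "exponential":
        mu = np.exp(1.0 + slope * x)
    else:
        # Linear mean structure; clipped so the Poisson mean stays positive.
        mu = np.clip(3.0 + slope * x, 0.05, None)
    counts = rng.poisson(mu)
    # np.digitize returns the index of the category each count falls into.
    grouped = np.digitize(counts, CUT_POINTS)
    return x, counts, grouped


x, counts, grouped = simulate_grouped_counts(n=200, slope=0.3)
print(np.bincount(grouped, minlength=5))  # category frequencies
```

In an analysis only `x` and `grouped` would be available; `counts` is returned here to make the grouping step visible.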

RESULTS

All regression models produced unbiased estimates of the regression coefficient. Ordinal logistic regression produced type I error, power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and power that were within acceptable limits, but CI coverage was too low in conditions with an exponential mean structure, particularly with a large effect size and/or a monotonically increasing variance structure. Poisson regression displayed inflated type I error, low power, and low CI coverage rates for nearly all conditions.
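To make the type I error and power criteria concrete, the sketch below estimates both by Monte Carlo for linear regression on midpoint-scored categories. Everything specific here is an assumption for illustration: the Poisson data-generating model, the category midpoints and cut points, the slope of 0.3, the number of replications, and the normal critical value of 1.96 used in place of a t quantile. It is not the authors' simulation code, and it does not reproduce their results. CI coverage for nonzero effects is omitted because it requires a defined population value of the coefficient, which the abstract does not describe.

```python
import numpy as np

MIDPOINTS = np.array([0.0, 1.5, 3.5, 5.5, 7.0])  # hypothetical category scores
CUT_POINTS = np.array([1, 3, 5, 7])              # hypothetical grouping of counts


def ols_slope_test(x, y, z_crit=1.96):
    """Simple linear regression of y on x: slope, approximate 95% CI, reject H0."""
    n = len(x)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)
    ci = (slope - z_crit * se, slope + z_crit * se)
    return slope, ci, bool(abs(slope / se) > z_crit)


def rejection_rate(true_slope, n=200, reps=500, seed=1):
    """Proportion of replications in which the slope test rejects H0: slope = 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        counts = rng.poisson(np.exp(1.0 + true_slope * x))
        y = MIDPOINTS[np.digitize(counts, CUT_POINTS)]
        _, _, reject = ols_slope_test(x, y)
        rejections += int(reject)
    return rejections / reps


type_i_error = rejection_rate(0.0)  # rejection rate when the true effect is zero
power = rejection_rate(0.3)         # rejection rate when the true effect is nonzero
print("type I error:", type_i_error)
print("power:", power)
```

The same loop could wrap a Poisson or ordinal logistic fit in place of `ols_slope_test` to compare models under identical generated data.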

CONCLUSIONS

Based on the statistical performance of the three models, ordinal logistic regression is the preferred method for analyzing grouped count outcomes. Linear regression also performed well, but CI coverage was too low for several conditions with an exponential mean structure; these specific conditions are of particular interest because they reflect conditions commonly observed for counts and frequencies. Comparisons of model fit and tests of model assumptions (e.g., the proportional odds assumption for ordinal logistic regression) are in progress.