Abstract: The Analytic Dangers of Gaussian Finite Mixture Models: An Empirical Case Study (Society for Prevention Research 26th Annual Meeting)

Schedule:

Wednesday, May 30, 2018

Columbia A/B (Hyatt Regency Washington, Washington, DC)

* noted as presenting author

Albert J. Burgess-Hull, MS, Graduate Student, University of Wisconsin-Madison, Madison, WI

Many theories of substance use posit the existence of qualitatively distinct groups of individuals who share similar etiology, temporal course, and/or distal outcomes. Identifying and describing these groups within the population has important implications for prevention research because these groups may highlight individuals at risk for maladaptive outcomes (e.g., dependence or relapse), and can inform the development of individualized/targeted prevention programs.

In recent years, the use of finite mixture modeling (FMM) to empirically identify homogeneous subgroups within a population has increased in popularity. FMM posits the existence of latent subgroups within the population that can be modeled via a finite number of probability distributions. However, while the use of FMM has significantly advanced substance use researchers' understanding of substance use development, a number of methodologists have highlighted the analytic dangers of fitting normal (Gaussian) mixture models to non-normal data. The primary purpose of this study is to provide an empirical illustration of the substantive and analytic results of fitting a normal and a non-normal (Student t) FMM to data commonly utilized by prevention scientists, and of the impact on subsequent statistical inferences.
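The mixture structure described above can be written compactly. The notation here ($K$, $\pi_k$, $\theta_k$, $\mu_k$, $\Sigma_k$, $\nu_k$) is standard textbook notation assumed for illustration; it does not appear in the abstract, and the exact parameterization used in the study is not stated.

```latex
% Generic finite mixture density with K latent subgroups (assumed notation)
f(y) = \sum_{k=1}^{K} \pi_k \, f_k(y \mid \theta_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Normal mixture: each component is a multivariate normal density
f_k(y \mid \theta_k) = \mathcal{N}(y \mid \mu_k, \Sigma_k)

% Student t mixture: each component adds a degrees-of-freedom parameter
% \nu_k that controls tail heaviness (large \nu_k approaches the normal case)
f_k(y \mid \theta_k) = t(y \mid \mu_k, \Sigma_k, \nu_k)
```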

The current study extends previous analyses that evaluated whether distinct subgroups defined by characteristics of an individual’s social network could be identified within a sample of smokers initiating a quit attempt. Utilizing baseline data drawn from a large smoking cessation trial (N = 1504, 53% female, 84% Caucasian), FMMs with mixtures of normal (FMM-n) and Student t-distributions (FMM-t) were fit to nine social network variables collected from a social network interview.

Results revealed that six subgroups provided the best fit for the FMM-n, while five subgroups provided the best fit for the FMM-t. Model fit indices indicated that the FMM-t provided a better overall fit to the data. Substantive examination of the final two models (FMM-n vs. FMM-t) revealed that the subgroups identified by the two models were largely different. Examination of subgroup assignment differences across the two models also revealed that participant assignment to subgroups was affected by the use of different mixture distributions.
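As a toy illustration of how the choice of component distribution can change subgroup assignment, the sketch below computes posterior class probabilities for one outlying observation under a two-component univariate mixture. Everything in it is hypothetical: the component means and scales, the equal weights, the fixed degrees of freedom (3), and the observation itself are made-up values, not estimates from the study, and the study's models were multivariate.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of a univariate normal with mean mu and sd sigma."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z

def student_t_logpdf(y, mu, sigma, nu):
    """Log density of a location-scale Student t with nu degrees of freedom."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def posterior(y, components, weights, logpdf):
    """Posterior class probabilities for y, normalized on the log scale."""
    logs = [math.log(w) + logpdf(y, mu, sigma)
            for w, (mu, sigma) in zip(weights, components)]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical components as (mean, scale): one narrow, one wide.
components = [(0.0, 1.0), (5.0, 3.0)]
weights = [0.5, 0.5]
y = -4.0  # an outlying observation that lies nearer the first component

normal_post = posterior(y, components, weights, normal_logpdf)
t_post = posterior(
    y, components, weights,
    lambda obs, mu, sigma: student_t_logpdf(obs, mu, sigma, nu=3.0),
)

print("normal components:", normal_post)
print("t components:     ", t_post)
```

With these made-up values the normal components give most of the posterior weight to the wide second component, because the narrow normal has almost no density four standard deviations out. The heavier-tailed t components instead favor the nearer first component, though only slightly. This shows the general mechanism only; it is not a reproduction of the study's results.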

These findings highlight the strong influence that different distributional types can have on the substantive and analytic results of FMM. As data utilized by prevention scientists are typically non-normal, findings from this study draw attention to the hazards of indiscriminately utilizing normal FMMs in applied problems and suggest that prevention scientists should incorporate non-normal FMMs into their statistical toolbox.

197 The Analytic Dangers of Gaussian Finite Mixture Models: An Empirical Case Study