Performance of Model Selection Statistics in Growth Mixture Modeling of Homogeneous Data

Wang, Ruixue
The Graduate School, Stony Brook University: Stony Brook, NY.
Growth mixture modeling (GMM) is used to detect the existence of two or more trajectory patterns among participants in a longitudinal study. One crucial issue is determining the number of longitudinal trajectory patterns. I study the properties of three statistics used to identify the number of components in a sample of data: the Bayesian information criterion (BIC), the Lo-Mendell-Rubin test (LMRT), and the bootstrap likelihood ratio test (BLRT). Using the Mplus and SAS PROC TRAJ statistical packages, I estimate the probability that each of these statistics identifies a single component when the data are homogeneous. I use four distributions for the longitudinal outcome measures: the censored normal distribution, the gamma distribution, the zero-inflated Poisson (ZIP) distribution, and the Bernoulli distribution. I consider these factors: trajectory pattern, intra-class correlation, number of time measurements, random effects, and sample size. For the censored normal distribution, the BIC and the LMRT (at the 0.01 significance level) have the highest fraction of replicates identified as homogeneous. These rates are 0.92 or better for the LMRT at the 0.01 significance level and 0.98 or better for the BIC. The identification rates of these two statistics are not significantly affected by the intra-class correlation in the trajectory, the trajectory pattern, the number of time measurements, or the sample size. A similar pattern holds for the gamma distribution using the Mplus statistical package. The correct identification rate of the LMRT is higher than that of the BLRT at both the 0.01 and 0.05 significance levels. For the ZIP and Bernoulli distributions, PROC TRAJ computations have a higher correct identification rate than those from Mplus. For ZIP-distributed data following a linear trend with random effects, a larger sample size is associated with an increase in the probability that two or more components will be identified. The same pattern holds for Bernoulli data.
Overall, the BIC statistic has the highest correct identification rate. These rates are on the order of 95% for homogeneous data following either a censored normal or gamma distribution.
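As a rough illustration of how a BIC-based "correct identification rate" of this kind could be tallied, the sketch below compares a one-class and a two-class model on each simulated replicate and reports the fraction of replicates in which the one-class model is preferred. It assumes the common "smaller is better" form of the BIC; some packages report the BIC with a different sign or scale, so the convention should be checked before comparing values. The function names, parameter counts, and log-likelihood values are hypothetical and are not taken from the thesis.

```python
import math


def bic(log_likelihood, n_params, n_subjects):
    """BIC in the 'smaller is better' form: -2*logL + p*log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_subjects)


def homogeneous_rate(replicates, n_subjects, p_one, p_two):
    """Fraction of replicates in which the one-class model has the lower BIC.

    replicates: list of (logL_one_class, logL_two_class) pairs,
                one pair per simulated data set.
    p_one, p_two: number of free parameters in each model (assumed values).
    """
    hits = 0
    for ll_one, ll_two in replicates:
        if bic(ll_one, p_one, n_subjects) <= bic(ll_two, p_two, n_subjects):
            hits += 1
    return hits / len(replicates)


# Made-up log-likelihoods for illustration only (not results from the thesis).
example = [(-512.3, -510.9), (-498.7, -485.0), (-505.0, -504.2), (-520.4, -519.8)]
rate = homogeneous_rate(example, n_subjects=200, p_one=5, p_two=9)
print(rate)
```

In this toy input, only the second replicate improves the log-likelihood by enough to offset the penalty for the four extra parameters, so the other three replicates are counted as identified as homogeneous.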
75 pg.