## Power analysis of the likelihood ratio test for logistic regression mixtures

##### Abstract

Finite mixture models arise in many applications, particularly in biology, psychology and genetics. This dissertation focuses on detecting an association between a quantitative explanatory variable and a dichotomous response variable when the population is a mixture: for one fraction of the population the response is associated with the quantitative predictor, and for the remaining fraction it is not. We developed the likelihood ratio test (LRT) for both ordinary logistic regression models and logistic regression mixture models. The classical theorem for the null distribution of the LRT statistic, however, cannot be applied to finite mixture alternatives. We therefore conjectured an asymptotic null distribution for the LRT statistic and investigated how the empirical and fitted null distributions compared with that conjecture. With respect to critical values, the null distribution appears to be well approximated by a 50:50 mixture of chi-squared distributions. Using this null distribution, we conducted simulation studies to compare the power of the ordinary logistic regression model with that of the logistic regression mixture model. The mixture model improved the power to detect the association between the two variables. To identify the factors that significantly affect this improvement, we modeled the odds ratio of power (logistic mixture model vs. ordinary logistic regression model); essentially, the only factors that affected the improvement were the slope and the mixing proportion. We also compared the precision of the two approaches. The mixture model can be widely applied in large sample surveys with non-response and in missing data problems.
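To make the mixture setting concrete, the following is a minimal sketch of one way such data could be generated. It is an illustration under stated assumptions, not the dissertation's simulation design: the function names, the standard normal predictor, and the choice to give both subpopulations the same intercept are all hypothetical.

```python
import math
import random


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_prob(x, mix_prop, intercept, slope):
    """P(Y = 1 | x) when a fraction mix_prop of the population responds to x.

    Assumption (not from the source): the non-responding fraction follows an
    intercept-only logistic model with the same intercept.
    """
    associated = expit(intercept + slope * x)
    unassociated = expit(intercept)
    return mix_prop * associated + (1.0 - mix_prop) * unassociated


def simulate_mixture(n, mix_prop, intercept, slope, seed=0):
    """Draw n (x, y) pairs from the two-component population described above."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # assumed predictor distribution
        # Latent membership: does this individual's response depend on x?
        in_associated_group = rng.random() < mix_prop
        p = expit(intercept + slope * x) if in_associated_group else expit(intercept)
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


if __name__ == "__main__":
    sample = simulate_mixture(n=500, mix_prop=0.3, intercept=-0.5, slope=1.5, seed=1)
    print(sample[:3])
```

With `mix_prop = 1` the sketch reduces to ordinary logistic regression, and with `mix_prop = 0` or `slope = 0` the response is unrelated to the predictor. In both of the latter cases the other parameter has no effect on the data, which gives some intuition for why the classical null theory does not carry over to the mixture alternative.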