Application of Double Sampling to Combine Measured and Imputed Genotype Data in Genetic Association Studies
Genotype imputation is an essential technique for genome-wide association studies (GWAS) involving hundreds of thousands of SNPs. Because genotype misclassification can substantially decrease the statistical power to detect association, understanding how imputation inconsistencies affect the power to detect association at imputed markers, or at disease genes close to them, is important for the optimal design of imputation-based GWAS. Double sampling of genotypes is a statistical procedure in which a portion of subjects receive a second, more precise genotyping. This paper applies the likelihood ratio test allowing for errors (LRT-AE), which incorporates double-sample genotype information on a sub-sample of cases and controls, to correct for imputation inconsistencies. The parameters needed to compute the log-likelihoods are estimated using the Expectation-Maximization (EM) algorithm. To compare the performance of the LRT-AE with that of the standard likelihood ratio test (LRT), which makes no adjustment for imputation inconsistencies, I perform simulation studies using a factorial design with high and low settings of disease minor allele frequency (MAF), heterozygote relative risk, mode of inheritance (MOI), disease prevalence, and proportion of double-sampled subjects. The LRT-AE method maintains correct type I error rates in all null simulations at both significance levels considered (5% and 1%). Power improvement, however, is not significant unless more than 50% of subjects are in the double-sampled group. The LRT-AE method also yields unbiased estimates of imputation inconsistency rates.
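To make the role of the EM algorithm concrete, the following is a minimal Python sketch of how double-sample data could be combined with imputed-only data within a single group of subjects. It is an illustration under stated assumptions, not the implementation used in this work: the function name `em_double_sample`, the count layout, the starting values, and the example numbers are all hypothetical, and the sketch omits the case/control comparison that the LRT-AE itself performs. It estimates the true genotype frequencies `p[j]` and the inconsistency matrix `theta[j][k]`, the probability that a subject with precise genotype `j` is imputed as genotype `k`.

```python
def em_double_sample(double_counts, single_counts, n_iter=500, tol=1e-10):
    """Hypothetical EM sketch for one group of subjects.

    double_counts[j][k]: double-sampled subjects with precise genotype j
                         and imputed genotype k.
    single_counts[k]:    subjects with imputed genotype k only.
    Returns (p, theta): estimated true genotype frequencies and
    P(imputed = k | true = j).
    """
    g = len(single_counts)
    total = sum(sum(row) for row in double_counts) + sum(single_counts)

    # Assumed starting values: uniform frequencies, mostly correct imputation.
    p = [1.0 / g] * g
    theta = [[0.8 if j == k else 0.2 / (g - 1) for k in range(g)]
             for j in range(g)]

    for _ in range(n_iter):
        # E-step: start from the fully observed double-sample counts, then
        # distribute each imputed-only count over the possible true genotypes
        # in proportion to the posterior P(true = j | imputed = k).
        expected = [[float(double_counts[j][k]) for k in range(g)]
                    for j in range(g)]
        for k in range(g):
            denom = sum(p[j] * theta[j][k] for j in range(g))
            if denom == 0.0:
                continue  # no subject can have imputed genotype k
            for j in range(g):
                expected[j][k] += single_counts[k] * p[j] * theta[j][k] / denom

        # M-step: re-estimate the parameters from the expected counts.
        row_totals = [sum(row) for row in expected]
        new_p = [r / total for r in row_totals]
        new_theta = [[expected[j][k] / row_totals[j] if row_totals[j] > 0
                      else theta[j][k] for k in range(g)]
                     for j in range(g)]

        change = max(abs(a - b) for a, b in zip(new_p, p))
        p, theta = new_p, new_theta
        if change < tol:
            break
    return p, theta


# Made-up example counts for genotypes coded 0, 1, 2 (not data from the study).
example_double = [[40, 5, 0], [4, 30, 3], [0, 2, 16]]
example_single = [90, 80, 30]
p_hat, theta_hat = em_double_sample(example_double, example_single)
```

In a test along the lines of the LRT-AE, an estimation step of this kind would be run once with genotype frequencies constrained to be equal in cases and controls and once without that constraint, and twice the difference in maximized log-likelihoods would serve as the test statistic.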