Adjusting for population stratification in longitudinal quantitative trait locus identification

Wang, Yifan
The Graduate School, Stony Brook University: Stony Brook, NY.
Genome-wide association studies (GWAS) are widely used to detect genotypes associated with complex diseases. GWAS of disease progression over time may be clinically significant, and longitudinal quantitative trait locus (LQTL) methods are used in these studies to simulate disease progression. However, population stratification (PS) can lead to false positive or false negative findings in a GWAS. PS arises when a candidate marker's allele frequency varies across ancestral populations. One approach used to adjust for PS in GWAS is global principal component analysis (PCA). In this thesis I examine the statistical properties of GWAS analysis procedures that use principal component (PC) adjustments across the whole genome. I use additive risk allele models to test the association between rare genetic variants and longitudinal quantitative phenotypes across the whole genome. The genotype data are taken from the HapMap 3 dataset for 1198 unrelated individuals. The simulated quantitative phenotype data are estimated using the Bayesian posterior probabilities (BPPs) that a participant belongs to a clinically important trajectory curve. The PCA method implemented in the EIGENSTRAT program is then used to reduce the genotype data to ten variables containing most of the genetic variability information. Power and rejection rates are evaluated based on 1000 simulated replicates. Under the null hypothesis of no association, the association test statistic follows a chi-square distribution with one degree of freedom. The p-values of the test of the genotype coefficient, with and without a PC adjustment for PS, are documented. For each disease gene, I select 25 matching SNPs (those whose allele frequencies across populations are highly correlated with those of the disease gene) and 25 non-correlated SNPs (those whose allele frequencies across populations show low correlation with those of the disease gene).
All SNPs considered are in overall Hardy-Weinberg equilibrium (HWE). The additive risk allele LQTL models have strong empirical power, and the model with global PCA adjustment for PS consistently maintains correct false positive rates.
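The effect of a PC adjustment on a one-degree-of-freedom association test can be illustrated with a minimal sketch. It is not the thesis's procedure. It assumes a single cross-sectional phenotype in place of the BPP-based longitudinal phenotype, standardizes genotypes by their sample standard deviation in place of EIGENSTRAT's normalization, and simulates two populations with made-up allele frequencies. The names `top_pcs` and `additive_test` are hypothetical.

```python
import math
import numpy as np


def top_pcs(genotypes, k=10):
    """Top-k principal component scores of an (individuals x SNPs) 0/1/2 matrix."""
    g = genotypes - genotypes.mean(axis=0)
    sd = g.std(axis=0)
    keep = sd > 0                      # drop monomorphic SNPs
    g = g[:, keep] / sd[keep]
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :k] * s[:k]


def additive_test(y, snp, covariates=None):
    """Wald chi-square statistic (1 df) and p-value for the additive SNP coefficient."""
    n = len(y)
    cols = [np.ones(n), snp]
    if covariates is not None:
        cols.extend(covariates.T)      # one column per covariate
    X = np.column_stack(cols)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    stat = beta[1] ** 2 / cov[1, 1]
    p = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square with 1 df
    return stat, p


# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_per_pop, n_snps = 200, 500
pop = np.repeat([0, 1], n_per_pop)

# Background SNPs whose allele frequencies differ between the two populations.
freq_a = rng.uniform(0.1, 0.5, n_snps)
freq_b = freq_a + 0.3
freqs = np.where(pop[:, None] == 0, freq_a, freq_b)
background = rng.binomial(2, freqs).astype(float)

# Test SNP with no causal effect, but stratified allele frequency (0.2 vs 0.8).
snp = rng.binomial(2, np.where(pop == 0, 0.2, 0.8)).astype(float)

# The phenotype depends on population only, so any SNP signal is confounding.
y = 2.0 * pop + rng.normal(size=pop.size)

pcs = top_pcs(background, k=10)
stat_raw, p_raw = additive_test(y, snp)
stat_adj, p_adj = additive_test(y, snp, covariates=pcs)
print(f"unadjusted: chi2={stat_raw:.2f}, p={p_raw:.3g}")
print(f"PC-adjusted: chi2={stat_adj:.2f}, p={p_adj:.3g}")
```

In this toy setting the unadjusted statistic is inflated by the shared population structure, while including the PCs as covariates removes most of that spurious signal. Repeating the simulation over many replicates and counting rejections would give an empirical false positive rate.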
164 pg.