
dc.contributor.author: Ahn, Hyeong Jun [en_US]
dc.contributor.other: Department of Applied Mathematics and Statistics [en_US]
dc.date.accessioned: 2012-05-17T12:19:44Z
dc.date.available: 2012-05-17T12:19:44Z
dc.date.issued: 1-Aug-11 [en_US]
dc.date.submitted: Aug-11 [en_US]
dc.identifier: Ahn_grad.sunysb_0771E_10595.pdf [en_US]
dc.identifier.uri: http://hdl.handle.net/1951/55943
dc.description.abstract: Single-nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome. Haplotypes, which combine multiple SNPs into super-alleles, have been widely used in modern genetic analysis, especially in human disease association studies. The Expectation Maximization (EM) algorithm is commonly used in haplotype phasing and frequency estimation, and Hardy-Weinberg (HW) equilibrium is a key assumption built into the EM algorithm. Several studies have explored the accuracy of EM-based haplotype frequency estimation when the HW equilibrium assumption is violated. The general consensus is that sampling error plays a more dominant role in haplotype estimation than the estimation error due to HW deviation, and that the accuracy of haplotype frequency estimation tends to improve with increasing homozygosity in the sample. However, these studies mainly concentrated on the impact of SNP-level HW deviation. A theoretical foundation for the impact of haplotype-level HW deviation on haplotype frequency estimation has not been established. In this dissertation, we derived the theoretical relationship among three haplotype mean squared errors: between population and sample frequencies (MSEPS), between true sample and sample estimated frequencies (MSESE), and between population and sample estimated frequencies (MSEPE). The theoretical relationship between SNP-level and haplotype-level HW deviations was also established. Our simulations show that violation of HW equilibrium at the haplotype level can produce haplotype estimation error more severe than the sampling error, and that the accuracy of haplotype frequency estimation does not always improve with increasing homozygosity.
To incorporate possible haplotype-level HW deviations into the haplotype frequency estimation process, we propose a Hardy-Weinberg Deviation-Expectation/Conditional Maximization (HWD-ECM) method, which allows us to estimate HW deviation parameters and haplotype frequencies simultaneously. For the two-SNP case, the HWD-ECM algorithm consists of three iterated steps: (1) an expectation step that estimates genotype frequencies while allowing for HW deviation parameters; (2) a conditional maximization step for HW deviation parameter estimation that uses constraints on SNP-level or haplotype-level HW deviation parameters; and (3) a conditional maximization step for haplotype frequencies. Simulation results show that the HWD-ECM method performs significantly better than the EM-based approach in haplotype estimation when the HW equilibrium assumption is violated. An extension of the HWD-ECM algorithm to multiple SNPs is also discussed. [en_US]
dc.description.sponsorship: Stony Brook University Libraries. SBU Graduate School in Department of Applied Mathematics and Statistics. Lawrence Martin (Dean of Graduate School). [en_US]
dc.format: Electronic Resource [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY. [en_US]
dc.subject.lcsh: Statistics -- Biostatistics [en_US]
dc.subject.other: Expectation/Conditional Maximization (ECM) algorithm, Expectation Maximization (EM) algorithm, haplotype frequency estimation, Hardy-Weinberg Deviation-Expectation/Conditional Maximization (HWD-ECM) algorithm, Hardy-Weinberg (HW) deviation, Single-nucleotide polymorphism (SNP) [en_US]
dc.title: Hardy-Weinberg Deviation and EM-based Haplotype Frequency Estimation [en_US]
dc.type: Dissertation [en_US]
dc.description.advisor: Advisor(s): John J. Chen. Committee Member(s): Nancy R. Mendell; Wei Zhu; Barbara Nemesure. [en_US]
dc.mimetype: Application/PDF [en_US]
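
Illustrative note on the abstract: the baseline EM approach it refers to can be sketched for two biallelic SNPs as follows. This is a minimal textbook-style sketch under the HW equilibrium assumption, not the dissertation's code and not the HWD-ECM method, whose update rules are not given in this record. The function names (`unambiguous_haplotypes`, `em_two_snp`), the 0/1/2 genotype coding, and the input layout are assumptions made for illustration. Only the double heterozygote has ambiguous phase, so the E-step splits those individuals between the two possible haplotype pairs.

```python
from collections import Counter

HAPLOTYPES = ("00", "01", "10", "11")

def unambiguous_haplotypes(g1, g2):
    """Haplotype pair for a genotype that is not doubly heterozygous.

    g1 and g2 count copies of allele '1' at SNP 1 and SNP 2 (0, 1, or 2).
    """
    alleles = {0: ("0", "0"), 1: ("0", "1"), 2: ("1", "1")}
    a, b = alleles[g1], alleles[g2]
    return a[0] + b[0], a[1] + b[1]

def em_two_snp(genotype_counts, n_iter=200, tol=1e-10):
    """Estimate two-SNP haplotype frequencies by EM, assuming HW equilibrium.

    genotype_counts maps (g1, g2) to the number of individuals (hypothetical layout).
    """
    n_total = sum(genotype_counts.values())
    n_dh = genotype_counts.get((1, 1), 0)  # double heterozygotes: phase unknown

    # Haplotype counts contributed by phase-unambiguous genotypes.
    fixed = Counter()
    for (g1, g2), n in genotype_counts.items():
        if (g1, g2) == (1, 1):
            continue
        h1, h2 = unambiguous_haplotypes(g1, g2)
        fixed[h1] += n
        fixed[h2] += n

    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        # E-step: under HWE, P(00/11 | double het) is proportional to p00*p11,
        # and P(01/10 | double het) is proportional to p01*p10.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: expected haplotype counts divided by the 2N chromosomes.
        expected = {
            "00": fixed["00"] + w * n_dh,
            "11": fixed["11"] + w * n_dh,
            "01": fixed["01"] + (1 - w) * n_dh,
            "10": fixed["10"] + (1 - w) * n_dh,
        }
        new_freqs = {h: expected[h] / (2 * n_total) for h in HAPLOTYPES}

        converged = max(abs(new_freqs[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs
```

The E-step weight `w` is where the HW equilibrium assumption enters: it treats the two haplotypes in an individual as independent draws from the population, which is the assumption the abstract says can fail at the haplotype level.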

