dc.contributor.advisor: Zhu, Wei; Ahn, Hongshik [en_US]
dc.contributor.author: Chen, Hongyan [en_US]
dc.contributor.other: Department of Applied Mathematics and Statistics [en_US]
dc.date.accessioned: 2013-05-22T17:34:18Z
dc.date.available: 2013-05-22T17:34:18Z
dc.date.issued: 1-Dec-11 [en_US]
dc.date.submitted: 11-Dec [en_US]
dc.identifier: Chen_grad.sunysb_0771E_10753 [en_US]
dc.identifier.uri: http://hdl.handle.net/1951/59607
dc.description: 102 pg. [en_US]
dc.description.abstract: The goal of genome-wide association studies (GWAS) is to investigate the relationships between disease phenotypes and genotypes, which are usually determined by a large number of single nucleotide polymorphisms (SNPs). Currently, GWAS are often underpowered to identify SNPs with small to moderate effect sizes. To overcome this difficulty, two major approaches are often adopted: (1) meta-analysis, which increases the sample size, and (2) SNP pre-selection by dimension reduction. Dimension reduction for SNP data has been arduous because the categorical nature of SNPs renders most association measures, such as the Pearson correlation or the Euclidean distance, inappropriate. In this thesis, we propose a novel (partial) canonical correlation association measure for categorical data that can be incorporated into major dimension reduction approaches, including cluster analysis (CA) and partial correlation network analysis (PCNA), for the analysis of GWAS data. Its performance is examined and compared with that of other existing association measures. Network analysis methods such as PCNA and the Bayesian network serve not only as dimension reduction approaches but also as data-driven pathway discovery tools. A key objective in modern genetic studies is to discover the regulatory causal relationships between genetic mutations, measured by SNPs, and the resulting functional changes, often gauged by gene expression levels. With the former being categorical and the latter continuous numerical data, we face the problem of mixed data types. Our novel partial canonical correlation measure developed for categorical data can be readily extended to PCNA with mixed variables. This new approach is illustrated using a real data example from a study on inflammatory bowel diseases conducted at Stony Brook University Medical Center and Washington University in St. Louis. Comparison is also made to Bayesian network analysis for mixed data, and guidelines are provided on the pros and cons of each method. [en_US]
dc.description.sponsorship: Stony Brook University Libraries. SBU Graduate School in Department of Applied Mathematics and Statistics. Charles Taber (Dean of Graduate School). [en_US]
dc.format: Electronic Resource [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: The Graduate School, Stony Brook University: Stony Brook, NY. [en_US]
dc.subject.lcsh: Statistics--Biostatistics [en_US]
dc.subject.other: Canonical correlation, Clustering analysis, Network analysis, Pearson residuals, SNP [en_US]
dc.title: Clustering and Network Analysis with Single Nucleotide Polymorphism (SNP) [en_US]
dc.type: Dissertation [en_US]
dc.description.advisor: Advisor(s): Zhu, Wei; Ahn, Hongshik. Committee Member(s): Wu, Song; Li, Ellen. [en_US]
dc.mimetype: Application/PDF [en_US]

