Clustering and Network Analysis with Single Nucleotide Polymorphism (SNP)

Authors

Chen, Hongyan

Issue Date

1-Dec-11

Type

Dissertation

Language

en_US

Abstract

The goal of genome-wide association studies (GWAS) is to investigate the relationships between disease phenotypes and genotypes, which are usually determined by a large number of single nucleotide polymorphisms (SNPs). Currently, GWAS are often underpowered to identify SNPs with small to moderate effect sizes. To overcome this difficulty, two major approaches are often adopted: (1) meta-analysis, which increases the sample size, and (2) SNP pre-selection by dimension reduction. Dimension reduction for SNP data has been difficult because the categorical nature of SNPs renders most association measures, such as the Pearson correlation or the Euclidean distance, inappropriate. In this thesis, we propose a novel (partial) canonical correlation association measure for categorical data that can be incorporated into major dimension reduction approaches, including cluster analysis (CA) and partial correlation network analysis (PCNA), for the analysis of GWAS data. Its performance is examined and compared with that of existing association measures. Network analysis methods such as PCNA and the Bayesian network serve not only as dimension reduction approaches but also as data-driven pathway discovery tools. A key objective in modern genetic studies is to discover the regulatory causal relationships between genetic mutations, measured by SNPs, and the resulting functional changes, often gauged by gene expression levels. With the former being categorical and the latter continuous numerical data, we face the problem of mixed data types. Our novel partial canonical correlation measure developed for categorical data can be readily extended to PCNA with mixed variables. This new approach is illustrated using a real data example from a study on inflammatory bowel diseases conducted at Stony Brook University Medical Center and Washington University in St. Louis. A comparison is also made with Bayesian network analysis for mixed data, and guidelines are provided on the pros and cons of each method.

Description

102 pg.

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.
