Testing the properties of selection criteria: an application to copy number polymorphism measurements

Authors

Saint Fleur, Rose Edy

Issue Date

1-Aug-10

Type

Dissertation

Language

en_US

Abstract

Variation in the human genome takes many forms, including single-nucleotide polymorphisms (SNPs) and copy number polymorphisms (CNPs). CNPs fall into many categories, such as small insertion-deletion polymorphisms, variable numbers of repetitive sequences, and genomic structural alterations. A major question for researchers in statistical genetics is how many CNP categories are present in a given dataset. In this study, I compare five information criteria (BIC, AIC, NEC, CLC, and ICL-BIC) to determine whether any one of them is "best" at finding the correct number of components (that is, the correct number of CNP categories). I consider six design factors: equal/unequal within-component variances, high/low separation, sample size, mixture proportion, multiple random starting values, and transformation, using two known numbers of components (3 and 6). The results indicate that under "ideal" conditions (that is, a small number of components, large separation between components, constant within-component variance, and no subsequent transformation of the mixture data), each criterion performs well. When the data are a monotonic transformation of data from a mixture, the BIC, which is the most commonly used criterion in CNP research, has a low component number accuracy rate. I then considered applying the Box-Cox transformation regardless of whether it was needed. Applying the Box-Cox transformation when it was not needed did not reduce the component number accuracy rate of the CLC, ICL-BIC, or BIC. When the mixture data had been transformed, applying the Box-Cox transformation improved the component number accuracy rate of the BIC. The Box-Cox transformation should therefore be used routinely with the CLC, ICL-BIC, or BIC criterion to estimate the number of components in a CNP mixture analysis.
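
As a rough illustration of the quantities named in the abstract, the sketch below writes out the five criteria in their commonly cited forms for a univariate Gaussian mixture, together with the one-parameter Box-Cox transformation. The abstract does not state the exact definitions used in the dissertation, so these formulas, the function names, and the parameter-counting convention are all assumptions made for illustration. In these forms, smaller values are preferred when comparing candidate numbers of components.

```python
import math


def n_free_params(k, equal_variance=False):
    """Free parameters of a univariate k-component Gaussian mixture (assumed convention)."""
    # k means + (k - 1) mixing proportions + 1 shared variance or k variances
    return k + (k - 1) + (1 if equal_variance else k)


def entropy(posteriors):
    """EN(tau) = -sum_i sum_j tau_ij * log(tau_ij), treating 0 * log(0) as 0.

    `posteriors` is a list of rows, one per observation, holding the
    posterior membership probabilities for each component.
    """
    return -sum(t * math.log(t) for row in posteriors for t in row if t > 0)


def aic(loglik, p):
    """Akaike information criterion: -2 log L + 2p."""
    return -2.0 * loglik + 2.0 * p


def bic(loglik, p, n):
    """Bayesian information criterion: -2 log L + p log n."""
    return -2.0 * loglik + p * math.log(n)


def clc(loglik, en):
    """Classification likelihood criterion: -2 log L + 2 EN(tau)."""
    return -2.0 * loglik + 2.0 * en


def icl_bic(loglik, p, n, en):
    """ICL-BIC: the BIC plus an entropy penalty of 2 EN(tau)."""
    return bic(loglik, p, n) + 2.0 * en


def nec(loglik_k, loglik_1, en):
    """Normalized entropy criterion for k > 1: EN(tau) / (log L(k) - log L(1))."""
    return en / (loglik_k - loglik_1)


def box_cox(x, lam):
    """One-parameter Box-Cox transformation of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


# Hypothetical fitted values for a 3-component model on n = 100 observations.
p = n_free_params(3)            # 8 free parameters with unequal variances
print(aic(-100.0, p), bic(-100.0, p, 100))
```

In practice the log-likelihood and posterior probabilities would come from fitting each candidate model (for example by EM from several random starting values), and the transformation parameter would be estimated from the data rather than fixed by hand.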

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.
