    Statistical Models for SNP Detection

    Cai_grad.sunysb_0771E_10371.pdf (1.777Mb)
    Date
    1-Dec-10
    Author
    Cai, Shengnan
    Publisher
    The Graduate School, Stony Brook University: Stony Brook, NY.
    Abstract
    Variations in human DNA sequences are strongly associated with many diseases. The Single Nucleotide Polymorphism (SNP) is the most common type of DNA variation. Our research aims to detect SNPs from data generated by Polymerase Chain Reaction (PCR) and next-generation sequencing methods. In the first part of the study, we had a relatively small data set with fewer known SNPs as training data, and we developed a classification model based on cross-validation. This first part gave us knowledge of the properties of the data. In the next phase, we obtained a much larger data set with a much larger group of known SNPs. From these data we developed eight measures for every genetic position. Using these eight measures as predictor variables, we applied several classification methods, such as Random Forest (RF), Support Vector Machines (SVM), Single Decision Tree (ST) and Logistic Regression (LR), and then used cross-validation to evaluate them. By comparing predictive accuracy, sensitivity and specificity, we identified the best-performing model for the data. To compare the performance of these models when the number of observations at each genetic position (the cover depth) is small, we randomly drew subsets from the whole data set and applied the classification models to them. Variable selection was also applied in our study. The results show that SVM using the selected variables has a significantly higher average accuracy than the other methods in general, but RF using the selected variables performs best when the cover depth is as small as 20.
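
    The three evaluation criteria named in the abstract (accuracy, sensitivity and specificity) and the idea of splitting data into cross-validation folds can be illustrated with a minimal sketch. This is not the thesis's code: the function names `evaluate` and `k_fold_indices`, the label encoding (1 = SNP, 0 = non-SNP) and the example labels are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels.

    Assumed encoding (hypothetical): 1 = SNP, 0 = non-SNP.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # SNPs correctly called
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-SNPs correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false SNP calls
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed SNPs
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    base, extra = divmod(n, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Made-up labels, purely illustrative.
    truth = [1, 1, 0, 0, 1, 0]
    calls = [1, 0, 0, 0, 1, 1]
    print(evaluate(truth, calls))
    print(k_fold_indices(5, 2))
```

    The folds here are contiguous and unshuffled to keep the sketch short. In practice the positions would usually be shuffled first, and each classifier would be trained on the training indices and scored with `evaluate` on the test indices of every fold.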
    URI
    http://hdl.handle.net/1951/55377
    Collections
    • Stony Brook Theses & Dissertations [SBU] [1955]

    SUNY Digital Repository Support
    DSpace software copyright © 2002-2023  DuraSpace
    Contact Us | Send Feedback
    DSpace Express is a service operated by 
    Atmire NV