• Login
    View Item 
    •   DSpace Home
    • Stony Brook University
    • Stony Brook Theses & Dissertations [SBU]
    • View Item
    •   DSpace Home
    • Stony Brook University
    • Stony Brook Theses & Dissertations [SBU]
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Browse

    All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsDepartmentThis CollectionBy Issue DateAuthorsTitlesSubjectsDepartment

    My Account

    LoginRegister

    Statistics

    Most Popular ItemsStatistics by CountryMost Popular Authors

    New Development in Cluster Analysis and Other Related Multivariate Analysis Methods

    Thumbnail
    View/Open
    Zhang_grad.sunysb_0771E_10779.pdf (3.061Mb)
    Date
    1-May-12
    Author
    Zhang, Shaonan
    Publisher
    The Graduate School, Stony Brook University: Stony Brook, NY.
    Metadata
    Show full item record
    Abstract
    Cluster analysis is a multivariate analysis method aimed at (1) unraveling the natural groupings embedded within the data, and (2) dimension reduction. With the wide application of cluster analysis in the diversified modern research/business fields including machine learning, bioinformatics, medical image analysis, pattern recognition, market research and global climate research, many clustering algorithms have been developed to date. However, novel and/or special circumstances always call for better customized cluster analysis methods, and thus this thesis. This thesis work consists of two parts. In the first part, we extend the modern multiple-objective cluster analysis from using a single set of features to multiple distinct sets of features by developing the novel compound clustering method and the constrained clustering method. We also developed a new statistic, the "complete linkage" R <super>2</super> along with the well-known largest average silhouette, to determine the optimal number of clusters in the compound clustering. The novel compound/constrained clustering methods are illustrated through a gene microarray study with both gene expression data and gene function information. In the second part of this thesis we propose a novel algorithm for the weighted k-means clustering. Weighted k-means clustering is an extension of the k-means clustering in which a set of nonnegative weights are assigned to all the variables. We first derived the optimal variable weights for weighted k-means clustering in order to obtain more meaningful and interpretable clusters. We then improved the current weighted k-means clustering method (Huh and Lim 2009) by incorporating our novel algorithm to obtain global-optimal guaranteed variable weights based on the method of Lagrange multiplier and the Karush-Kuhn-Tucker conditions. Here we first present the related theoretical formulation and derivation of the optimal weights. Then we provide an iteration-based computing algorithm to calculate such optimal weights. Numerical examples on both simulated and well known real data are provided to illustrate our method. It is shown that our method outperforms the original proposed method in terms of classification accuracy, stability and computation efficiency.
    Description
    121 pg.
    URI
    http://hdl.handle.net/1951/60237
    Collections
    • Stony Brook Theses & Dissertations [SBU] [1955]

    SUNY Digital Repository Support
    DSpace software copyright © 2002-2023  DuraSpace
    Contact Us | Send Feedback
    DSpace Express is a service operated by 
    Atmire NV
     

     


    SUNY Digital Repository Support
    DSpace software copyright © 2002-2023  DuraSpace
    Contact Us | Send Feedback
    DSpace Express is a service operated by 
    Atmire NV