Cluster Analysis for Big-K Data: Models and Algorithms based on K-indicators

Yang, Yuchen

dc.contributor.advisor	Zhang, Yin
dc.creator	Yang, Yuchen
dc.date.accessioned	2021-08-16T18:08:12Z
dc.date.available	2021-08-16T18:08:12Z
dc.date.created	2020-08
dc.date.issued	2021-02-02
dc.date.submitted	August 2020
dc.date.updated	2021-08-16T18:08:12Z
dc.description.abstract	Cluster analysis is a fundamental unsupervised machine learning strategy with wide-ranging applications. When clustering big data, existing methods of choice increasingly encounter performance bottlenecks that limit solution quality and efficiency. To address these emerging bottlenecks, we propose a new clustering model, called K-indicators, based on a "subspace matching" viewpoint. This non-convex optimization model allows an effective semi-convexification scheme, leading to an essentially deterministic, two-layered alternating projection algorithm called KindAP that requires neither random initialization nor parameter tuning, while maintaining a complexity linear in the number of data points. We establish global convergence for the inner iterations and an exact recovery result for datasets with tight clusters. Built on the basic K-indicators model, a more advanced model is constructed to perform simultaneous outlier detection and cluster analysis. Under the spectral clustering framework, extensive experimental results on both synthetic and real datasets show that the proposed methods exhibit improved scalability in terms of both solution quality and time compared to K-means and other baseline methods. An open-source software package in Python that implements the algorithms studied in this thesis has been developed and released online.
dc.format.mimetype	application/pdf
dc.identifier.citation	Yang, Yuchen. "Cluster Analysis for Big-K Data: Models and Algorithms based on K-indicators." (2021) Diss., Rice University. https://hdl.handle.net/1911/111176.
dc.identifier.uri	https://hdl.handle.net/1911/111176
dc.language.iso	eng
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject	cluster analysis
dc.subject	optimization
dc.subject	outlier detection
dc.title	Cluster Analysis for Big-K Data: Models and Algorithms based on K-indicators
dc.type	Thesis
dc.type.material	Text
thesis.degree.department	Computational and Applied Mathematics
thesis.degree.discipline	Engineering
thesis.degree.grantor	Rice University
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy

Files

Original bundle

Name: YANG-DOCUMENT-2020.pdf
Size: 9.5 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text

Collections

Rice University Electronic Theses and Dissertations