Browsing by Author "Yang, Yuchen"
Item: Cluster Analysis for Big-K Data: Models and Algorithms Based on K-indicators (2021-02-02) Yang, Yuchen; Zhang, Yin

Cluster analysis is a fundamental unsupervised machine learning strategy with wide-ranging applications. When clustering big data, existing methods of choice increasingly encounter performance bottlenecks that limit both solution quality and efficiency. To address these emerging bottlenecks, we propose a new clustering model, called K-indicators, based on a "subspace matching" viewpoint. This non-convex optimization model admits an effective semi-convexification scheme, leading to an essentially deterministic, two-layered alternating projection algorithm called KindAP that requires neither random initialization nor parameter tuning, while maintaining a complexity linear in the number of data points. We establish global convergence for the inner iterations and an exact-recovery result for data sets with tight clusters. Building on the basic K-indicators model, we construct a more advanced model that performs outlier detection and cluster analysis simultaneously. Within the spectral clustering framework, extensive experiments on both synthetic and real datasets show that the proposed methods scale better than K-means and other baseline methods in terms of both solution quality and running time. An open-source Python package implementing the algorithms studied in this thesis has been developed and released online.

Item: Convergence of K-indicators Clustering with Alternating Projection Algorithms (2017-11-21) Yang, Yuchen; Zhang, Yin; Schaefer, Andrew J.; Hand, Paul E

Data clustering is a fundamental unsupervised machine learning problem, and the most widely used method of data clustering over the decades has been k-means.
Recently, a newly proposed algorithm called KindAP, based on the idea of subspace matching and a semi-convex relaxation scheme, has outperformed k-means in several respects, such as requiring no random replications and being insensitive to initialization. Unlike k-means, empirical evidence suggests that KindAP can correctly identify well-separated globular clusters even when the number of clusters is large, but a rigorous theoretical analysis has been lacking. This study improves the algorithm design and establishes a first-step theory for KindAP. KindAP is a two-layered alternating projection procedure applied to two non-convex sets. The inner loop solves an intermediate model via a semi-convex relaxation scheme that relaxes the more complicated of the two non-convex sets while keeping the other intact. We first derive a convergence result for this inner loop. Then, under the "ideal data" assumption, in which the n data points are located at exactly k distinct positions, we prove that KindAP converges globally to the global minimum with the help of the outer loop. Further work is ongoing to extend this analysis from the ideal data case to more general settings.
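The two-layered alternating projection procedure described in both abstracts can be sketched as follows. This is an illustrative toy reconstruction under stated assumptions, not the authors' released package: the function name, initialization, and stopping rules are mine. The sketch assumes the inner loop alternates projections between the rotated subspace {UZ : Z orthogonal} (an orthogonal Procrustes step via SVD) and the set of nonnegative matrices (the semi-convex relaxation), while the outer loop rounds the inner-loop result to a hard indicator matrix.

```python
import numpy as np

def kindap_sketch(U, k, n_outer=20, n_inner=50, tol=1e-8):
    """Toy KindAP-style two-layered alternating projection (illustrative only).

    U : (n, k) matrix with orthonormal columns, e.g. the top-k eigenvectors
        from a spectral embedding.  Returns integer cluster labels.
    """
    n = U.shape[0]
    N = np.maximum(U, 0)  # nonnegative starting iterate (an assumption)
    labels = None
    for _ in range(n_outer):
        # Inner loop: alternate between the rotated subspace {U Z : Z orthogonal}
        # and the nonnegative cone (the relaxed, "semi-convexified" set).
        for _ in range(n_inner):
            # Projection onto {U Z}: orthogonal Procrustes solved by SVD.
            W, _, Vt = np.linalg.svd(U.T @ N, full_matrices=False)
            UZ = U @ (W @ Vt)
            # Projection onto nonnegative matrices: elementwise clipping.
            N_new = np.maximum(UZ, 0)
            if np.linalg.norm(N_new - N) < tol:
                N = N_new
                break
            N = N_new
        # Outer step: round the relaxed iterate to a hard indicator matrix H
        # (one nonzero per row, columns normalized to unit length).
        new_labels = np.argmax(N, axis=1)
        H = np.zeros((n, k))
        H[np.arange(n), new_labels] = 1.0
        H /= np.sqrt(np.maximum(H.sum(axis=0, keepdims=True), 1))
        if labels is not None and np.array_equal(new_labels, labels):
            break  # labels stabilized: stop the outer loop
        labels = new_labels
        N = H  # feed the rounded indicator back into the inner loop
    return labels
```

On "ideal data" in the sense of the second abstract, i.e. when U spans the column space of an exact normalized indicator matrix, this deterministic procedure recovers the clustering without any random restarts, which is the behavior the convergence theory is meant to explain.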