Computation, Visualization, and Applications of Convex Clustering

dc.contributor.advisor: Allen, Genevera I
dc.creator: Nagorski, John
dc.date.accessioned: 2019-05-17T15:37:35Z
dc.date.available: 2019-05-17T15:37:35Z
dc.date.created: 2018-08
dc.date.issued: 2018-10-04
dc.date.submitted: August 2018
dc.date.updated: 2019-05-17T15:37:35Z
dc.description.abstract: Clustering is a ubiquitous tool for exploratory data analysis across the sciences, with the general aim of identifying groups of similar objects. Recent work has recast the clustering problem within the framework of convex optimization, addressing many shortcomings of traditional methods in areas such as interpretability, stability, and parameter selection. The method of Convex Clustering has proven to be a canonical example of such an approach, and its extensions and applications will be the focus of this work. We begin by considering the application of Convex Clustering in the novel setting of region detection for high-throughput genomic data. We illustrate the versatility of Convex Clustering by developing a novel extension, Spatial Convex Clustering (SpaCC), specifically catered to multivariate spatially correlated genomics data. We demonstrate SpaCC to achieve state-of-the-art performance on the well-studied problem of Copy Number Segmentation, and show it to be similarly successful in the novel setting of DNA Methylation region detection. Next, we address several shortcomings of Convex Clustering including slow computation and lack of familiar visualizations relative to its traditional counterparts. To do so, we introduce algorithms for the fast approximation of the Convex Clustering solution path and provide both theoretical guarantees of error control as well as empirical investigations. We then provide a suite of visualization techniques to aid in the interpretation of the clustering solution path, exploring their insights via several real data examples. Finally, we introduce the R package clustRviz, which gives practitioners direct access to the fast computation and dynamic visualizations introduced throughout.
dc.format.mimetype: application/pdf
dc.identifier.citation: Nagorski, John. "Computation, Visualization, and Applications of Convex Clustering." (2018) Diss., Rice University. https://hdl.handle.net/1911/105793.
dc.identifier.uri: https://hdl.handle.net/1911/105793
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Clustering
dc.subject: Convex Optimization
dc.subject: Data Visualization
dc.subject: High-Throughput Genomics
dc.title: Computation, Visualization, and Applications of Convex Clustering
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: NAGORSKI-DOCUMENT-2018.pdf
Size: 16.33 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text