Computation, Visualization, and Applications of Convex Clustering

dc.contributor.advisor | Allen, Genevera I | en_US
dc.creator | Nagorski, John | en_US
dc.date.accessioned | 2019-05-17T15:37:35Z | en_US
dc.date.available | 2019-05-17T15:37:35Z | en_US
dc.date.created | 2018-08 | en_US
dc.date.issued | 2018-10-04 | en_US
dc.date.submitted | August 2018 | en_US
dc.date.updated | 2019-05-17T15:37:35Z | en_US
dc.description.abstract | Clustering is a ubiquitous tool for exploratory data analysis across the sciences, with the general aim of identifying groups of similar objects. Recent work has recast the clustering problem within the framework of convex optimization, addressing many shortcomings of traditional methods such as interpretability, stability, and parameter selection. The method of Convex Clustering has proven to be a canonical example of such an approach, and its extensions and applications will be the focus of this work. We begin by considering the application of Convex Clustering in the novel setting of region detection for high-throughput genomic data. We illustrate the versatility of Convex Clustering by developing a novel extension, Spatial Convex Clustering (SpaCC), specifically catered to multivariate spatially correlated genomics data. We demonstrate SpaCC to achieve state-of-the-art performance on the well-studied problem of Copy Number Segmentation, and show it to be similarly successful in the novel setting of DNA Methylation region detection. Next, we address several shortcomings of Convex Clustering including slow computation and lack of familiar visualizations relative to its traditional counterparts. To do so, we introduce algorithms for the fast approximation of the Convex Clustering solution path and provide both theoretical guarantees of error control as well as empirical investigations. Next, we provide a suite of visualization techniques to aid in the interpretation of the clustering solution path, exploring their insights via several real data examples. Finally, we introduce the R-package, clustRviz, which gives practitioners direct access to the fast computation and dynamic visualizations introduced throughout. | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Nagorski, John. "Computation, Visualization, and Applications of Convex Clustering." (2018) Diss., Rice University. https://hdl.handle.net/1911/105793. | en_US
dc.identifier.uri | https://hdl.handle.net/1911/105793 | en_US
dc.language.iso | eng | en_US
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.subject | Clustering | en_US
dc.subject | Convex Optimization | en_US
dc.subject | Data Visualization | en_US
dc.subject | High-Throughput Genomics | en_US
dc.title | Computation, Visualization, and Applications of Convex Clustering | en_US
dc.type | Thesis | en_US
dc.type.material | Text | en_US
thesis.degree.department | Statistics | en_US
thesis.degree.discipline | Engineering | en_US
thesis.degree.grantor | Rice University | en_US
thesis.degree.level | Doctoral | en_US
thesis.degree.name | Doctor of Philosophy | en_US
Files
Original bundle
Name: NAGORSKI-DOCUMENT-2018.pdf
Size: 16.33 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text