Computational and Statistical Methodology for Highly Structured Data

dc.contributor.advisor: Ensor, Katherine B
dc.creator: Weylandt, Michael
dc.date.accessioned: 2020-09-23T14:23:00Z
dc.date.available: 2021-06-01T05:01:11Z
dc.date.created: 2020-12
dc.date.issued: 2020-09-15
dc.date.submitted: December 2020
dc.date.updated: 2020-09-23T14:23:01Z
dc.description.abstract: Modern data-intensive research is typically characterized by large-scale data and the impressive computational and modeling tools necessary to analyze it. Equally important, though less remarked upon, is the structure present in large data sets. Statistical approaches that incorporate knowledge of this structure, whether spatio-temporal dependence or sparsity in a suitable basis, are essential to accurately capture the richness of modern large-scale data sets. This thesis presents four novel methodologies for dealing with various types of highly structured data in a statistically rich and computationally efficient manner. The first project considers sparse regression and sparse covariance selection for complex-valued data. While complex-valued data is ubiquitous in spectral analysis and neuroimaging, typical machine learning techniques discard the rich structure of complex numbers, losing valuable phase information in the process. A major contribution of this project is the development of convex analysis for a class of non-smooth "Wirtinger" functions, which allows high-dimensional statistical theory to be applied in the complex domain. The second project considers clustering of large-scale multi-way array ("tensor") data. Efficient algorithms for convex bi-clustering and co-clustering are derived and shown to achieve an order-of-magnitude speed improvement over previous approaches. The third project considers principal component analysis for data with smooth and/or sparse structure. An efficient manifold optimization technique is proposed that flexibly adapts to a wide variety of regularization schemes while estimating multiple principal components. Despite the non-convexity of the manifold constraints, convergence to a stationary point can be established.
Additionally, a new family of "deflation" schemes is proposed to allow iterative estimation of nested principal components while maintaining weaker forms of orthogonality. The fourth and final project develops a multivariate volatility model for US natural gas markets. This model flexibly incorporates differing market dynamics across time scales and spatial locations. A rigorous evaluation shows significantly improved forecasting performance both in- and out-of-sample. All four methodologies flexibly incorporate prior knowledge in a statistically rigorous fashion while maintaining a high degree of computational performance.
dc.embargo.terms: 2021-06-01
dc.format.mimetype: application/pdf
dc.identifier.citation: Weylandt, Michael. "Computational and Statistical Methodology for Highly Structured Data." (2020) Diss., Rice University. https://hdl.handle.net/1911/109374.
dc.identifier.uri: https://hdl.handle.net/1911/109374
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Statistics
dc.subject: Machine Learning
dc.subject: Optimization
dc.subject: Regularization
dc.subject: Structured Data
dc.title: Computational and Statistical Methodology for Highly Structured Data
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: WEYLANDT-DOCUMENT-2020.pdf
Size: 7.08 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.85 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text