Computational and Statistical Methodology for Highly Structured Data

dc.contributor.advisor: Ensor, Katherine B [en_US]
dc.creator: Weylandt, Michael [en_US]
dc.date.accessioned: 2020-09-23T14:23:00Z [en_US]
dc.date.available: 2021-06-01T05:01:11Z [en_US]
dc.date.created: 2020-12 [en_US]
dc.date.issued: 2020-09-15 [en_US]
dc.date.submitted: December 2020 [en_US]
dc.date.updated: 2020-09-23T14:23:01Z [en_US]
dc.description.abstract: Modern data-intensive research is typically characterized by large-scale data and the impressive computational and modeling tools necessary to analyze it. Equally important, though less remarked upon, is the structure present in large data sets. Statistical approaches that incorporate knowledge of this structure, whether spatio-temporal dependence or sparsity in a suitable basis, are essential to accurately capture the richness of modern large-scale data sets. This thesis presents four novel methodologies for dealing with various types of highly structured data in a statistically rich and computationally efficient manner. The first project considers sparse regression and sparse covariance selection for complex-valued data. While complex-valued data is ubiquitous in spectral analysis and neuroimaging, typical machine learning techniques discard the rich structure of complex numbers, losing valuable phase information in the process. A major contribution of this project is the development of convex analysis for a class of non-smooth "Wirtinger" functions, which allows high-dimensional statistical theory to be applied in the complex domain. The second project considers clustering of large-scale multi-way array ("tensor") data. Efficient clustering algorithms for convex bi-clustering and co-clustering are derived and shown to achieve an order-of-magnitude speed improvement over previous approaches. The third project considers principal component analysis for data with smooth and/or sparse structure. An efficient manifold optimization technique is proposed which can flexibly adapt to a wide variety of regularization schemes while efficiently estimating multiple principal components. Despite the non-convexity of the manifold constraints used, it is possible to establish convergence to a stationary point. Additionally, a new family of "deflation" schemes is proposed to allow iterative estimation of nested principal components while maintaining weaker forms of orthogonality. The fourth and final project develops a multivariate volatility model for US natural gas markets. This model flexibly incorporates differing market dynamics across time scales and spatial locations. A rigorous evaluation shows significantly improved forecasting performance both in- and out-of-sample. All four methodologies are able to flexibly incorporate prior knowledge in a statistically rigorous fashion while maintaining a high degree of computational performance. [en_US]
dc.embargo.terms: 2021-06-01 [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Weylandt, Michael. "Computational and Statistical Methodology for Highly Structured Data." (2020) Diss., Rice University. https://hdl.handle.net/1911/109374. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/109374 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Statistics [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Optimization [en_US]
dc.subject: Regularization [en_US]
dc.subject: Structured Data [en_US]
dc.title: Computational and Statistical Methodology for Highly Structured Data [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Statistics [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
Files
Original bundle
Name: WEYLANDT-DOCUMENT-2020.pdf
Size: 7.08 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.85 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text