Query Processing and Optimization for Database Stochastic Analytics

dc.contributor.advisor: Jermaine, Christopher M
dc.contributor.committeeMember: Ng, T.S. Eugene
dc.contributor.committeeMember: Varman, Peter J
dc.creator: Perez, Luis Leopoldo
dc.date.accessioned: 2016-02-05T22:16:50Z
dc.date.available: 2016-02-05T22:16:50Z
dc.date.created: 2014-12
dc.date.issued: 2014-12-03
dc.date.submitted: December 2014
dc.date.updated: 2016-02-05T22:16:50Z
dc.description.abstract: The application of relational database systems to analytical processing has been an active area of research for about two decades, motivated by continual growth in both the scale of the data and the complexity of analysis tasks. At the same time, stochastic techniques have become commonplace in large-scale data analytics. This work is concerned with applying relational database systems to stochastic analytical tasks, focusing on the query evaluation and optimization phases. Three problems are addressed in the context of MCDB/SimSQL, a relational database system for uncertain data management and analytics. The first contribution is a set of efficient techniques for evaluating queries that require satisfying a probability threshold, such as "Which pending orders are estimated to be processed and shipped by the end of the month with a probability of at least 95%?", where the processing and shipment times of each order are generated by an arbitrary stochastic process. Results show that these techniques use resources sensibly, weeding out data elements that require relatively few samples during the early stages of query evaluation. The second problem concerns recycling the materialized intermediate results of a query to optimize future queries. Assuming that a history of past queries provides an accurate picture of the workload, I describe query optimization techniques that weigh the costs and benefits of materializing intermediate results, with the objective of minimizing the hypothetical costs of future queries subject to constraints on disk space. Results show a substantial improvement over conventional query caching techniques in both workload execution time and average query execution time.

Finally, this work addresses the problem of evaluating queries over stochastic generative models specified in a high-level notation that treats random variables as first-class objects and supports operations on structured objects such as vectors and matrices. I describe a notation, based on the syntax of comprehensions, for specifying generative models in a way that guarantees correspondence with relational algebra expressions, together with techniques for translating a model into a database schema and a set of relational queries.
dc.format.mimetype: application/pdf
dc.identifier.citation: Perez, Luis Leopoldo. "Query Processing and Optimization for Database Stochastic Analytics." (2014) Diss., Rice University. https://hdl.handle.net/1911/88437.
dc.identifier.uri: https://hdl.handle.net/1911/88437
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Computing
dc.subject: Databases
dc.subject: Optimization
dc.subject: Analytics
dc.title: Query Processing and Optimization for Database Stochastic Analytics
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
PEREZ-DOCUMENT-2014.pdf (1.92 MB, Adobe Portable Document Format)
License bundle
PROQUEST_LICENSE.txt (5.84 KB, Plain Text)
LICENSE.txt (2.6 KB, Plain Text)