Applying Machine Learning to Query Optimization

dc.contributor.advisor: Jermaine, Chris
dc.creator: Sikdar, Sourav
dc.date.accessioned: 2022-11-10T16:58:47Z
dc.date.available: 2022-11-10T16:58:47Z
dc.date.created: 2020-12
dc.date.issued: 2021-06-15
dc.date.submitted: December 2020
dc.date.updated: 2022-11-10T16:58:47Z
dc.description.abstract: Recent progress in machine learning (ML) and artificial intelligence (AI) has the potential to impact the design and implementation of many aspects of modern database systems. ML and AI may have a significant impact on the design of the query optimizer, which a database uses to explore the large space of semantically equivalent plans for implementing a given query, with the goal of choosing the plan with the least cost. This thesis seeks to use ML and AI to improve the state of the art in multiple areas of database query optimization. In the first part of the thesis, I consider the problem of optimizing queries with user-defined functions (UDFs). Most modern SQL database systems and Big Data processing systems support UDFs, which make optimization difficult. The backbone of database query optimization is the collection of statistics describing the data to be processed, but when a database or Big Data computation is obscured by UDFs, good statistics are often unavailable. I propose a solution called "Multi-Step Optimization and Execution", or Monsoon. Monsoon models execution and statistics collection as a Markov decision process (MDP) and allows multiple, interleaved rounds of each. Monsoon may choose to collect statistics on the UDFs and then run a computation; or it may optimize and execute part of the plan, collect statistics on the result of the partial plan, and then re-optimize, repeating the process as needed. Monsoon uses Monte Carlo tree search (MCTS), a common MDP solver, to find the best execution plan for a given query. In an experimental study, I demonstrate that Monsoon can match or outperform most alternative solutions for optimizing queries with UDFs. In the second part of the thesis, I address the problem of reducing cardinality estimation errors that stem from inaccuracies in analytical cost models. This is a problem that has long plagued query optimizers.
Traditionally, query optimizers employ static cost models that offer no mechanism for incorporating feedback on the quality of the resulting plans. To alleviate this problem, recent literature has proposed neural cost models that can learn from their mistakes. However, these neural solutions need to learn from large numbers of example queries that have already been executed over a given database, and they cannot work well "out of the box". In this thesis, I treat the creation of a neural cost model as an instance of few-shot learning, where the goal is to work well with just a few training examples. Unlike in other domains, where little is known about the semantics of the problem, a key aspect of learning for query optimization that makes it amenable to few-shot learning is the availability of high-quality analytic cost models that are already known to work in many cases. The idea I explore is to build a recurrent neural network designed to mimic the classical cost model, so that it performs as well as the classical model out of the box, without any training. However, since it is a neural network, it can learn. After the model is deployed and data are observed, it is fine-tuned on the given database and installation. Because it is already of high quality before training, it is able to adapt to the new setting using very few training queries. In an empirical study, I demonstrate that this approach outperforms both classical and modern neural cost models.
dc.format.mimetype: application/pdf
dc.identifier.citation: Sikdar, Sourav. "Applying Machine Learning to Query Optimization." (2021) Diss., Rice University. https://hdl.handle.net/1911/113894.
dc.identifier.uri: https://hdl.handle.net/1911/113894
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Query Optimization
dc.subject: Machine Learning
dc.title: Applying Machine Learning to Query Optimization
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: SIKDAR-DOCUMENT-2020.pdf
Size: 2.45 MB
Format: Adobe Portable Document Format