Toward Data-centric Automated Machine Learning

dc.contributor.advisor: Hu, Xia
dc.creator: Lai, Henry
dc.date.accessioned: 2023-08-09T16:46:17Z
dc.date.created: 2023-05
dc.date.issued: 2023-04-14
dc.date.submitted: May 2023
dc.date.updated: 2023-08-09T16:46:18Z
dc.description.abstract: Machine learning has become increasingly popular and has shown significant success in many fields. There are four main processes involved in developing a machine learning solution: data preparation, model selection, hyper-parameter tuning, and deployment for feedback collection. While automated machine learning (AutoML) has been proposed to streamline the middle two processes and deliver efficient solutions without requiring laborious trial-and-error efforts, the framework requires a well-prepared dataset and a perfectly defined setting, which may limit its capability toward more challenging real-world applications. Recent studies suggest that data preparation is often the key to optimal solutions in many challenging real-world applications. To bridge the gap between model selection and data preparation, we propose a complementary AutoML framework that focuses on data-centric operations, which perform automated data preparation at different stages of a machine learning pipeline. Our framework includes a data-centric model customization framework to generate sample-specific learning strategies based on the attributes of individual data samples, a data-centric knowledge acquisition framework to effectively collect expert knowledge based on data distribution while considering its long-term effects on the model training procedure, and a model-aware data preparation framework that takes data distribution and attributes into consideration to further improve the datasets for challenging problem settings. Our goal is to develop an end-to-end data-centric AutoML system for real-world applications. To achieve this, we propose developing an end-to-end AutoML system for anomaly detection on time series data as a prototype to promote the proposed framework. With all these efforts, our research could further expand the capability of AutoML toward real-world applications.
dc.embargo.lift: 2023-11-01
dc.embargo.terms: 2023-11-01
dc.format.mimetype: application/pdf
dc.identifier.citation: Lai, Henry. "Toward Data-centric Automated Machine Learning." (2023) Diss., Rice University. https://hdl.handle.net/1911/115116.
dc.identifier.uri: https://hdl.handle.net/1911/115116
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Machine Learning
dc.subject: Reinforcement Learning
dc.subject: Data Mining
dc.subject: Anomaly Detection
dc.subject: Graph Neural Networks
dc.title: Toward Data-centric Automated Machine Learning
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: LAI-DOCUMENT-2023.pdf
Size: 9.99 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text