Hu, Xia2023-08-092023-052023-04-14May 2023Lai, Henry. "Toward Data-centric Automated Machine Learning." (2023) Diss., Rice University. <a href="https://hdl.handle.net/1911/115116">https://hdl.handle.net/1911/115116</a>.https://hdl.handle.net/1911/115116Machine learning has become increasingly popular and has shown significant success in many fields. There are four main processes involved in developing a machine learning solution: data preparation, model selection, hyper-parameter tuning, and deployment for feedback collection. While automated machine learning (AutoML) has been proposed to streamline the middle two processes and deliver efficient solutions without requiring laborious trial-and-error efforts, the framework requires a well-prepared dataset and a perfectly defined setting, which may limit its capability toward more challenging real-world applications. Recent studies suggest that data preparation is often the key to optimal solutions in many challenging real-world applications. To bridge the gap between model selection and data preparation, we propose a complimentary AutoML framework that focuses on data-centric operations, which perform automated data preparations in different stages of a machine learning pipeline. Our framework includes a data-centric model customization framework to generate sample-specific learning strategies based on the attributes of individual data samples, a data-centric knowledge acquisition framework to effectively collect expert knowledge based on data distribution while considering its long-term effects on the model training procedure, and a model-aware data preparation framework that takes data distribution and attributes into consideration to further improve the datasets for challenging problem settings. Our goal is to develop an end-to-end data-centric AutoML system for real-world applications. To achieve this, we propose developing an end-to-end AutoML system for anomaly detection on time series data as a prototype to promote the proposed framework. With all these efforts, our research could further expand the capability of AutoML toward real-world applications.application/pdfengCopyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.Machine LearningReinforcement LearningData MiningAnomaly DetectionGraph Neural networksToward Data-centric Automated Machine LearningThesis2023-08-09