Browsing by Author "Hu, Xia"
Item Auto-GNN: Neural architecture search of graph neural networks (Frontiers Media S.A., 2022) Zhou, Kaixiong; Huang, Xiao; Song, Qingquan; Chen, Rui; Hu, Xia; DATA Lab
Graph neural networks (GNNs) have been widely used in various graph analysis tasks. Because graph characteristics vary significantly across real-world systems, the architecture parameters must be tuned carefully to identify a suitable GNN for a given scenario. Neural architecture search (NAS) has shown its potential in discovering effective architectures for learning tasks in image and language modeling. However, existing NAS algorithms cannot be applied efficiently to the GNN search problem for two reasons. First, the large-step exploration in the traditional controller fails to learn the sensitive performance variations with slight architecture modifications in GNNs. Second, the search space is composed of heterogeneous GNNs, which prevents the direct adoption of parameter sharing among them to accelerate the search process. To tackle these challenges, we propose an automated graph neural networks (AGNN) framework, which aims to find the optimal GNN architecture efficiently. Specifically, a reinforced conservative controller is designed to explore the architecture space with small steps. To accelerate the validation, a novel constrained parameter sharing strategy is presented to regularize the weight transfer among GNNs. It avoids training from scratch and saves computation time.
Experimental results on the benchmark datasets demonstrate that the architecture identified by AGNN achieves the best performance and search efficiency compared with existing human-invented models and the traditional search methods.

Item Counterfactuals for Interpretable Machine Learning: Model Reasoning from “What” to “How” (2023-05-23) Yang, Fan; Hu, Xia
With the extensive use of machine learning (ML) in real-world applications, effectively explaining the behavior of ML models is becoming increasingly significant. A number of interpretation techniques have since been proposed to help end-users better understand how a model works. Existing techniques for interpretable machine learning mainly focus on feature attribution methods, where highly contributing features are exported as evidence for model predictions. However, the resulting feature contribution scores are not discriminative in nature, which limits their use in reasoning about decisions and understanding "how". Counterfactual explanation, one of the emerging types of ML interpretation, has attracted attention from both researchers and practitioners in recent years. A counterfactual explanation is essentially a series of hypothetical data samples, which is categorized under the example-based reasoning methodology and explored under "what-if" circumstances. The overall interpretation goal of counterfactuals is to indicate how the model decision changes with input perturbations. With valid counterfactual explanations, end-users can learn how to flip the model decision to a preferred outcome and thereby get a better sense of the decision boundaries. In this thesis, I cover my previous research efforts on counterfactual explanations and present them from three different perspectives.
Firstly, for counterfactual derivation, I designed a framework to generate counterfactuals specifically for raw data instances with the proposed Attribute-Informed Perturbation. By utilizing generative models conditioned on different attributes, counterfactuals with desired labels can be obtained effectively. Instead of directly modifying instances in the data space, I iteratively optimized the constructed attribute-informed latent space, where features are more robust and semantic. Secondly, for counterfactual explainer deployment, I proposed a Model-based Counterfactual Synthesizer framework for efficient interpretation. I analyzed the model-based counterfactual process and constructed a base synthesizer by adopting the conditional generative adversarial net structure. To better approximate the counterfactual universe for those minor queries, I employed the umbrella sampling technique to conduct the synthesizer training. I also enhanced the synthesizer by incorporating the causal dependence among attributes, and further validated its correctness through the causality identification approach. Thirdly, for counterfactual delivery to stakeholders, I proposed a novel framework to generate differentially private counterfactuals, where noise is injected for protection while the explanatory role is maintained. I trained an autoencoder with the functional mechanism to construct noisy class prototypes, and then derived the counterfactual explanation from the latent prototypes based on the post-processing immunity of differential privacy. Beyond general stakeholders, I also proposed two explanation delivery frameworks specifically for end-users and model developers. Further research goals focus on the sequential counterfactual, which is more actionable for end-users, and the global counterfactual, which is more insightful for model developers.
At the end of the thesis, I list several promising directions to explore in the future.

Item Efficient Methods for Deep Reinforcement Learning: Algorithms and Applications (2023-03-14) Zha, Daochen; Hu, Xia
Deep reinforcement learning (deep RL) has recently achieved remarkable success in various domains, from simulated games to real-world applications. However, deep RL agents are notoriously sample-inefficient; they often need to collect a large number of samples from the environment to achieve reasonable performance. This sample efficiency issue becomes more pronounced in sparse reward environments, where the rewards are zero in most states so that deep RL agents can barely learn. Unfortunately, collecting samples can be extremely expensive in many real-world applications; we may only be able to collect a very limited number of samples for training. The sample efficiency issue significantly hinders the application of deep RL in the real world. To bridge this gap, this thesis makes several contributions to efficient deep RL. First, we propose a learning-based experience replay algorithm to improve sample efficiency through better sample reuse. Second, we present an episode-level exploration strategy for efficient exploration in sparse environments. Third, we investigate a real-world application of embedding table sharding and design an efficient training algorithm based on an estimated environment. Finally, we devise a more general framework by leveraging pre-trained models to improve efficiency and apply it to embedding table sharding. Putting all these together, our research could help build more efficient deep RL systems and facilitate their real-world deployment.

Item PME: pruning-based multi-size embedding for recommender systems (Frontiers Media S.A., 2023) Liu, Zirui; Song, Qingquan; Li, Li; Choi, Soo-Hyun; Chen, Rui; Hu, Xia
Embedding is widely used in recommendation models to learn feature representations.
However, the traditional embedding technique that assigns a fixed size to all categorical features may be suboptimal for the following reasons. In the recommendation domain, the embeddings of most categorical features can be trained with less capacity without impacting model performance, so storing embeddings of equal length may incur unnecessary memory usage. Existing work that tries to allocate customized sizes for each feature usually either simply scales the embedding size with the feature's popularity or formulates this size allocation problem as an architecture selection problem. Unfortunately, most of these methods either suffer a large performance drop or incur significant extra time cost for searching proper embedding sizes. In this article, instead of formulating the size allocation problem as an architecture selection problem, we approach the problem from a pruning perspective and propose the Pruning-based Multi-size Embedding (PME) framework. During the search phase, we prune the dimensions that have the least impact on model performance in the embedding to reduce its capacity. Then, we show that the customized size of each token can be obtained by transferring the capacity of its pruned embedding with significantly less search cost. Experimental results validate that PME can efficiently find proper sizes and hence achieve strong performance while significantly reducing the number of parameters in the embedding layer.

Item Randomized Algorithms for Mega-AI Models (2023-08-08) Xu, Zhaozhuo; Shrivastava, Anshumali; Baraniuk, Richard; Hu, Xia
Over the past few years, we have witnessed remarkable accomplishments in machine learning (ML) models due to increases in their sizes. However, the growth in model size has outpaced upgrades to hardware and network bandwidth, resulting in difficulties in training these Mega-AI models within current system infrastructures.
Additionally, the shift towards training ML models on user devices, in light of global data privacy protection trends, has constrained hardware resources, exacerbating the tension between effectiveness and efficiency. Moreover, there exists an accuracy-efficiency trade-off in current ML algorithms and systems, where reducing computation and memory usage results in accuracy losses during both training and inference. This thesis aims to demonstrate algorithmic advancements in improving this trade-off in training Mega-AI models. Rather than relying on big data, we propose a focus on good data and sparse models, that is, models that have many parameters but activate only a subset of them during training for efficiency. We also frame the pursuit of good data and activated parameters as an information retrieval problem and develop hashing algorithms and data structures to maintain training accuracy while improving efficiency. This thesis begins with work on data sparsity and presents a hash-based sampling algorithm for Mega-AI models that adaptively selects data samples during training. We also demonstrate how this approach improves the machine teaching algorithm with 425.12x speedups and 99.76% energy savings on edge devices. We then discuss our recent success in model sparsity and present a provably efficient hashing algorithm that adaptively selects and updates a subset of parameters during training. We also introduce methods to offset the accuracy decline of sparse Mega-AI models in the post-training process. Finally, we present DRAGONN, a system that utilizes hash algorithms to achieve near-optimal communication for sparse and distributed ML. To demonstrate the utility of these scalable and sustainable ML algorithms, we apply them to personalized education, seismic imaging, and bioinformatics.
Specifically, we show how modifying the ML algorithm can reduce seismic processing time from 10 months to 10 minutes.

Item Toward Data-centric Automated Machine Learning (2023-04-14) Lai, Henry; Hu, Xia
Machine learning has become increasingly popular and has shown significant success in many fields. There are four main processes involved in developing a machine learning solution: data preparation, model selection, hyper-parameter tuning, and deployment for feedback collection. While automated machine learning (AutoML) has been proposed to streamline the middle two processes and deliver efficient solutions without requiring laborious trial-and-error efforts, the framework requires a well-prepared dataset and a perfectly defined setting, which may limit its capability in more challenging real-world applications. Recent studies suggest that data preparation is often the key to optimal solutions in many challenging real-world applications. To bridge the gap between model selection and data preparation, we propose a complementary AutoML framework that focuses on data-centric operations, which perform automated data preparation in different stages of a machine learning pipeline. Our framework includes a data-centric model customization framework to generate sample-specific learning strategies based on the attributes of individual data samples, a data-centric knowledge acquisition framework to effectively collect expert knowledge based on data distribution while considering its long-term effects on the model training procedure, and a model-aware data preparation framework that takes data distribution and attributes into consideration to further improve the datasets for challenging problem settings. Our goal is to develop an end-to-end data-centric AutoML system for real-world applications. To achieve this, we propose developing an end-to-end AutoML system for anomaly detection on time series data as a prototype to promote the proposed framework.
With all these efforts, our research could further expand the capability of AutoML toward real-world applications.