Browsing by Author "Liu, Zirui"
Item: Lossy Computation For Large-Scale Machine Learning (2024-08-05)
Liu, Zirui; Hu, Xia

In recent years, machine learning (ML), particularly deep learning, has made significant strides in areas like image recognition and language processing. It has been shown that more parameters and more data can greatly boost ML model performance. However, the growth in model and data size is outpacing hardware capabilities, leading to a gap between ML needs and hardware development. My research aims to create scalable ML algorithms and systems that meet current and future ML demands, exploring methods such as randomized and low-precision computation to handle larger data and model sizes without changing the hardware.

First, for large datasets: in settings such as molecular structure analysis or social networks, where the data are interconnected, graph neural networks (GNNs) have emerged as one of the de facto standard tools for analyzing graph data. Leveraging the message-passing mechanism, GNNs learn the representation of each node by iteratively aggregating information from its neighbors to capture graph structure and relationships. A key challenge in graph representation learning, however, is scalability: real-world graphs may contain billions of nodes, which makes training GNNs on them highly memory- and time-inefficient. To address this memory and time inefficiency in large-scale graph learning, we introduce two lossy computation paradigms. First, we propose a memory-efficient framework for training GNNs with significantly compressed activations. Second, we present a time-efficient GNN training method based on degree-based graph sparsification.

Second, regarding large models: as model sizes have grown, large language models (LLMs) have exhibited human-like conversational ability. This advancement opens the door to a wave of new applications, such as custom AI agents. Building such applications involves two essential steps: fine-tuning and serving. Fine-tuning adapts the LLM to a specific task, such as understanding and responding to domain-specific inquiries. Serving generates responses to user questions in real time. Both steps are difficult and expensive because of the large model scale, which limits their accessibility for most users. To improve efficiency in fine-tuning and serving LLMs, we likewise employ lossy computation. Our first method improves memory efficiency in LLM fine-tuning through randomized matrix multiplication. Our second approach introduces a prompt tuning framework that optimizes the accuracy-efficiency trade-off for compressed LLMs. Lastly, we implement an extreme low-bit quantization technique for the KV cache to further enhance performance.
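The degree-based graph sparsification above is only named in the abstract; the dissertation's actual algorithm is not reproduced here. As a rough sketch of the general idea, and nothing more, the following Python snippet subsamples the neighbors of high-degree nodes in a GNN edge list. The helper name `sparsify_by_degree` and the `max_degree` cutoff are assumptions made purely for illustration.

```python
import torch

def sparsify_by_degree(edge_index: torch.Tensor, num_nodes: int,
                       max_degree: int = 32) -> torch.Tensor:
    """Keep at most `max_degree` incoming edges per destination node.

    edge_index: (2, E) long tensor with rows (source, destination), the
    edge-list layout commonly used for GNNs. Low-degree nodes keep all
    of their edges; high-degree nodes have their neighbors subsampled.
    (A simple per-node loop; not optimized for billion-node graphs.)
    """
    dst = edge_index[1]
    kept = []
    for node in range(num_nodes):
        incident = (dst == node).nonzero(as_tuple=True)[0]
        if incident.numel() > max_degree:
            perm = torch.randperm(incident.numel())[:max_degree]
            incident = incident[perm]
        kept.append(incident)
    kept = torch.cat(kept).sort().values
    return edge_index[:, kept]

# Toy example: node 0 has five in-edges, which get subsampled to three.
edges = torch.tensor([[1, 2, 3, 4, 5, 2],
                      [0, 0, 0, 0, 0, 1]])
print(sparsify_by_degree(edges, num_nodes=6, max_degree=3))
```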
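Likewise, the extreme low-bit KV cache quantization is only mentioned, not specified. The sketch below is a generic example of asymmetric per-token quantization rather than the thesis's method; the function names `quantize_kv`/`dequantize_kv`, the 2-bit default, and the per-token grouping are assumed for illustration.

```python
import torch

def quantize_kv(x: torch.Tensor, n_bits: int = 2):
    """Asymmetric per-token quantization of a KV-cache slice.

    x: (num_tokens, head_dim) slice of the key or value cache. Returns
    uint8 codes plus a per-token scale and zero point, so the cache can
    be stored with `n_bits` of information per entry and dequantized
    on the fly during attention.
    """
    qmax = 2 ** n_bits - 1
    x_min = x.amin(dim=-1, keepdim=True)
    x_max = x.amax(dim=-1, keepdim=True)
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    codes = torch.round((x - x_min) / scale).clamp(0, qmax).to(torch.uint8)
    return codes, scale, x_min

def dequantize_kv(codes, scale, zero_point):
    """Reconstruct an approximate float cache from the stored codes."""
    return codes.to(scale.dtype) * scale + zero_point

# Toy example: quantize a fake key cache and check reconstruction error.
keys = torch.randn(16, 64)
codes, scale, zp = quantize_kv(keys, n_bits=2)
print((keys - dequantize_kv(codes, scale, zp)).abs().mean())
```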
Item: PME: pruning-based multi-size embedding for recommender systems (Frontiers Media S.A., 2023)
Liu, Zirui; Song, Qingquan; Li, Li; Choi, Soo-Hyun; Chen, Rui; Hu, Xia

Embedding is widely used in recommendation models to learn feature representations. However, the traditional embedding technique, which assigns a fixed size to all categorical features, may be suboptimal for the following reason: in the recommendation domain, the majority of categorical features' embeddings can be trained with less capacity without impacting model performance, so storing embeddings of equal length may incur unnecessary memory usage. Existing work that tries to allocate customized sizes to each feature usually either simply scales the embedding size with the feature's popularity or formulates size allocation as an architecture selection problem. Unfortunately, most of these methods either suffer a large performance drop or incur significant extra time cost when searching for proper embedding sizes. In this article, instead of formulating the size allocation problem as an architecture selection problem, we approach it from a pruning perspective and propose the Pruning-based Multi-size Embedding (PME) framework. During the search phase, we prune the dimensions of each embedding that have the least impact on model performance to reduce its capacity. Then, we show that the customized size of each token can be obtained by transferring the capacity of its pruned embedding, with significantly less search cost. Experimental results validate that PME can efficiently find proper sizes and hence achieve strong performance while significantly reducing the number of parameters in the embedding layer.
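The abstract does not spell out PME's search-and-transfer procedure, so the following is only a minimal sketch of the pruning idea under assumed details: a global magnitude criterion (the `keep_ratio` parameter and the helper name are hypothetical) zeroes the least important entries of an embedding table, and each row's surviving entry count is read off as that token's customized size.

```python
import torch

def prune_embedding_table(weight: torch.Tensor, keep_ratio: float = 0.3):
    """Globally magnitude-prune an embedding table and read off per-row sizes.

    weight: (num_tokens, base_dim) embedding matrix. The smallest
    (1 - keep_ratio) fraction of entries by absolute value is zeroed;
    each row's customized embedding size is the number of entries that
    survive in that row.
    """
    num_keep = max(1, int(weight.numel() * keep_ratio))
    k = weight.numel() - num_keep                  # rank of the pruning threshold
    threshold = (weight.abs().flatten().kthvalue(k).values
                 if k > 0 else weight.abs().min() - 1)
    mask = weight.abs() > threshold
    pruned = weight * mask
    sizes = mask.sum(dim=1)                        # customized size per token
    return pruned, sizes

# Toy example: a 1000 x 64 table pruned to roughly 30% of its entries.
table = torch.randn(1000, 64)
pruned, sizes = prune_embedding_table(table, keep_ratio=0.3)
print(sizes.float().mean().item(), sizes.min().item(), sizes.max().item())
```

In PME itself, the capacity of each pruned embedding is then transferred into a compact multi-size embedding layer; that transfer step is not shown here.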