Fast and Expressive Sketch Structured Transform for Efficient Inference

Date
2024-12-06
Abstract

Linear transformations using learned weights are fundamental components of deep learning models. Prior research has shown that dense weight matrices can often be compressed by decomposition, quantization, sparsification, or random parameter sharing without losing accuracy, suggesting that more efficient transformations are possible. Among alternatives to dense weights, structured matrices are limited in expressivity and offer poor quality-efficiency tradeoffs, while unstructured sparse matrices map poorly onto modern hardware, slowing both training and inference. To address these challenges, we propose the Sketch Structured Transform (SS1), an expressive and hardware-efficient operator that reduces the number of multiplications in tensor operations and accelerates inference. SS1 applies random parameter sharing in a block-structured manner, reducing computation while preserving the expressiveness of parameter sharing. We show empirically that SS1 achieves better quality-efficiency tradeoffs than competing variants. Our theoretical analysis indicates that SS1 can be combined with quantization for further compression, and our experiments confirm this. Additionally, pre-trained models can be projected onto SS1 layers and finetuned for efficient deployment. Our experiments highlight several applications of SS1: (a) training GPT2 and DLRM models from scratch for faster inference; (b) finetuning projected BERT models for 1.31× faster inference while maintaining GLUE scores; and (c) a proof of concept with Llama-3-8b, showing 1.11× faster wall-clock inference from projected SS1 layers without finetuning.
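
To make the abstract's mention of block-structured random parameter sharing concrete, the sketch below builds a linear layer whose weight matrix is tiled from a small bank of shared blocks selected by a fixed random index map. This is a minimal illustration under assumed names and sizes (BlockSharedLinear, block, n_shared_blocks); it is not the thesis's SS1 operator, and for clarity it materializes the full weight, so it demonstrates the parameter sharing but not the reduced multiplication count that SS1 targets.

# Minimal, illustrative sketch of block-structured random parameter sharing
# (an assumed construction for illustration; NOT the thesis's exact SS1 operator).
import torch
import torch.nn as nn

class BlockSharedLinear(nn.Module):
    """Linear layer whose (out x in) weight is tiled from a small bank of
    shared blocks chosen by a fixed random index map, reducing unique parameters."""

    def __init__(self, in_features, out_features, block=64, n_shared_blocks=32):
        super().__init__()
        assert in_features % block == 0 and out_features % block == 0
        self.block = block
        self.rows, self.cols = out_features // block, in_features // block
        # Small bank of learnable blocks that the full weight matrix reuses.
        self.bank = nn.Parameter(torch.randn(n_shared_blocks, block, block) * 0.02)
        # Fixed random assignment of each weight-matrix block to a bank entry.
        self.register_buffer("idx", torch.randint(0, n_shared_blocks, (self.rows, self.cols)))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Materialize the full weight by gathering shared blocks, then apply it.
        w = self.bank[self.idx]                                   # (rows, cols, block, block)
        w = w.permute(0, 2, 1, 3).reshape(self.rows * self.block, self.cols * self.block)
        return x @ w.t() + self.bias

if __name__ == "__main__":
    layer = BlockSharedLinear(512, 512)
    y = layer(torch.randn(8, 512))
    print(y.shape)  # torch.Size([8, 512])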

Degree
Master of Science
Type
Thesis
Keywords
Efficiency, Acceleration, Compression, Parameter Sharing