Fast and Expressive Sketch Structured Transform for Efficient Inference

dc.contributor.advisor: Shrivastava, Anshumali
dc.creator: Saedi, Kimia
dc.date.accessioned: 2025-01-17T17:21:58Z
dc.date.created: 2024-12
dc.date.issued: 2024-12-06
dc.date.submitted: December 2024
dc.date.updated: 2025-01-17T17:21:58Z
dc.description.abstract: Linear transformations using learned weights are fundamental components of deep learning models. Prior research has shown that dense weight matrices can often be compressed by decomposition, quantization, sparsification, or random parameter sharing without losing accuracy, suggesting the benefit of more efficient transformations. Among variants of weight matrices, structured ones have limitations in expressivity and quality-efficiency tradeoffs. Unstructured matrices are incompatible with modern hardware, leading to slower training and inference. To address these challenges, we propose Sketch Structured Transform (SS1), an expressive and hardware-efficient operator that reduces tensor multiplications and accelerates inference. SS1 leverages random parameter sharing in a block-structured manner, reducing computation while preserving the expressiveness of parameter sharing. We empirically show that SS1 achieves better quality-efficiency tradeoffs than competing variants. Our theoretical analysis also indicates that SS1 can be combined with quantization for further compression, and the experimental results confirm this. Additionally, pre-trained models can be projected using SS1 and finetuned for efficient deployment. Our experiments highlight various applications of SS1, including (a) training GPT2 and DLRM models from scratch for faster inference; (b) finetuning projected BERT models for 1.31× faster inference while maintaining GLUE scores; and (c) a proof of concept with Llama-3-8b, showing 1.11× faster wall-clock inference using projected SS1 layers without finetuning.
dc.embargo.lift: 2025-06-01
dc.embargo.terms: 2025-06-01
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/1911/118230
dc.language.iso: en
dc.subject: Efficiency, Acceleration, Compression, Parameter Sharing
dc.title: Fast and Expressive Sketch Structured Transform for Efficient Inference
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
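
The abstract states that SS1 shares parameters randomly in a block-structured manner and thereby reduces tensor multiplications. The record gives no implementation details, so the following is only a minimal NumPy sketch of the general idea, not the thesis's operator or kernel. It assumes that sharing happens along the input dimension and that whole blocks of consecutive rows are mapped to blocks of a smaller weight matrix. The function names (block_shared_index, expand_weights, sketch_input) and all sizes are hypothetical. Under those assumptions, multiplying by the expanded weight matrix gives the same result as first folding the input and then multiplying by the small matrix, which needs fewer multiplications.

```python
import numpy as np

def block_shared_index(k, k_small, block, rng):
    """Map each of k input rows to one of k_small shared rows, block by block."""
    assert k % block == 0 and k_small % block == 0
    # Assumption for illustration: every block of `block` consecutive rows is
    # assigned uniformly at random to one block of rows in the small matrix.
    block_map = rng.integers(0, k_small // block, size=k // block)
    offsets = np.arange(k) % block
    return np.repeat(block_map, block) * block + offsets

def expand_weights(w_small, idx):
    """Build the full (k, n) matrix whose row i reuses row idx[i] of w_small."""
    return w_small[idx]

def sketch_input(x, idx, k_small):
    """Sum the input columns that share a weight row; result is (batch, k_small)."""
    folded = np.zeros((k_small, x.shape[0]), dtype=x.dtype)
    np.add.at(folded, idx, x.T)  # unbuffered add, so repeated indices accumulate
    return folded.T

rng = np.random.default_rng(0)
k, k_small, n, block, batch = 32, 8, 5, 4, 3  # hypothetical sizes

idx = block_shared_index(k, k_small, block, rng)
w_small = rng.standard_normal((k_small, n))
x = rng.standard_normal((batch, k))

# Dense path: k * n multiplications per input row.
dense = x @ expand_weights(w_small, idx)
# Sketched path: k_small * n multiplications per input row, plus additions.
fast = sketch_input(x, idx, k_small) @ w_small

print(np.allclose(dense, fast))
```

The two paths agree because each output entry is a sum of products in which every weight row appears with all the input columns mapped to it, so those columns can be added together before the multiplication. How SS1 actually chooses its blocks, and how it is implemented on hardware, is described in the thesis itself.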