Perspectives on Algorithmic, Structural, and Pragmatic Acceleration Techniques in Machine Learning and Quantum Computing
Abstract
In the modern era of big data and emerging quantum computing, the need to process information at an extreme scale has become inevitable. For instance, GPT-3, a popular large language model, was trained on a 45-terabyte corpus of text and has 175 billion model parameters \cite{brown2020language}. Similarly, in quantum computing, the number of free parameters that define quantum states and processes scales exponentially with the number of subsystems (i.e., qubits), often rendering naive methods, such as convex programming, inapplicable even for a moderate number of qubits \cite{ladd2010quantum}. In this two-part thesis, we explore various perspectives to achieve “acceleration” in such computationally challenging scenarios, both theoretically and empirically, with applications in quantum computing and machine learning. Specifically, we consider (i) algorithmic acceleration via the momentum technique (e.g., Polyak’s momentum or Nesterov’s accelerated method); (ii) structural acceleration of the model (i.e., decreasing the degrees of freedom to be inferred) via low-rank approximation; and (iii) acceleration for practitioners via principled hyperparameter recipes and distributed/federated protocols.
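To make the momentum techniques in (i) concrete, the following is a minimal Python/NumPy sketch, not the thesis's implementation: it contrasts plain gradient descent with Polyak's heavy-ball momentum and Nesterov's accelerated method on an illustrative random quadratic, with assumed hyperparameter values for the step size eta and momentum beta.

import numpy as np

# Illustrative strongly convex quadratic f(x) = 0.5 * x^T A x, minimized at x* = 0.
rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 20))
A = Q.T @ Q + np.eye(20)             # symmetric positive definite Hessian
grad = lambda x: A @ x               # gradient of f

L = np.linalg.eigvalsh(A).max()      # smoothness constant of f
eta, beta = 1.0 / L, 0.9             # assumed step size and momentum values

def gradient_descent(x, iters=300):
    for _ in range(iters):
        x = x - eta * grad(x)
    return x

def polyak_momentum(x, iters=300):
    """Heavy ball: add a fraction of the previous displacement to each step."""
    x_prev = x.copy()
    for _ in range(iters):
        x, x_prev = x - eta * grad(x) + beta * (x - x_prev), x
    return x

def nesterov(x, iters=300):
    """Nesterov: take the gradient at a look-ahead (extrapolated) point."""
    x_prev = x.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)
        x, x_prev = y - eta * grad(y), x
    return x

x0 = rng.standard_normal(20)
for method in (gradient_descent, polyak_momentum, nesterov):
    # Distance to the optimum x* = 0; the momentum variants shrink it faster.
    print(method.__name__, np.linalg.norm(method(x0.copy())))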
Part one explores the problem of quantum state tomography (QST), formulated as a nonconvex optimization problem. QST is the canonical procedure for identifying the nature of imperfections in implementations of quantum processing units (QPUs), a prerequisite for eventually building a fault-tolerant quantum computer. The main computational bottleneck in QST is that both the dimension of the parameter space (i.e., the optimization complexity) and the number of measurements required (i.e., the sample complexity) grow exponentially with the number of qubits.
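To see how low-rank structure mitigates this growth, consider a standard factorized formulation of QST (the notation below is assumed for illustration; the thesis's exact formulation may differ), which replaces the full d x d density matrix, with d = 2^n for n qubits, by a rank-r factor A:

\[
\min_{A \in \mathbb{C}^{d \times r}} \; \tfrac{1}{2} \big\| \mathcal{M}(AA^\dagger) - y \big\|_2^2, \qquad d = 2^n,
\]

where \(\mathcal{M}\) is the linear measurement map and \(y\) the vector of observed measurement outcomes. The substitution \(\rho = AA^\dagger\) keeps \(\rho\) positive semidefinite by construction and reduces the number of free parameters from \(d^2\) to \(O(dr)\), at the cost of making the objective nonconvex.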
Two novel QST methods are proposed: (i) a centralized nonconvex method, combining ideas from matrix factorization, compressive sensing, and Nesterov’s acceleration, that drastically decreases both the optimization and sample complexities; and (ii) a distributed extension of this method that utilizes a set of classical local machines communicating with a central quantum server, making it well suited for the noisy intermediate-scale quantum (NISQ) era.
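The distributed pattern can be sketched as follows in Python/NumPy, with real symmetric matrices standing in for complex Hermitian ones; the data partitioning, aggregation rule, and all hyperparameter values are illustrative assumptions rather than the thesis's protocol. Each classical worker holds a shard of measurement data and computes a local gradient of the factorized least-squares objective; a coordinator averages these gradients and takes a momentum step on the shared factor.

import numpy as np

# Illustrative sizes: d = 2^n for n = 3 qubits, rank-2 factor, 4 classical workers.
d, r, n_workers, m = 8, 2, 4, 30
rng = np.random.default_rng(1)

# Hidden low-rank "state" and synthetic measurement shards, one per worker:
# worker i holds operators M_i and outcomes y_i with y_i[k] = trace(M_i[k] @ rho).
A_true = rng.standard_normal((d, r))
rho_true = A_true @ A_true.T
shards = []
for _ in range(n_workers):
    M = rng.standard_normal((m, d, d))
    y = np.einsum('kij,ji->k', M, rho_true)
    shards.append((M, y))

def local_grad(A, M, y):
    """Worker-side gradient of 0.5 * sum_k (trace(M_k A A^T) - y_k)^2 w.r.t. A."""
    resid = np.einsum('kij,ji->k', M, A @ A.T) - y
    S = np.einsum('k,kij->ij', resid, M)
    return (S + S.T) @ A

def objective(A):
    return sum(0.5 * np.sum((np.einsum('kij,ji->k', M, A @ A.T) - y) ** 2)
               for M, y in shards)

# Coordinator loop: average the workers' gradients, take a momentum step on A.
A = rng.standard_normal((d, r))
A_prev = A.copy()
eta, beta = 1e-4, 0.9                    # assumed hyperparameters
print("objective before:", objective(A))
for _ in range(500):
    g = sum(local_grad(A, M, y) for M, y in shards) / n_workers
    A, A_prev = A - eta * g + beta * (A - A_prev), A
print("objective after: ", objective(A))

Only the d x r factor and averaged gradients cross the network in this sketch, rather than full d x d matrices, which is the communication saving a low-rank parameterization buys.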
Part two explores pragmatic acceleration in modern machine learning systems, which nowadays involve billions of parameters and increasingly complex objective functions, such as distributed objectives and game-theoretic formulations. As such, optimizing the model parameters has become an increasingly time- and compute-intensive task, for which practitioners often rely on expensive grid searches with numerous rounds of retraining.
Within part two, the first contribution studies the stability and acceleration of the stochastic proximal point method with momentum; the modified proximal operator provides remarkable robustness to hyperparameter misspecification while enjoying accelerated convergence. The second contribution proposes an adaptive step size scheme for stochastic gradient descent, in the context of federated learning, based on approximating the local smoothness of the individual function that each client optimizes. Finally, the third contribution explores acceleration in smooth games and identifies three cases of game Jacobian eigenvalue distributions for which the momentum extragradient method exhibits accelerated convergence rates, along with the optimal hyperparameters for each scenario.
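As one concrete instance from the contributions above, the local-smoothness idea behind the second contribution might look like the following Python/NumPy sketch; the difference-quotient estimator, the running-maximum safeguard, and all constants are illustrative assumptions, not the thesis's exact scheme. In a federated setting, each client would run such a loop locally between communication rounds.

import numpy as np

def smoothness_estimate(x, x_prev, g, g_prev, eps=1e-12):
    """Local smoothness: L ~ ||grad(x) - grad(x_prev)|| / ||x - x_prev||."""
    return np.linalg.norm(g - g_prev) / (np.linalg.norm(x - x_prev) + eps)

def adaptive_sgd_client(grad_fn, x0, iters=500, eta0=1e-3):
    """One client's local loop: step size set from an online smoothness estimate.
    grad_fn(x) returns a (possibly stochastic) gradient of this client's loss."""
    x_prev, g_prev = x0, grad_fn(x0)
    x = x_prev - eta0 * g_prev               # bootstrap step with a default size
    L_max = 0.0
    for _ in range(iters):
        g = grad_fn(x)
        # Running maximum as a simple safeguard (illustrative choice).
        L_max = max(L_max, smoothness_estimate(x, x_prev, g, g_prev))
        x_prev, g_prev = x, g
        x = x - g / (2.0 * L_max + 1e-12)    # step ~ inverse local smoothness
    return x

# Toy usage: one client with an ill-conditioned quadratic loss (illustrative).
H = np.diag([1.0, 10.0, 100.0])
print(adaptive_sgd_client(lambda z: H @ z, x0=np.ones(3)))

The appeal of such rules is exactly what the abstract highlights: the step size tracks the curvature the iterates actually encounter, sparing practitioners a grid search over fixed learning rates.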
Citation
Kim, Junhyung. “Perspectives on Algorithmic, Structural, and Pragmatic Acceleration Techniques in Machine Learning and Quantum Computing.” PhD diss., Rice University, 2024. https://hdl.handle.net/1911/117753