Towards Exact Gradient-Based Training on Analog In-Memory Computing
Abstract
Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective remains underexplored. Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence of SGD, which is caused by the asymmetric updates on analog devices. We then provide a lower bound on the asymptotic error, showing that this error is a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we study a heuristic analog algorithm called Tiki-Taka that has recently exhibited superior empirical performance compared to SGD. We rigorously show its ability to converge to a critical point exactly and hence eliminate the asymptotic error. Simulations verify the correctness of the analyses.
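For intuition, the asymmetry the abstract refers to can be illustrated with a simple scalar model of an analog weight update, where the realized step size depends on the sign of the increment and on how close the weight is to its saturation bound. The sketch below is not taken from the paper: the saturation model, the bound `tau`, the learning rates, and the auxiliary variable `p` in the simplified Tiki-Taka-style scheme are all illustrative assumptions, meant only to show why plain SGD stalls at a biased point while a two-variable gradient-accumulation-plus-transfer scheme can do better.

```python
import numpy as np

rng = np.random.default_rng(0)

def asymmetric_update(w, delta, tau=1.0):
    """Illustrative analog update model (an assumption, not the paper's exact device model):
    the realized step shrinks as the weight approaches its saturation bound tau,
    and it shrinks differently for positive and negative increments."""
    step_up = delta * (1.0 - w / tau)    # positive pulses saturate near +tau
    step_down = delta * (1.0 + w / tau)  # negative pulses saturate near -tau
    return w + np.where(delta >= 0, step_up, step_down)

def grad(w, x, y):
    """Gradient of the scalar least-squares loss 0.5 * (w*x - y)**2."""
    return (w * x - y) * x

# Toy data with true weight w* = 0.8.
w_star, n = 0.8, 2000
xs = rng.normal(size=n)
ys = w_star * xs + 0.05 * rng.normal(size=n)

lr = 0.05

# Plain SGD applied through the asymmetric device: settles in a biased neighborhood.
w_sgd = 0.0
for x, y in zip(xs, ys):
    w_sgd = asymmetric_update(w_sgd, -lr * grad(w_sgd, x, y))

# Simplified Tiki-Taka-style scheme (a sketch, not the paper's exact algorithm):
# gradients are first accumulated on an auxiliary variable p, which is then
# slowly transferred into the weight w.
w_tt, p = 0.0, 0.0
transfer_lr = 0.1
for x, y in zip(xs, ys):
    p = asymmetric_update(p, -lr * grad(w_tt, x, y))  # fast gradient accumulation
    w_tt = asymmetric_update(w_tt, transfer_lr * p)   # slow transfer of p into w

print(f"target w*        : {w_star:.3f}")
print(f"SGD on device    : {float(w_sgd):.3f}")
print(f"Tiki-Taka sketch : {float(w_tt):.3f}")
```

Running this toy script, the device-level SGD iterate typically hovers away from the target while the two-variable scheme tracks it more closely, mirroring (in a crude scalar setting) the asymptotic-error gap the paper analyzes.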
Cite

Text
Wu et al. "Towards Exact Gradient-Based Training on Analog In-Memory Computing." Neural Information Processing Systems, 2024. doi:10.52202/079017-1176

Markdown
[Wu et al. "Towards Exact Gradient-Based Training on Analog In-Memory Computing." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wu2024neurips-exact/) doi:10.52202/079017-1176

BibTeX
@inproceedings{wu2024neurips-exact,
title = {{Towards Exact Gradient-Based Training on Analog In-Memory Computing}},
author = {Wu, Zhaoxian and Gokmen, Tayfun and Rasch, Malte J. and Chen, Tianyi},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1176},
url = {https://mlanthology.org/neurips/2024/wu2024neurips-exact/}
}