Adaptive Sampling for Efficient SoftMax Approximation
Abstract
The softmax function is ubiquitous in machine learning and optimization applications. Computing the full softmax of a matrix-vector product can be computationally expensive in high-dimensional settings. In many applications, however, it is sufficient to calculate only the top few outputs of the softmax function. In this work, we present an algorithm, dubbed AdaptiveSoftmax, that adaptively computes the top k softmax values more efficiently than the full softmax computation, with probabilistic guarantees. We demonstrate the sample efficiency improvements afforded by AdaptiveSoftmax on real and synthetic data to corroborate our theoretical results. AdaptiveSoftmax yields a >10x gain over the full softmax computation on most datasets, and up to a 30x improvement for Mistral7B evaluated on the Wikitext dataset. The adaptive method we propose for estimating the partition function (the softmax denominator) is of independent interest and can be used in other applications such as kernel density estimation.
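To make the abstract's idea concrete, here is a minimal sketch of the kind of adaptive sampling it describes: each logit of the matrix-vector product is estimated by subsampling coordinates, and sampling budget is then concentrated on the current top candidates, successive-halving style. This is not the authors' AdaptiveSoftmax algorithm (which comes with probabilistic guarantees via confidence intervals that this sketch omits); all names and parameters (`adaptive_softmax_topk`, `estimate_logits`, `n0`, `rounds`) are hypothetical.

```python
import numpy as np

def estimate_logits(A, x, budgets, rng):
    """Estimate each logit <A[i], x> by sampling budgets[i] coordinates
    uniformly with replacement and rescaling (an unbiased estimator)."""
    n, d = A.shape
    est = np.empty(n)
    for i in range(n):
        idx = rng.integers(0, d, size=budgets[i])
        est[i] = d * np.mean(A[i, idx] * x[idx])
    return est

def adaptive_softmax_topk(A, x, k=1, n0=32, rounds=4, seed=0):
    """Hypothetical sketch: coarse uniform sampling of every logit,
    then repeatedly halve the candidate set and double the budget of
    the survivors, then take a softmax over the logit estimates."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    budgets = np.full(n, n0)
    est = estimate_logits(A, x, budgets, rng)
    survivors = np.arange(n)
    for _ in range(rounds):
        # keep the most promising half (at least k arms), double their budget
        keep = max(k, len(survivors) // 2)
        survivors = survivors[np.argsort(est[survivors])[-keep:]]
        budgets[survivors] = np.minimum(2 * budgets[survivors], d)
        est[survivors] = estimate_logits(A[survivors], x, budgets[survivors], rng)
        if len(survivors) <= k:
            break
    # partition function estimated from the (partly coarse) logit estimates
    m = est.max()
    Z_hat = np.exp(est - m).sum()
    topk = survivors[np.argsort(est[survivors])[-k:]]
    return topk, np.exp(est[topk] - m) / Z_hat
```

For example, `adaptive_softmax_topk(A, x, k=5)` would return the indices of the five estimated-largest softmax entries together with their estimated probabilities, having sampled only a fraction of the coordinates for most rows of `A`.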
Cite

Text
Baharav et al. "Adaptive Sampling for Efficient SoftMax Approximation." Neural Information Processing Systems, 2024. doi:10.52202/079017-3733

Markdown
[Baharav et al. "Adaptive Sampling for Efficient SoftMax Approximation." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/baharav2024neurips-adaptive/) doi:10.52202/079017-3733

BibTeX
@inproceedings{baharav2024neurips-adaptive,
  title     = {{Adaptive Sampling for Efficient SoftMax Approximation}},
  author    = {Baharav, Tavor Z. and Kang, Ryan and Sullivan, Colin and Tiwari, Mo and Luxenberg, Eric and Tse, David and Pilanci, Mert},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3733},
  url       = {https://mlanthology.org/neurips/2024/baharav2024neurips-adaptive/}
}