SoftMax Is $1/2$-Lipschitz: A Tight Bound Across All $\ell_p$ Norms

Abstract

The softmax function is a basic operator in machine learning and optimization, used in classification, attention mechanisms, reinforcement learning, game theory, and problems involving log-sum-exp terms. Existing robustness guarantees for learning models and convergence analyses of optimization algorithms typically take the softmax operator to have a Lipschitz constant of $1$ with respect to the $\ell_2$ norm. In this work, we prove that the softmax function is contractive with Lipschitz constant $1/2$, uniformly across all $\ell_p$ norms with $p \ge 1$. We also show that the local Lipschitz constant of softmax attains the value $1/2$ for $p = 1$ and $p = \infty$, whereas for $p \in (1,\infty)$ it remains strictly below $1/2$, with the supremum $1/2$ approached only in the limit. To our knowledge, this is the first comprehensive norm-uniform analysis of softmax Lipschitz continuity. We demonstrate how the sharper constant directly improves a range of existing theoretical results on robustness and convergence. We further validate the sharpness of the $1/2$ Lipschitz constant of the softmax operator through empirical studies on attention-based architectures (ViT, GPT-2, Qwen3-8B) and on stochastic policies in reinforcement learning.

TL;DR

We show that the softmax operator is $1/2$-Lipschitz (contractive) over all $\ell_p$ norms ($p \ge 1$), and characterize the tightness of this bound. We validate the constant empirically on modern attention architectures and stochastic RL policies, and demonstrate how the sharper Lipschitz bound improves existing robustness and optimization guarantees.
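
The contraction claim is easy to probe numerically. The sketch below is our own illustration, not code from the paper: it samples random input pairs and checks that the ratio $\|\mathrm{softmax}(x) - \mathrm{softmax}(y)\|_p / \|x - y\|_p$ stays below $1/2$; the dimensions, scales, and set of $p$ values are arbitrary choices.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: shift by the max before exponentiating.
    z = np.exp(x - np.max(x))
    return z / z.sum()

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(100_000):
    d = rng.integers(2, 50)                        # random input dimension
    x = rng.normal(scale=5.0, size=d)
    y = rng.normal(scale=5.0, size=d)
    p = rng.choice([1.0, 1.5, 2.0, 4.0, np.inf])   # a few l_p norms
    ratio = (np.linalg.norm(softmax(x) - softmax(y), ord=p)
             / np.linalg.norm(x - y, ord=p))
    worst = max(worst, ratio)

print(f"largest observed ratio: {worst:.4f}  (claimed bound: 0.5)")
```

Random Gaussian pairs typically stay well below $1/2$; ratios approach the bound when softmax places mass $1/2$ on each of two nearly tied coordinates, consistent with the limiting tightness described in the abstract.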

Cite

Text

Nair. "SoftMax Is $1/2$-Lipschitz: A Tight Bound Across All $\ell_p$ Norms." Transactions on Machine Learning Research, 2026.

Markdown

[Nair. "SoftMax Is $1/2$-Lipschitz: A Tight Bound Across All $\ell_p$ Norms." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/nair2026tmlr-softmax/)

BibTeX

@article{nair2026tmlr-softmax,
  title     = {{SoftMax Is $1/2$-Lipschitz: A Tight Bound Across All $\ell_p$ Norms}},
  author    = {Nair, Pravin},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/nair2026tmlr-softmax/}
}