Mitigating Transformer Overconfidence via Lipschitz Regularization
Abstract
Although Transformers have achieved promising results in many computer vision tasks, they tend to be overconfident in their predictions, because standard Dot Product Self-Attention (DPSA) can barely preserve distance over an unbounded input domain. In this work, we address this issue by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function based on a distance in Banach space to ensure Lipschitzness, and we further regularize this term with a contractive Lipschitz bound. We provide a theoretical guarantee for the proposed method, giving a rigorous basis for its effectiveness and reliability. Extensive experiments on standard vision benchmarks demonstrate that our method outperforms state-of-the-art single-forward-pass approaches in prediction, calibration, and uncertainty estimation.
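To make the contrast between dot-product and distance-based similarity concrete, here is a minimal single-head NumPy sketch. It assumes the L2 self-attention of Kim et al. (2021), which scores token pairs by negative squared Euclidean distance with tied query/key projections and was shown there to be Lipschitz, whereas DPSA is not Lipschitz on an unbounded domain. It is a stand-in for the idea, not the LRFormer similarity function, and it does not model the contractive Lipschitz bound. The helper names (`dot_product_attention`, `l2_distance_attention`) and the random toy inputs are hypothetical.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def dot_product_attention(x, w_q, w_k, w_v):
    """Standard dot-product self-attention (DPSA) for one head.

    x: (N, d) tokens; w_q, w_k, w_v: (d, d_h) projections.
    Returns the (N, d_h) output and the (N, N) attention matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores)
    return attn @ v, attn


def l2_distance_attention(x, w_qk, w_v):
    """Distance-based self-attention with one shared query/key projection.

    Similarity is the negative squared Euclidean distance between projected
    tokens, as in the L2 self-attention of Kim et al. (2021). Illustrative
    stand-in only, not the exact LRFormer similarity function.
    """
    q = x @ w_qk  # queries and keys share one projection (tied weights)
    diff = q[:, None, :] - q[None, :, :]  # (N, N, d_h) pairwise differences
    scores = -np.sum(diff ** 2, axis=-1) / np.sqrt(q.shape[-1])
    attn = softmax(scores)
    return attn @ (x @ w_v), attn


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))  # 5 hypothetical tokens of dimension 4
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
shift = np.full(4, 5.0)  # the same offset added to every token

dp_out, dp_attn = dot_product_attention(x, w_q, w_k, w_v)
_, dp_attn_shifted = dot_product_attention(x + shift, w_q, w_k, w_v)
l2_out, l2_attn = l2_distance_attention(x, w_q, w_v)
_, l2_attn_shifted = l2_distance_attention(x + shift, w_q, w_v)

# Distance scores depend only on the differences q_i - q_j, so a common
# shift leaves the L2 attention matrix unchanged; dot-product scores do not.
print("DPSA changed by shift:", not np.allclose(dp_attn, dp_attn_shifted))
print("L2 attention changed by shift:", not np.allclose(l2_attn, l2_attn_shifted))
```

Because the distance scores depend only on differences between projected tokens, adding the same offset to every token leaves the L2 attention matrix unchanged, and each token always gives its largest weight to itself (its self-distance is zero); the dot-product scores have neither property in general.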
Cite
Ye et al. "Mitigating Transformer Overconfidence via Lipschitz Regularization." Uncertainty in Artificial Intelligence, 2023.
@inproceedings{ye2023uai-mitigating,
title = {{Mitigating Transformer Overconfidence via Lipschitz Regularization}},
author = {Ye, Wenqian and Ma, Yunsheng and Cao, Xu and Tang, Kun},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2023},
pages = {2422--2432},
volume = {216},
url = {https://mlanthology.org/uai/2023/ye2023uai-mitigating/}
}