Heatmap Regression Without Soft-Argmax for Facial Landmark Detection
Abstract
Facial landmark detection is an important task in computer vision with numerous downstream applications, such as head pose estimation, expression analysis, and face swapping. Heatmap regression-based methods have been strong contenders for state-of-the-art results on this task. These methods predict each landmark by computing the argmax over its heatmap. Because argmax is not differentiable, they rely on a differentiable approximation of it, namely Soft-argmax, to enable end-to-end training of deep networks. In this work, we revisit this long-standing choice and find that Soft-argmax may not be necessary. Instead, we propose an alternative training objective based on the classic structured prediction framework. Empirically, our method achieves state-of-the-art performance on three facial landmark benchmarks (WFLW, COFW, and 300W) and converges roughly 2.2× faster during training, while maintaining intuitive design choices in our model.
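To make the argmax versus Soft-argmax distinction concrete, here is a minimal NumPy sketch of both decoding rules on a single 2D heatmap. It illustrates only the standard Soft-argmax baseline that the abstract refers to, not the authors' structured-prediction objective. The function names and the sharpness parameter `beta` are hypothetical choices for illustration, and real implementations differ in normalization and coordinate conventions.

```python
import numpy as np


def hard_argmax_2d(heatmap: np.ndarray) -> tuple[int, int]:
    """Return (x, y) of the highest-scoring pixel.

    The output is piecewise constant in the heatmap values, so its gradient
    with respect to the heatmap is zero almost everywhere. This is why it
    cannot be trained through directly.
    """
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)


def soft_argmax_2d(heatmap: np.ndarray, beta: float = 1.0) -> tuple[float, float]:
    """Differentiable approximation: the expected (x, y) under a softmax over pixels.

    `beta` (hypothetical name) scales the logits; larger values make the
    softmax sharper and the result closer to the hard argmax.
    """
    h, w = heatmap.shape
    logits = beta * heatmap.reshape(-1)
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    probs = probs.reshape(h, w)
    ys, xs = np.mgrid[0:h, 0:w]  # pixel coordinate grids (row = y, col = x)
    return float((probs * xs).sum()), float((probs * ys).sum())


if __name__ == "__main__":
    hm = np.zeros((5, 5))
    hm[1, 3] = 10.0  # single peak at x=3, y=1
    print(hard_argmax_2d(hm))            # exact peak location
    print(soft_argmax_2d(hm, beta=1.0))  # pulled slightly toward the grid center
    print(soft_argmax_2d(hm, beta=100.0))
```

With a mild `beta`, the small probability mass on background pixels biases the Soft-argmax estimate toward the center of the grid. On a perfectly flat heatmap it returns the exact center, while the hard argmax returns the first index. This sensitivity to the whole heatmap is one reason the choice of decoding and training objective matters.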
Cite
Text
Yang and Yeh. "Heatmap Regression Without Soft-Argmax for Facial Landmark Detection." International Conference on Computer Vision, 2025.
Markdown
[Yang and Yeh. "Heatmap Regression Without Soft-Argmax for Facial Landmark Detection." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/yang2025iccv-heatmap/)
BibTeX
@inproceedings{yang2025iccv-heatmap,
title = {{Heatmap Regression Without Soft-Argmax for Facial Landmark Detection}},
author = {Yang, Chiao-An and Yeh, Raymond A.},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {28729-28739},
url = {https://mlanthology.org/iccv/2025/yang2025iccv-heatmap/}
}