Loss Functions and Operators Generated by F-Divergences

Abstract

The logistic loss (a.k.a. cross-entropy loss) is one of the most popular loss functions used for multiclass classification. It is also the loss function of choice for next-token prediction in language modeling. It is associated with the Kullback-Leibler (KL) divergence and the softargmax operator. In this work, we propose to construct new convex loss functions based on $f$-divergences. Our loss functions generalize the logistic loss in two directions: i) by replacing the KL divergence with $f$-divergences and ii) by allowing non-uniform reference measures. We instantiate our framework for numerous $f$-divergences, recovering existing losses and creating new ones. By analogy with the logistic loss, the loss function generated by an $f$-divergence is associated with an operator that we dub $f$-softargmax. We derive a novel parallelizable bisection algorithm for computing the $f$-softargmax associated with any $f$-divergence. On the empirical side, one of the goals of this paper is to determine the effectiveness of loss functions beyond the classical cross-entropy in a language model setting, including pre-training, post-training (SFT), and distillation. We show that the loss function generated by the $\alpha$-divergence (which is equivalent to Tsallis $\alpha$-negentropy in the case of unit reference measures) with $\alpha=1.5$ performs well across several tasks.
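
To make the bisection idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation, and several choices are assumptions made for illustration: the generator $f(u) = (u^\alpha - 1 - \alpha(u-1)) / (\alpha(\alpha-1))$ for the $\alpha$-divergence (with the KL case as the limit $\alpha=1$), the restriction to $\alpha \geq 1$, a reference measure $q$ that sums to one, and the function names `alpha_conjugate_grad` and `f_softargmax`. Under these assumptions, the maximizer of $\langle p, \theta \rangle - D_f(p, q)$ over the simplex takes the form $p_j = q_j \, (f^*)'(\theta_j - \tau)$, and bisection searches for the scalar $\tau$ that makes the entries sum to one.

```python
import math


def alpha_conjugate_grad(v, alpha):
    """Derivative of the conjugate of the assumed generator
    f(u) = (u**alpha - 1 - alpha*(u - 1)) / (alpha*(alpha - 1)), for alpha >= 1."""
    if alpha == 1.0:
        return math.exp(v)  # KL limit: f(u) = u*log(u) - u + 1
    base = 1.0 + (alpha - 1.0) * v
    # Clipping at zero is what produces sparse outputs when alpha > 1.
    return max(base, 0.0) ** (1.0 / (alpha - 1.0))


def f_softargmax(theta, q=None, alpha=1.5, num_iters=60):
    """Bisection on the scalar tau such that sum_j q_j * g(theta_j - tau) = 1.

    Assumes q is a probability vector (defaults to uniform) and alpha >= 1.
    """
    n = len(theta)
    if q is None:
        q = [1.0 / n] * n

    def mass(tau):
        # Each term depends on a single coordinate, so this sum could be
        # evaluated in parallel across coordinates.
        return sum(qj * alpha_conjugate_grad(tj - tau, alpha)
                   for tj, qj in zip(theta, q))

    # Since g(0) = 1 and q sums to one: mass(min) >= 1 >= mass(max).
    lo, hi = min(theta), max(theta)
    for _ in range(num_iters):
        tau = 0.5 * (lo + hi)
        if mass(tau) > 1.0:
            lo = tau  # too much mass: raise the threshold
        else:
            hi = tau  # too little mass: lower the threshold
    tau = 0.5 * (lo + hi)
    p = [qj * alpha_conjugate_grad(tj - tau, alpha) for tj, qj in zip(theta, q)]
    total = sum(p)
    # Final renormalization absorbs the residual bisection error.
    return [pj / total for pj in p]
```

With `alpha=1.0` and a uniform `q`, this sketch reduces to the usual softargmax. With `alpha=1.5`, low-scoring entries can receive exactly zero probability, which is the behavior usually associated with Tsallis-type losses.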

Cite

Text

Roulet et al. "Loss Functions and Operators Generated by F-Divergences." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Roulet et al. "Loss Functions and Operators Generated by F-Divergences." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/roulet2025icml-loss/)

BibTeX

@inproceedings{roulet2025icml-loss,
  title     = {{Loss Functions and Operators Generated by F-Divergences}},
  author    = {Roulet, Vincent and Liu, Tianlin and Vieillard, Nino and Sander, Michael Eli and Blondel, Mathieu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {52110--52138},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/roulet2025icml-loss/}
}