Moreau-Yosida $f$-Divergences

Abstract

Variational representations of $f$-divergences are central to many machine learning algorithms, with Lipschitz-constrained variants recently gaining attention. Inspired by this, we define the Moreau-Yosida approximation of $f$-divergences with respect to the Wasserstein-$1$ metric. The corresponding variational formulas generalize a number of recent results, yield novel special cases of interest, and relax the hard Lipschitz constraint. Additionally, we prove that the so-called tight variational representation of $f$-divergences can be taken over the quotient space of Lipschitz functions, and we characterize the functions achieving the supremum in the variational representation. On the practical side, we propose an algorithm, compatible with automatic differentiation frameworks, to calculate the tight convex conjugate of $f$-divergences. As an application of our results, we propose the Moreau-Yosida $f$-GAN, which implements the variational formulas for the Kullback-Leibler, reverse Kullback-Leibler, $\chi^2$, reverse $\chi^2$, squared Hellinger, Jensen-Shannon, Jeffreys, triangular discrimination and total variation divergences as GANs trained on CIFAR-10. This leads to competitive results and a simple solution to the problem of uniqueness of the optimal critic.
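
For orientation, the two objects named in the abstract can be written out as a brief sketch. The notation here is an assumption and may differ from the paper's: $f$ is a convex function with $f(1) = 0$, $f^*$ is its convex conjugate, $P$ and $Q$ are probability measures, $W_1$ is the Wasserstein-$1$ distance, and $\lambda > 0$ is a penalty parameter. The first line is the standard (non-tight) variational representation of an $f$-divergence, with the supremum over suitable measurable functions $g$. The second is the generic Moreau-Yosida construction applied to the first argument, taken with respect to $W_1$.

$$
D_f(P \,\|\, Q) = \sup_{g} \left\{ \mathbb{E}_{x \sim P}[g(x)] - \mathbb{E}_{x \sim Q}[f^*(g(x))] \right\}
$$

$$
D_{f,\lambda}(P \,\|\, Q) = \inf_{\xi} \left\{ D_f(\xi \,\|\, Q) + \lambda \, W_1(P, \xi) \right\}
$$

As a check on the first formula, take the Kullback-Leibler case $f(t) = t \log t$, for which $f^*(y) = e^{y-1}$. Substituting $g = 1 + \log \frac{dP}{dQ}$ gives $\mathbb{E}_P[g] - \mathbb{E}_Q[e^{g-1}] = 1 + \mathrm{KL}(P \,\|\, Q) - 1 = \mathrm{KL}(P \,\|\, Q)$, so this $g$ attains the supremum.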

Cite

Text

Terjék. "Moreau-Yosida $f$-Divergences." International Conference on Machine Learning, 2021.

Markdown

[Terjék. "Moreau-Yosida $f$-Divergences." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/terjek2021icml-moreauyosida/)

BibTeX

@inproceedings{terjek2021icml-moreauyosida,
  title     = {{Moreau-Yosida $f$-Divergences}},
  author    = {Terjék, Dávid},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {10214--10224},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/terjek2021icml-moreauyosida/}
}