Learning with a Wasserstein Loss

Abstract

Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem, using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that doesn't use the metric.
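The abstract refers to the entropically regularized approximation of the Wasserstein distance, which can be computed with Sinkhorn iterations. As a rough illustration only (not the authors' implementation), the sketch below computes this regularized distance between a predicted and a target label distribution given a ground cost matrix `M`; the function name, regularization weight, and toy metric are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_wasserstein(p, q, M, reg=0.1, n_iters=200):
    """Entropically regularized Wasserstein distance between probability
    vectors p and q, with ground cost matrix M (illustrative sketch).

    Sinkhorn-Knopp iterations alternately rescale the rows and columns
    of the Gibbs kernel K = exp(-M / reg) to match the marginals p and q.
    """
    K = np.exp(-M / reg)                 # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)                # enforce column marginals
        u = p / (K @ v)                  # enforce row marginals
    T = np.diag(u) @ K @ np.diag(v)      # approximate transport plan
    return np.sum(T * M)                 # approximate transport cost

# Toy example: three labels on a line, ground cost = |i - j|.
M = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
p = np.array([0.8, 0.1, 0.1])            # prediction
q = np.array([0.1, 0.1, 0.8])            # target
print(sinkhorn_wasserstein(p, q, M))
```

Because the cost matrix encodes the metric on the output space, mass moved between nearby labels is penalized less than mass moved between distant ones, which is what lets the loss encourage predictions that are smooth with respect to that metric.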

Cite

Text

Frogner et al. "Learning with a Wasserstein Loss." Neural Information Processing Systems, 2015.

Markdown

[Frogner et al. "Learning with a Wasserstein Loss." Neural Information Processing Systems, 2015.](https://mlanthology.org/neurips/2015/frogner2015neurips-learning/)

BibTeX

@inproceedings{frogner2015neurips-learning,
  title     = {{Learning with a Wasserstein Loss}},
  author    = {Frogner, Charlie and Zhang, Chiyuan and Mobahi, Hossein and Araya, Mauricio and Poggio, Tomaso A.},
  booktitle = {Neural Information Processing Systems},
  year      = {2015},
  pages     = {2053--2061},
  url       = {https://mlanthology.org/neurips/2015/frogner2015neurips-learning/}
}