Linear Time Sinkhorn Divergences Using Positive Features

Abstract

Although Sinkhorn divergences are now routinely used in data sciences to compare probability distributions, the computational effort required to compute them remains expensive, growing in general quadratically in the size $n$ of the support of these distributions. Indeed, solving optimal transport (OT) with an entropic regularization requires computing an $n\times n$ kernel matrix (the neg-exponential of an $n\times n$ pairwise ground cost matrix) that is repeatedly applied to a vector. We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$. This choice yields, equivalently, a kernel $k(x,y)=\langle\varphi(x),\varphi(y)\rangle$, and ensures that the cost of Sinkhorn iterations scales as $O(nr)$. We show that usual cost functions can be approximated using this form. Additionally, we take advantage of the fact that our approach yields approximations that remain fully differentiable with respect to input distributions, as opposed to previously proposed adaptive low-rank approximations of the kernel matrix, to train a faster variant of OT-GAN [Salimans et al., 2018].
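For intuition, the sketch below (not the authors' code; function names, the feature maps, and all parameters are illustrative assumptions) shows why a factored, entrywise-positive kernel makes Sinkhorn iterations linear in $n$: the kernel matrix $K = \Phi_x \Phi_y^\top$ is never formed, and each matrix-vector product is replaced by two thin products costing $O((n+m)r)$.

```python
import numpy as np

def sinkhorn_positive_features(phi_x, phi_y, a, b, n_iters=100):
    """Sinkhorn scalings with a factored kernel K = phi_x @ phi_y.T.

    phi_x: (n, r) nonnegative features of the source points.
    phi_y: (m, r) nonnegative features of the target points.
    a, b:  source / target probability weights (each sums to 1).
    Each iteration costs O((n + m) r) instead of the O(n m) needed
    when the full kernel matrix is materialized.
    """
    v = np.ones(phi_y.shape[0])
    for _ in range(n_iters):
        # K @ v computed as phi_x @ (phi_y.T @ v): two thin products.
        u = a / (phi_x @ (phi_y.T @ v))
        v = b / (phi_y @ (phi_x.T @ u))
    # The transport plan is implicitly diag(u) K diag(v); only the
    # dual scalings (u, v) are returned here.
    return u, v

# Toy usage with random nonnegative features (illustrative only):
rng = np.random.default_rng(0)
n, m, r = 500, 400, 32
phi_x = rng.exponential(size=(n, r))   # nonnegative by construction
phi_y = rng.exponential(size=(m, r))
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
u, v = sinkhorn_positive_features(phi_x, phi_y, a, b)
```

Positivity of the features is what keeps every entry of $K$ strictly positive, so the divisions in the updates are well defined; the paper shows how usual ground costs can be approximated by maps $\varphi$ of this kind.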

Cite

Text

Scetbon and Cuturi. "Linear Time Sinkhorn Divergences Using Positive Features." Neural Information Processing Systems, 2020.

Markdown

[Scetbon and Cuturi. "Linear Time Sinkhorn Divergences Using Positive Features." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/scetbon2020neurips-linear/)

BibTeX

@inproceedings{scetbon2020neurips-linear,
  title     = {{Linear Time Sinkhorn Divergences Using Positive Features}},
  author    = {Scetbon, Meyer and Cuturi, Marco},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/scetbon2020neurips-linear/}
}