Separation Results Between Fixed-Kernel and Feature-Learning Probability Metrics

Abstract

Several works on implicit and explicit generative modeling have empirically observed that feature-learning discriminators outperform fixed-kernel discriminators in terms of the sample quality of the models. We provide separation results between probability metrics with fixed-kernel and feature-learning discriminators using the function classes $\mathcal{F}_2$ and $\mathcal{F}_1$ respectively, which were developed to study overparametrized two-layer neural networks. In particular, we construct pairs of distributions over hyperspheres that cannot be discriminated by the fixed-kernel ($\mathcal{F}_2$) integral probability metric (IPM) and Stein discrepancy (SD) in high dimensions, but that can be discriminated by their feature-learning ($\mathcal{F}_1$) counterparts. To further study the separation, we provide links between the $\mathcal{F}_1$ and $\mathcal{F}_2$ IPMs and sliced Wasserstein distances. Our work suggests that fixed-kernel discriminators perform worse than their feature-learning counterparts because their corresponding metrics are weaker.
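
As background, here is a brief sketch of the objects the abstract refers to, using the standard definitions of $\mathcal{F}_1$ and $\mathcal{F}_2$ for two-layer networks (the paper's exact statements may differ in normalization). The IPM attached to a function class $\mathcal{F}$ is

$$ d_{\mathcal{F}}(\mu, \nu) = \sup_{\|f\|_{\mathcal{F}} \leq 1} \left| \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{x \sim \nu}[f(x)] \right|. $$

Roughly, $\mathcal{F}_2$ is the RKHS of the fixed kernel $k(x, x') = \mathbb{E}_{\theta \sim \tau}[\sigma(\langle \theta, x \rangle)\, \sigma(\langle \theta, x' \rangle)]$ induced by a frozen feature distribution $\tau$, while $\mathcal{F}_1$ consists of infinite-width two-layer networks $f(x) = \int \sigma(\langle \theta, x \rangle)\, d\gamma(\theta)$ with bounded total-variation norm $\|\gamma\|_{\mathrm{TV}}$, so the feature distribution itself can adapt to the data.

The toy Python sketch below contrasts the two regimes on a pair of distributions on the unit sphere. The example pair of distributions, the frozen-feature MMD as a stand-in for the $\mathcal{F}_2$ IPM, and the best-single-neuron search as a crude lower bound on the $\mathcal{F}_1$ IPM are all illustrative assumptions, not the constructions from the paper.

import numpy as np

rng = np.random.default_rng(0)
d, n, m = 50, 2000, 512  # ambient dimension, samples per distribution, number of frozen features

def sample_sphere(n, d, bias=0.0):
    # Hypothetical toy pair: projections onto the sphere of Gaussians that
    # differ along one coordinate (not the hypersphere construction in the paper).
    x = rng.normal(size=(n, d))
    x[:, 0] += bias
    return x / np.linalg.norm(x, axis=1, keepdims=True)

X = sample_sphere(n, d, bias=0.0)  # samples from mu
Y = sample_sphere(n, d, bias=2.0)  # samples from nu

# F2-style discrepancy: MMD of a *fixed* random-feature ReLU kernel,
# i.e. the feature directions theta are drawn once and never moved.
theta = rng.normal(size=(d, m)) / np.sqrt(d)

def phi(Z):
    return np.maximum(Z @ theta, 0.0) / np.sqrt(m)

f2_gap = np.linalg.norm(phi(X).mean(axis=0) - phi(Y).mean(axis=0))

# F1-style proxy: let the features move. Searching many candidate directions
# for the single neuron that best separates the samples gives a crude lower
# bound on the F1 IPM (each unit-norm neuron lies in the F1 unit ball).
cands = rng.normal(size=(d, 4096))
cands /= np.linalg.norm(cands, axis=0)
gaps = np.abs(np.maximum(X @ cands, 0.0).mean(axis=0)
              - np.maximum(Y @ cands, 0.0).mean(axis=0))
f1_gap = gaps.max()

print(f"fixed-kernel (F2-like) gap:     {f2_gap:.4f}")
print(f"feature-learning (F1-like) gap: {f1_gap:.4f}")

In full generality the $\mathcal{F}_1$ supremum is over mixtures of neurons rather than a single one, and in practice it is approximated by training the discriminator; the single-neuron search above is only the simplest adaptive baseline.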

Cite

Text

Domingo-Enrich and Mroueh. "Separation Results Between Fixed-Kernel and Feature-Learning Probability Metrics." Neural Information Processing Systems, 2021.

Markdown

[Domingo-Enrich and Mroueh. "Separation Results Between Fixed-Kernel and Feature-Learning Probability Metrics." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/enrich2021neurips-separation/)

BibTeX

@inproceedings{enrich2021neurips-separation,
  title     = {{Separation Results Between Fixed-Kernel and Feature-Learning Probability Metrics}},
  author    = {Domingo-Enrich, Carles and Mroueh, Youssef},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/enrich2021neurips-separation/}
}