Towards GAN Benchmarks Which Require Generalization
Abstract
For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. In search of such a metric, we turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions. The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples. We survey past work on using NNDs for evaluation, implement an example black-box metric based on these ideas, and validate experimentally that it can measure a notion of generalization.
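To make the idea of an NND concrete, the sketch below is a minimal, illustrative estimator, not the paper's exact "CNN divergence": a small critic is trained to separate two sample sets, a crude weight-clipping Lipschitz constraint stands in for the paper's setup, and the divergence is reported on held-out samples. The function name `nnd_estimate`, the architecture, and all hyperparameters are assumptions chosen for brevity.

```python
# Illustrative NND sketch: train a critic to distinguish two sample sets,
# then report its objective on held-out samples as the divergence estimate.
import torch
import torch.nn as nn

def nnd_estimate(real, fake, steps=500, holdout=0.2, lr=1e-3, device="cpu"):
    """Estimate a neural network divergence between two (n, d) sample tensors.

    Inputs here are flat feature vectors; the paper-style metric would use
    image batches and a convolutional critic instead.
    """
    d = real.shape[1]
    critic = nn.Sequential(nn.Linear(d, 128), nn.ReLU(),
                           nn.Linear(128, 128), nn.ReLU(),
                           nn.Linear(128, 1)).to(device)
    opt = torch.optim.Adam(critic.parameters(), lr=lr)

    # Split each set into train / held-out halves; reporting the divergence
    # on held-out samples means the critic itself must generalize.
    n_r, n_f = int(len(real) * (1 - holdout)), int(len(fake) * (1 - holdout))
    real_tr, real_te = real[:n_r].to(device), real[n_r:].to(device)
    fake_tr, fake_te = fake[:n_f].to(device), fake[n_f:].to(device)

    for _ in range(steps):
        opt.zero_grad()
        # WGAN-style objective: push the critic up on real, down on fake.
        loss = critic(fake_tr).mean() - critic(real_tr).mean()
        loss.backward()
        opt.step()
        # Weight clipping as a crude Lipschitz constraint (a gradient
        # penalty would be closer to the paper's setup).
        with torch.no_grad():
            for p in critic.parameters():
                p.clamp_(-0.1, 0.1)

    with torch.no_grad():
        return (critic(real_te).mean() - critic(fake_te).mean()).item()

# Toy usage: two different Gaussians should yield a larger divergence
# than two independent draws from the same Gaussian.
p = torch.randn(2000, 16)
q = torch.randn(2000, 16) + 1.0
print(nnd_estimate(p, q), nnd_estimate(p, torch.randn(2000, 16)))
```

Because the critic is trained and evaluated on disjoint samples, a model that merely memorizes the training set cannot drive the estimate to zero, which is the property the benchmark is designed to require.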
Cite
Text
Gulrajani et al. "Towards GAN Benchmarks Which Require Generalization." International Conference on Learning Representations, 2019.

Markdown

[Gulrajani et al. "Towards GAN Benchmarks Which Require Generalization." International Conference on Learning Representations, 2019.](https://mlanthology.org/iclr/2019/gulrajani2019iclr-gan/)

BibTeX
@inproceedings{gulrajani2019iclr-gan,
  title = {{Towards GAN Benchmarks Which Require Generalization}},
  author = {Gulrajani, Ishaan and Raffel, Colin and Metz, Luke},
  booktitle = {International Conference on Learning Representations},
  year = {2019},
  url = {https://mlanthology.org/iclr/2019/gulrajani2019iclr-gan/}
}