The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Abstract
What can neural networks learn about the visual world when provided with only a single image as input? While any image obviously cannot contain the multitudes of all existing objects, scenes, and lighting conditions, within the space of all $256^{3\cdot224\cdot224}$ possible $224$-sized square images it might still provide a strong prior for natural images. To analyze this ``augmented image prior'' hypothesis, we develop a simple framework for training neural networks from scratch on a single image and its augmentations, using knowledge distillation from a supervised pretrained teacher. With this, we find the answer to the above question to be: `surprisingly, a lot'. In quantitative terms, we find accuracies of $94\%$/$74\%$ on CIFAR-10/100, $69\%$ on ImageNet, and by extending this method to video and audio, $51\%$ on Kinetics-400 and $84\%$ on SpeechCommands. In extensive analyses spanning 13 datasets, we disentangle the effect of augmentations, choice of data, and network architectures, and also provide qualitative evaluations that include lucid ``panda neurons'' in networks that have never even seen one.
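To make the setup concrete, below is a minimal sketch of this kind of single-image knowledge distillation in PyTorch. It only illustrates the idea described above: each batch is built entirely from random augmentations of one source image, and a student trained from scratch matches the temperature-softened predictions of a supervised pretrained teacher. The image path, augmentation parameters, architectures, temperature, and optimizer settings are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Heavy random augmentations of one source image supply the "augmented image prior".
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.2),
    transforms.ToTensor(),
])
source = Image.open("single_image.jpg").convert("RGB")  # hypothetical path to the single source image

teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()  # supervised pretrained teacher
student = models.resnet18(weights=None)                                          # trained from scratch
optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)          # assumed hyperparameters
tau = 4.0  # distillation temperature (assumed value)

for step in range(10_000):  # illustrative number of updates
    # A batch made only of augmented crops of the single image.
    batch = torch.stack([augment(source) for _ in range(64)])
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    # Standard knowledge-distillation objective: KL divergence between
    # temperature-softened teacher and student class distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In a setup like this, the accuracies quoted in the abstract would correspond to evaluating the distilled student on the respective real test sets (e.g. the 1000 ImageNet classes), not on the source image itself.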
Cite
Text
Asano and Saeed. "The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image." International Conference on Learning Representations, 2023.

Markdown
[Asano and Saeed. "The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/asano2023iclr-augmented/)

BibTeX
@inproceedings{asano2023iclr-augmented,
  title = {{The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image}},
  author = {Asano, Yuki M and Saeed, Aaqib},
  booktitle = {International Conference on Learning Representations},
  year = {2023},
  url = {https://mlanthology.org/iclr/2023/asano2023iclr-augmented/}
}