Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

Abstract

Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under latent world dynamics. The framework provides benefits such as the possibility to jointly optimize for systematic perception and imagination, a range of difficulty levels, and the ability to control the fraction of possible factor combinations used during training. We provide a comprehensive evaluation of various baseline models on SVIB, offering insight into the current state of the art in systematic visual imagination. We hope that this benchmark will help advance visual systematic compositionality.
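
The idea of controlling the fraction of factor combinations seen during training can be illustrated with a minimal sketch. This is not the benchmark's actual data pipeline: the factor names and values (`shape`, `color`, `size`), the function `split_factor_combinations`, and the uniform random split are all assumptions made for illustration. The sketch enumerates every combination of factor values, keeps a chosen fraction for training, and holds out the rest so that evaluation uses only combinations never seen in training.

```python
import itertools
import random


def split_factor_combinations(factors, train_fraction, seed=0):
    """Split all factor combinations into disjoint train and held-out sets.

    `factors` maps a factor name to its list of possible values.
    `train_fraction` is the share of combinations exposed during training.
    """
    names = sorted(factors)
    # Enumerate the full Cartesian product of factor values.
    combos = list(itertools.product(*(factors[name] for name in names)))
    # Shuffle deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(combos)
    n_train = round(train_fraction * len(combos))
    train = [dict(zip(names, combo)) for combo in combos[:n_train]]
    held_out = [dict(zip(names, combo)) for combo in combos[n_train:]]
    return train, held_out


# Hypothetical factors, chosen only for illustration.
factors = {
    "shape": ["cube", "sphere", "cylinder"],
    "color": ["red", "green", "blue"],
    "size": ["small", "large"],
}

train, held_out = split_factor_combinations(factors, train_fraction=0.5)
print(len(train), "training combinations;", len(held_out), "held out")
```

Lowering `train_fraction` leaves more combinations unseen, which under this simplified setup makes generalization to the held-out set harder. A purely random split may leave some individual factor value absent from training at small fractions; a more careful split would guarantee that every value appears at least once.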

Cite

Text

Kim et al. "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models." Neural Information Processing Systems, 2023.

Markdown

[Kim et al. "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/kim2023neurips-imagine/)

BibTeX

@inproceedings{kim2023neurips-imagine,
  title     = {{Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models}},
  author    = {Kim, Yeongbin and Singh, Gautam and Park, Junyeong and Gulcehre, Caglar and Ahn, Sungjin},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/kim2023neurips-imagine/}
}