Unsupervised Causal Generative Understanding of Images

Abstract

We present a novel framework for unsupervised object-centric 3D scene understanding that generalizes robustly to out-of-distribution images. To achieve this, we design a causal generative model reflecting the physical process by which an image is produced when a camera captures a scene containing multiple objects. This model is trained to reconstruct multi-view images via a latent representation describing the shapes, colours and positions of the 3D objects they show. It explicitly represents object instances as separate neural radiance fields, placed into a 3D scene. We then propose an inference algorithm that can infer this latent representation given a single out-of-distribution image as input -- even when it shows an unseen combination of components, unseen spatial compositions or a radically new viewpoint. We conduct extensive experiments applying our approach to test datasets that have zero probability under the training distribution. These show that it accurately reconstructs a scene's geometry, segments objects and infers their positions, despite not receiving any supervision. Our approach significantly outperforms baselines that do not capture the true causal image generation process.
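To give a concrete flavour of the object-centric compositing the abstract describes, below is a minimal, self-contained sketch (not the authors' implementation) of how separate per-object radiance fields can be placed into one scene and volume-rendered along a camera ray. The toy `object_field` (an analytic sphere rather than a learned MLP) and the additive density / density-weighted colour mixing are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: compose per-object radiance fields into a scene and
# volume-render a single ray. All names and the toy field are assumptions.
import numpy as np

def object_field(params, x):
    """Toy per-object radiance field: maps a 3D point to (density, RGB).
    A learned model would instead be an MLP conditioned on the object's
    latent shape/colour code and posed within the scene."""
    centre, radius, colour = params
    d = np.linalg.norm(x - centre)
    density = 10.0 * np.clip(radius - d, 0.0, None)  # soft solid sphere
    return density, colour

def composite(objects, x):
    """Combine object densities additively and mix colours weighted by
    density, i.e. place separate object fields into one shared 3D scene."""
    densities, colours = zip(*(object_field(p, x) for p in objects))
    densities, colours = np.array(densities), np.array(colours)
    total = densities.sum()
    colour = (densities[:, None] * colours).sum(0) / (total + 1e-8)
    return total, colour

def render_ray(objects, origin, direction, near=0.0, far=4.0, n_samples=64):
    """Standard volume-rendering quadrature along one camera ray."""
    ts = np.linspace(near, far, n_samples)
    delta = ts[1] - ts[0]
    colour, transmittance = np.zeros(3), 1.0
    for t in ts:
        sigma, c = composite(objects, origin + t * direction)
        alpha = 1.0 - np.exp(-sigma * delta)
        colour += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return colour

# Example: two objects (spheres) placed at different positions in the scene.
objects = [
    (np.array([0.0, 0.0, 2.0]), 0.5, np.array([1.0, 0.0, 0.0])),  # red
    (np.array([0.8, 0.0, 2.5]), 0.4, np.array([0.0, 0.0, 1.0])),  # blue
]
pixel = render_ray(objects, origin=np.zeros(3), direction=np.array([0.0, 0.0, 1.0]))
print(pixel)
```

In the paper's framework this rendering step sits inside a causal generative model, so training amounts to reconstructing multi-view images from the per-object latents; the sketch above only illustrates the forward compositing and rendering.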

Cite

Text

Anciukevicius et al. "Unsupervised Causal Generative Understanding of Images." Neural Information Processing Systems, 2022.

Markdown

[Anciukevicius et al. "Unsupervised Causal Generative Understanding of Images." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/anciukevicius2022neurips-unsupervised/)

BibTeX

@inproceedings{anciukevicius2022neurips-unsupervised,
  title     = {{Unsupervised Causal Generative Understanding of Images}},
  author    = {Anciukevicius, Titas and Fox-Roberts, Patrick and Rosten, Edward and Henderson, Paul},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/anciukevicius2022neurips-unsupervised/}
}