Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data

Abstract

We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while using only multi-view 2D data for supervision. We note that there is a one-to-one mapping between viewsets, i.e., collections of several 2D views of an object, and 3D models. Hence, we train a diffusion model to generate viewsets, but design the neural network generator to internally reconstruct the corresponding 3D models, thus generating those too. We fit the diffusion model to a large number of viewsets for a given category of objects. The resulting generator can be conditioned on zero, one or more input views. Conditioned on a single view, it performs 3D reconstruction while accounting for the ambiguity of the task, making it possible to sample multiple solutions compatible with the input. The model performs reconstruction efficiently, in a feed-forward manner, and is trained with only rendering losses, using as few as three views per viewset. Project page: szymanowiczs.github.io/viewset-diffusion
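Since this entry carries only the abstract, the following is a minimal, hypothetical PyTorch sketch of the training idea described above, not the authors' implementation: a viewset is noised with the standard DDPM forward process (conditioning views are left clean), a network maps the noisy viewset to an internal 3D model, and supervision is a purely 2D rendering loss on views of that model. All names, shapes, and the toy orthographic renderer are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ViewsetToVolume(nn.Module):
        # Toy stand-in for the reconstruction network: maps a whole
        # (noisy) viewset to a single 3D density grid for the object.
        def __init__(self, n_views=3, img=32, grid=16):
            super().__init__()
            self.grid = grid
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(n_views * 3 * img * img, 512), nn.ReLU(),
                nn.Linear(512, grid ** 3), nn.Softplus())  # non-negative densities

        def forward(self, viewset):                      # (B, N, 3, H, W)
            b = viewset.shape[0]
            return self.net(viewset).view(b, self.grid, self.grid, self.grid)

    def toy_render(volume, axis):
        # Placeholder for differentiable volume rendering: orthographic
        # density integration along one grid axis (an assumption; the paper
        # renders a radiance field from the actual camera poses).
        return volume.sum(dim=axis + 1).unsqueeze(1)     # (B, 1, G, G)

    def training_step(model, views, cond_mask, t, alphas_cumprod):
        # views:     (B, N, 3, H, W) clean viewset sampled from the dataset
        # cond_mask: (B, N), 1 for views kept clean (the 0..N conditioning views)
        a = alphas_cumprod[t].view(-1, 1, 1, 1, 1)       # DDPM forward process
        noisy = a.sqrt() * views + (1 - a).sqrt() * torch.randn_like(views)
        noisy = torch.where(cond_mask[..., None, None, None].bool(), views, noisy)

        volume = model(noisy)                            # internal 3D reconstruction

        loss = 0.0                                       # purely 2D rendering loss
        for v in range(min(3, views.shape[1])):
            target = views[:, v].mean(dim=1, keepdim=True)        # grey toy target
            target = F.interpolate(target, size=(model.grid, model.grid))
            loss = loss + F.mse_loss(toy_render(volume, v), target)
        return loss

    # Usage sketch: condition on one clean view, noise the other two.
    model = ViewsetToVolume()
    views = torch.rand(4, 3, 3, 32, 32)
    cond_mask = torch.zeros(4, 3); cond_mask[:, 0] = 1
    alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
    t = torch.randint(0, 1000, (4,))
    print(training_step(model, views, cond_mask, t, alphas_cumprod))

At sampling time the same network would be applied iteratively over the diffusion steps, with any provided input views held fixed, so a single model would cover unconditional generation as well as single- and few-view reconstruction, as the abstract describes.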

Cite

Text

Szymanowicz et al. "Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00814

Markdown

[Szymanowicz et al. "Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/szymanowicz2023iccv-viewset/) doi:10.1109/ICCV51070.2023.00814

BibTeX

@inproceedings{szymanowicz2023iccv-viewset,
  title     = {{Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data}},
  author    = {Szymanowicz, Stanislaw and Rupprecht, Christian and Vedaldi, Andrea},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {8863--8873},
  doi       = {10.1109/ICCV51070.2023.00814},
  url       = {https://mlanthology.org/iccv/2023/szymanowicz2023iccv-viewset/}
}