Zero-1-to-3: Zero-Shot One Image to 3D Object
Abstract
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this underconstrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint, which allow new images to be generated of the same object under a specified camera transformation. Even though it is trained on a synthetic dataset, our model retains a strong zero-shot generalization ability to out-of-distribution datasets as well as in-the-wild images, including impressionist paintings. Our viewpoint-conditioned diffusion approach can further be used for the task of 3D reconstruction from a single image. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
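The abstract describes conditioning a diffusion model on a relative camera transformation so that denoising produces the same object from a new viewpoint. The sketch below illustrates one plausible shape of that conditioning interface; it is not the authors' code. `ToyDenoiser`, `encode_relative_pose`, the (polar, azimuth, radius) pose encoding, the tiny conv stack, and the noise schedule are all hypothetical stand-ins for illustration only.

```python
# Minimal sketch (assumptions, not Zero-1-to-3's implementation) of a denoiser
# conditioned on a timestep, an input-view embedding, and a relative camera pose.
import math
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Toy epsilon-predictor; a trivial conv stack stands in for a real U-Net."""
    def __init__(self, img_ch=3, img_size=32, embed_dim=64):
        super().__init__()
        # Fuse timestep (1) + pose (4) + image embedding (embed_dim) into a
        # spatial conditioning map the same size as the image.
        self.cond = nn.Sequential(
            nn.Linear(1 + 4 + embed_dim, 128),
            nn.SiLU(),
            nn.Linear(128, img_ch * img_size * img_size),
        )
        self.net = nn.Sequential(
            nn.Conv2d(img_ch * 2, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, img_ch, 3, padding=1),
        )
        self.img_ch, self.img_size = img_ch, img_size

    def forward(self, x_noisy, t, pose, img_embed):
        c = torch.cat([t[:, None].float(), pose, img_embed], dim=1)
        c = self.cond(c).view(-1, self.img_ch, self.img_size, self.img_size)
        return self.net(torch.cat([x_noisy, c], dim=1))  # predicted noise

def encode_relative_pose(polar, azimuth, radius):
    """Encode a relative camera transform; azimuth wraps, so use sin/cos."""
    return torch.stack(
        [polar, torch.sin(azimuth), torch.cos(azimuth), radius], dim=1
    )

# One illustrative denoising-loss step on random data.
model = ToyDenoiser()
x0 = torch.randn(2, 3, 32, 32)                      # target novel view
img_embed = torch.randn(2, 64)                      # embedding of the input view
pose = encode_relative_pose(
    torch.rand(2), torch.rand(2) * 2 * math.pi, torch.ones(2)
)
t = torch.randint(0, 1000, (2,))
noise = torch.randn_like(x0)
alpha = (1.0 - t.float() / 1000).view(-1, 1, 1, 1)  # toy schedule, not DDPM's
x_noisy = alpha.sqrt() * x0 + (1 - alpha).sqrt() * noise
loss = ((model(x_noisy, t, pose, img_embed) - noise) ** 2).mean()
loss.backward()
```

The key design point the abstract implies is that the pose enters purely as conditioning: the same frozen image prior does the generation, and the learned control steers which viewpoint it renders.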
Cite
Text
Liu et al. "Zero-1-to-3: Zero-Shot One Image to 3D Object." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00853
Markdown
[Liu et al. "Zero-1-to-3: Zero-Shot One Image to 3D Object." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/liu2023iccv-zero1to3/) doi:10.1109/ICCV51070.2023.00853
BibTeX
@inproceedings{liu2023iccv-zero1to3,
  title     = {{Zero-1-to-3: Zero-Shot One Image to 3D Object}},
  author    = {Liu, Ruoshi and Wu, Rundi and Van Hoorick, Basile and Tokmakov, Pavel and Zakharov, Sergey and Vondrick, Carl},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {9298--9309},
  doi       = {10.1109/ICCV51070.2023.00853},
  url       = {https://mlanthology.org/iccv/2023/liu2023iccv-zero1to3/}
}