Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
Abstract
Synthesizing multi-view 3D from a single image is a significant but challenging task. Zero-1-to-3 methods have achieved great success by lifting a 2D latent diffusion model to the 3D scope. The target-view image is generated with a single-view source image and the camera pose as condition information. However, because a single input image carries very sparse information, Zero-1-to-3 tends to produce geometry and appearance inconsistencies across views, especially for complex objects. To tackle this issue, we propose to supply more condition information to the generation model, but in a self-prompted way. A cascade framework named Cascade-Zero123 is constructed from two Zero-1-to-3 models, which progressively extract 3D information from the source image. Specifically, several nearby views are first generated by the first model and then fed into the second-stage model along with the source image as generation conditions. With amplified self-prompted condition images, Cascade-Zero123 generates more consistent novel-view images than Zero-1-to-3. Experimental results demonstrate remarkable improvement, especially for complex and challenging scenes involving insects, humans, transparent objects, and stacked multiple objects. More demos and code are available at https://cascadezero123.github.io.
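The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `ConditionView`, `cascade_novel_view`, the `(elevation, azimuth, radius)` pose tuple, and both toy stand-in models are hypothetical names and assumptions introduced here. The real stages are diffusion models, which the sketch replaces with plain callables so that only the conditioning structure is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = List[float]                # placeholder for image data (assumption)
Pose = Tuple[float, float, float]  # assumed relative (elevation, azimuth, radius)


@dataclass
class ConditionView:
    """One conditioning image together with its pose relative to the source."""
    image: Image
    pose: Pose


def cascade_novel_view(
    source: Image,
    target_pose: Pose,
    nearby_poses: Sequence[Pose],
    stage_one: Callable[[Image, Pose], Image],
    stage_two: Callable[[List[ConditionView], Pose], Image],
) -> Image:
    """Generate a target view using self-prompted nearby views as extra conditions."""
    # Stage 1: generate several nearby views from the single source image.
    nearby = [ConditionView(stage_one(source, p), p) for p in nearby_poses]
    # Stage 2: condition on the source image plus the self-prompted views.
    conditions = [ConditionView(source, (0.0, 0.0, 0.0))] + nearby
    return stage_two(conditions, target_pose)


# Toy stand-ins so the sketch runs; they are NOT diffusion models.
def toy_stage_one(image: Image, pose: Pose) -> Image:
    # Pretend the azimuth offset shifts every pixel value.
    return [px + pose[1] for px in image]


def toy_stage_two(conditions: List[ConditionView], target_pose: Pose) -> Image:
    # Pretend the second stage averages all conditioning images.
    n = len(conditions)
    width = len(conditions[0].image)
    return [sum(c.image[i] for c in conditions) / n for i in range(width)]


if __name__ == "__main__":
    poses = [(0.0, 10.0, 0.0), (0.0, -10.0, 0.0)]
    print(cascade_novel_view([1.0, 2.0], (0.0, 90.0, 0.0), poses,
                             toy_stage_one, toy_stage_two))
```

The point of the structure is that the second stage never sees the source image alone: its condition set always contains the source plus every view produced by the first stage.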
Cite
Text
Chen et al. "Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72940-9_18
Markdown
[Chen et al. "Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/chen2024eccv-cascadezero123/) doi:10.1007/978-3-031-72940-9_18
BibTeX
@inproceedings{chen2024eccv-cascadezero123,
title = {{Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views}},
author = {Chen, Yabo and Fang, Jiemin and Huang, Yuyang and Yi, Taoran and Zhang, Xiaopeng and Xie, Lingxi and Wang, Xinggang and Dai, Wenrui and Xiong, Hongkai and Tian, Qi},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72940-9_18},
url = {https://mlanthology.org/eccv/2024/chen2024eccv-cascadezero123/}
}