Diffusion Time-Step Curriculum for One Image to 3D Generation
Abstract
Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a single image. It leverages pre-trained 2D diffusion models as teachers to guide the reconstruction of student 3D models. Despite their remarkable success, SDS-based methods often encounter geometric artifacts and texture saturation. We find that the crux is the overlooked, indiscriminate treatment of diffusion time-steps during optimization: it unreasonably treats the student-teacher knowledge distillation as equal at all time-steps and thus entangles coarse-grained and fine-grained modeling. Therefore, we propose the Diffusion Time-step Curriculum one-image-to-3D pipeline (DTC123), in which the teacher and student models collaborate under a time-step curriculum in a coarse-to-fine manner. Extensive experiments on the NeRF4, RealFusion15, GSO, and Level50 benchmarks demonstrate that DTC123 can produce multi-view consistent, high-quality, and diverse 3D assets. Code and more generation demos will be released at https://github.com/yxymessi/DTC123.
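The coarse-to-fine idea in the abstract can be illustrated with a short scheduling sketch: large diffusion time-steps are sampled early in optimization (coarse geometry) and the upper bound is annealed toward small time-steps later (fine texture). This is a minimal, hypothetical sketch, not the authors' released implementation; the function name `sample_timestep`, the bounds `t_min`/`t_max`, and the linear decay are illustrative assumptions only.

```python
# Hypothetical sketch of a diffusion time-step curriculum for SDS-style
# optimization. The schedule shape and bounds are assumptions, not values
# taken from the paper or the DTC123 repository.
import torch


def sample_timestep(step: int, total_steps: int,
                    t_min: int = 20, t_max: int = 980) -> int:
    """Sample a diffusion time-step whose upper bound decays over training:
    early iterations draw large t (coarse structure), later iterations draw
    small t (fine-grained refinement)."""
    frac = step / max(total_steps - 1, 1)
    # Linearly anneal the sampling upper bound from t_max down toward t_min.
    upper = max(int(t_max - frac * (t_max - t_min)), t_min + 1)
    return int(torch.randint(t_min, upper, (1,)).item())


if __name__ == "__main__":
    total = 10_000
    for step in (0, 2_500, 5_000, 7_500, 9_999):
        print(f"step {step:>5d} -> t = {sample_timestep(step, total)}")
```

In practice the sampled `t` would be used to noise the rendered view before querying the 2D diffusion teacher, so the distillation signal shifts from coarse to fine as training progresses.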
Cite
Text
Yi et al. "Diffusion Time-Step Curriculum for One Image to 3D Generation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00949

Markdown

[Yi et al. "Diffusion Time-Step Curriculum for One Image to 3D Generation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/yi2024cvpr-diffusion/) doi:10.1109/CVPR52733.2024.00949

BibTeX
@inproceedings{yi2024cvpr-diffusion,
title = {{Diffusion Time-Step Curriculum for One Image to 3D Generation}},
author = {Yi, Xuanyu and Wu, Zike and Xu, Qingshan and Zhou, Pan and Lim, Joo-Hwee and Zhang, Hanwang},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {9948--9958},
doi = {10.1109/CVPR52733.2024.00949},
url = {https://mlanthology.org/cvpr/2024/yi2024cvpr-diffusion/}
}