IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Abstract

Creating a high-fidelity, animatable 3D full-body avatar from a single image is challenging because of the diverse appearances and poses of humans and the limited availability of high-quality training data. To achieve fast, high-quality human reconstruction, this work rethinks the task from the perspectives of dataset, model, and representation. First, we introduce a large-scale HUman GEnerated training dataset, HuGe100K, consisting of 100K diverse, photorealistic human images. Each image is paired with 24-view frames in a static or dynamic pose, generated by a pose-controllable image-to-video model. Next, leveraging the diversity of views, poses, and appearances in HuGe100K, we develop a scalable feed-forward transformer model that predicts, from a given human image, a 3D human Gaussian representation in a uniform space. The model is trained to disentangle human pose, shape, clothing geometry, and texture, so the estimated Gaussians can be animated robustly without post-processing. We conduct comprehensive experiments to validate the effectiveness of the proposed dataset and method. Our model generalizes well, efficiently reconstructing photorealistic humans in under 1 second on a single GPU. It also seamlessly supports various applications, including animation, shape editing, and texture editing.
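The abstract states that the predicted Gaussians live in a uniform space and can be animated without post-processing, but it does not specify the deformation used. One common way to drive canonical-space 3D Gaussians is linear blend skinning (LBS) against a parametric body skeleton. That choice is an assumption here and is not confirmed by the abstract. The sketch below shows how LBS would move each Gaussian's center and covariance. `pose_gaussians` is a hypothetical helper, and the skinning weights and per-joint transforms are assumed inputs (for example, taken from a body model).

```python
import numpy as np


def pose_gaussians(means, covs, weights, bone_transforms):
    """Hypothetical sketch: pose canonical-space 3D Gaussians with linear blend skinning.

    Assumption: each Gaussian is driven by per-joint rigid transforms via LBS.
    This is an illustration, not the IDOL implementation.

    means:           (N, 3) Gaussian centers in the canonical space
    covs:            (N, 3, 3) Gaussian covariance matrices in the canonical space
    weights:         (N, J) skinning weights per Gaussian (rows sum to 1)
    bone_transforms: (J, 4, 4) per-joint rigid transforms, canonical -> posed
    Returns the posed (means, covs).
    """
    # Blend the per-joint 4x4 transforms using each Gaussian's skinning weights.
    blended = np.einsum("nj,jab->nab", weights, bone_transforms)  # (N, 4, 4)

    # Move the centers: apply the blended affine transform in homogeneous coordinates.
    homo = np.concatenate([means, np.ones((means.shape[0], 1))], axis=1)  # (N, 4)
    posed_means = np.einsum("nab,nb->na", blended, homo)[:, :3]

    # Move the covariances: Sigma' = A Sigma A^T, where A is the 3x3 linear part.
    A = blended[:, :3, :3]
    posed_covs = np.einsum("nab,nbc,ndc->nad", A, covs, A)
    return posed_means, posed_covs


if __name__ == "__main__":
    # Toy skeleton: joint 0 stays put, joint 1 is translated by +1 along x.
    T0 = np.eye(4)
    T1 = np.eye(4)
    T1[0, 3] = 1.0
    bones = np.stack([T0, T1])          # (2, 4, 4)

    means = np.zeros((1, 3))            # one Gaussian at the origin
    covs = np.eye(3)[None] * 0.01       # small isotropic covariance
    weights = np.array([[0.5, 0.5]])    # influence split evenly between the joints

    m, c = pose_gaussians(means, covs, weights, bones)
    print(m)  # the center moves halfway along x, to x = 0.5
```

Blending the transforms before applying them is what makes LBS cheap to evaluate per Gaussian. Because the covariance is carried through the same linear map, the shape of each Gaussian also rotates with the body part.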

Cite

Text

Zhuang et al. "IDOL: Instant Photorealistic 3D Human Creation from a Single Image." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02450

Markdown

[Zhuang et al. "IDOL: Instant Photorealistic 3D Human Creation from a Single Image." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/zhuang2025cvpr-idol/) doi:10.1109/CVPR52734.2025.02450

BibTeX

@inproceedings{zhuang2025cvpr-idol,
  title     = {{IDOL: Instant Photorealistic 3D Human Creation from a Single Image}},
  author    = {Zhuang, Yiyu and Lv, Jiaxi and Wen, Hao and Shuai, Qing and Zeng, Ailing and Zhu, Hao and Chen, Shifeng and Yang, Yujiu and Cao, Xun and Liu, Wei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {26308-26319},
  doi       = {10.1109/CVPR52734.2025.02450},
  url       = {https://mlanthology.org/cvpr/2025/zhuang2025cvpr-idol/}
}