Pippo: High-Resolution Multi-View Humans from a Single Image

Abstract

We present Pippo, a generative model capable of producing 1K-resolution dense turnaround videos of a person from a single casually captured photo. Pippo is a multi-view diffusion transformer and does not require any additional inputs, such as a fitted parametric model or camera parameters of the input image. We pre-train Pippo on 3B human images without captions, and conduct multi-view mid-training and post-training on studio-captured humans. During mid-training, to quickly absorb the studio dataset, we denoise several (up to 48) views at low resolution and encode target cameras coarsely using a shallow MLP. During post-training, we denoise fewer views at high resolution and use pixel-aligned controls (e.g., spatial anchor and Plücker rays) to enable 3D-consistent generations. At inference, we propose an attention biasing technique that allows Pippo to simultaneously generate more than 5 times as many views as seen during training. Finally, we introduce an improved metric to evaluate the 3D consistency of multi-view generations, and show that Pippo outperforms existing works on multi-view human generation from a single image.
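
The abstract names Plücker rays as one of the pixel-aligned controls but does not spell out the encoding. As a rough illustration only, the sketch below computes the standard 6-D Plücker representation of a single ray: the unit direction d followed by the moment o × d, where o is any point on the ray (typically the camera center). The function names `cross`, `normalize`, and `plucker_ray` are hypothetical, and how Pippo derives per-pixel rays from the target cameras is not described here, so this should not be read as the authors' implementation.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def plucker_ray(origin, direction):
    """Return the 6-D Plücker coordinates (d, o x d) of a ray.

    origin:    any point on the ray (assumed here to be the camera center)
    direction: ray direction, not necessarily unit length
    """
    d = normalize(direction)
    moment = cross(origin, d)
    return d + moment  # tuple concatenation gives a 6-tuple


# Two different points on the same ray yield the same encoding,
# which is what makes the representation independent of the chosen origin.
ray_a = plucker_ray((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
ray_b = plucker_ray((1.0, 5.0, 0.0), (0.0, 2.0, 0.0))
```

In a pixel-aligned setting, one such 6-vector would be computed per pixel of the target view and stacked into a 6-channel map; that stacking step is omitted here because it depends on camera conventions the abstract does not state.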

Cite

Text

Kant et al. "Pippo: High-Resolution Multi-View Humans from a Single Image." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01531

Markdown

[Kant et al. "Pippo: High-Resolution Multi-View Humans from a Single Image." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/kant2025cvpr-pippo/) doi:10.1109/CVPR52734.2025.01531

BibTeX

@inproceedings{kant2025cvpr-pippo,
  title     = {{Pippo: High-Resolution Multi-View Humans from a Single Image}},
  author    = {Kant, Yash and Weber, Ethan and Kim, Jin Kyu and Khirodkar, Rawal and Zhaoen, Su and Martinez, Julieta and Gilitschenski, Igor and Saito, Shunsuke and Bagautdinov, Timur},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {16418--16429},
  doi       = {10.1109/CVPR52734.2025.01531},
  url       = {https://mlanthology.org/cvpr/2025/kant2025cvpr-pippo/}
}