TOSS: High-Quality Text-Guided Novel View Synthesis from a Single Image

Abstract

In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from a single RGB image. While Zero123 has demonstrated impressive zero-shot open-set NVS capabilities, it treats NVS as a pure image-to-image translation problem. This approach suffers from the highly under-constrained nature of single-view NVS: the process lacks any means of explicit user control and often results in implausible NVS generations. To address this limitation, TOSS uses text as high-level semantic information to constrain the NVS solution space. TOSS fine-tunes a text-to-image Stable Diffusion model pre-trained on large-scale text-image pairs and introduces modules specifically tailored to image and camera pose conditioning, as well as dedicated training for pose correctness and preservation of fine details. Comprehensive experiments show that TOSS outperforms Zero123, producing higher-quality NVS results and converging faster. We further support these results with ablations that underscore the effectiveness and potential of the introduced semantic guidance and architecture design.
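For readers unfamiliar with camera pose conditioning in this line of work, the sketch below shows the relative-viewpoint encoding used by Zero123, which conditions on the change in polar angle, azimuth (as sine and cosine), and radius between the input and target cameras, concatenated with the CLIP image embedding. The abstract does not specify the pose representation used by TOSS, so treat this as an illustration of the Zero123 baseline only; the helper name `relative_pose_embedding` is hypothetical.

```python
import math


def relative_pose_embedding(src, tgt):
    """Zero123-style relative camera encoding (hypothetical helper name).

    src, tgt: (polar, azimuth, radius) spherical camera coordinates,
    with angles in radians. This is an illustration of the Zero123 baseline,
    not necessarily the encoding used by TOSS.

    Returns [d_polar, sin(d_azimuth), cos(d_azimuth), d_radius].
    """
    d_polar = tgt[0] - src[0]
    d_azimuth = tgt[1] - src[1]
    d_radius = tgt[2] - src[2]
    # Encoding the azimuth change as (sin, cos) makes the embedding
    # continuous across the 0 / 2*pi wrap-around.
    return [d_polar, math.sin(d_azimuth), math.cos(d_azimuth), d_radius]


if __name__ == "__main__":
    # Rotate the camera 90 degrees in azimuth at a fixed elevation and distance.
    src = (math.pi / 3, 0.0, 1.5)
    tgt = (math.pi / 3, math.pi / 2, 1.5)
    print(relative_pose_embedding(src, tgt))
```

Using sine and cosine for the azimuth avoids a discontinuity between, for example, 359 and 1 degree, whereas the polar angle and radius are bounded and can be passed as plain differences.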

Cite

Text

Shi et al. "TOSS: High-Quality Text-Guided Novel View Synthesis from a Single Image." International Conference on Learning Representations, 2024.

Markdown

[Shi et al. "TOSS: High-Quality Text-Guided Novel View Synthesis from a Single Image." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/shi2024iclr-toss/)

BibTeX

@inproceedings{shi2024iclr-toss,
  title     = {{TOSS: High-Quality Text-Guided Novel View Synthesis from a Single Image}},
  author    = {Shi, Yukai and Wang, Jianan and Cao, He and Tang, Boshi and Qi, Xianbiao and Yang, Tianyu and Huang, Yukun and Liu, Shilong and Zhang, Lei and Shum, Heung-Yeung},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/shi2024iclr-toss/}
}