Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-on from a Single Image

Abstract

This paper proposes Virtual Fitting Room (VFR), a novel video generative model that produces arbitrarily long virtual try-on videos. Our VFR models the long-video generation task as an auto-regressive, segment-by-segment generation process, eliminating the need for resource-intensive generation and long video training data while providing the flexibility to generate videos of arbitrary length. The key challenges of this task are twofold: ensuring local smoothness between adjacent segments and maintaining global temporal consistency across different segments. To address these challenges, our VFR framework ensures smoothness through a prefix video condition and enforces consistency through an anchor video, a 360°-view video that comprehensively captures the person's whole-body appearance. Our VFR generates minute-scale virtual try-on videos with both local smoothness and global temporal consistency under various motions, making it a pioneering work in long virtual try-on video generation. Project Page: https://immortalco.github.io/VirtualFittingRoom/.
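
To make the abstract's mechanism concrete, the sketch below illustrates auto-regressive, segment-by-segment generation with a prefix condition and an anchor video. It is a minimal illustration only, not the authors' implementation: generate_segment is a hypothetical stand-in for the actual VFR generator, and the segment length, prefix overlap, and the seeding of the first prefix from the anchor are all assumptions.

import numpy as np

def generate_segment(prefix, anchor, garment, num_frames=16):
    """Hypothetical stand-in for the VFR segment generator.

    A real implementation would condition a video generative model on:
      - prefix:  trailing frames of the video so far (local smoothness),
      - anchor:  a 360°-view video of the person (global consistency),
      - garment: the try-on garment image.
    Here we simply return placeholder frames of matching shape.
    """
    h, w, c = anchor.shape[1:]
    return np.zeros((num_frames, h, w, c), dtype=anchor.dtype)

def generate_long_tryon_video(anchor, garment, total_frames, seg_len=16, overlap=4):
    """Auto-regressive, segment-by-segment long-video generation.

    Each new segment is conditioned on the tail of the video generated
    so far (the prefix) plus the fixed anchor video, so the loop can
    run for an arbitrary number of segments.
    """
    video = list(anchor[:overlap])  # assumption: seed the first prefix from the anchor
    while len(video) < total_frames:
        prefix = np.stack(video[-overlap:])  # prefix condition: last `overlap` frames
        segment = generate_segment(prefix, anchor, garment, num_frames=seg_len)
        video.extend(segment)  # append the newly generated frames
    return np.stack(video[:total_frames])

anchor = np.zeros((32, 256, 192, 3), dtype=np.float32)  # 360°-view anchor video
garment = np.zeros((256, 192, 3), dtype=np.float32)     # single garment image
video = generate_long_tryon_video(anchor, garment, total_frames=480)
print(video.shape)  # (480, 256, 192, 3), roughly 20 seconds at 24 fps

Because each iteration depends only on a fixed-length prefix and the fixed anchor, per-segment memory and compute stay constant regardless of total length, which is what allows arbitrarily long outputs.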

Cite

Text

Chen et al. "Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-on from a Single Image." Advances in Neural Information Processing Systems, 2025.

Markdown

[Chen et al. "Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-on from a Single Image." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/chen2025neurips-virtual/)

BibTeX

@inproceedings{chen2025neurips-virtual,
  title     = {{Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-on from a Single Image}},
  author    = {Chen, Jun-Kun and Bansal, Aayush and Vo, Minh Phuoc and Wang, Yu-Xiong},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/chen2025neurips-virtual/}
}