Versatile Transition Generation with Image-to-Video Diffusion

Abstract

Leveraging text, images, structure maps, or motion trajectories as conditional guidance, diffusion models have achieved great success in automated, high-quality video generation. However, generating smooth and rational transition videos from the first and last video frames together with descriptive text prompts remains largely underexplored. We present VTG, a Versatile Transition video Generation framework that produces smooth, high-fidelity, and semantically coherent video transitions. VTG introduces interpolation-based initialization, which helps preserve object identity and handle abrupt content changes effectively. In addition, it incorporates dual-directional motion fine-tuning and representation alignment regularization to mitigate the limitations of pre-trained image-to-video diffusion models in motion smoothness and generation fidelity, respectively. To evaluate VTG and facilitate future studies on unified transition generation, we collected TransitBench, a comprehensive benchmark for transition generation covering two representative transition tasks: concept blending and scene transition. Extensive experiments show that VTG achieves superior transition performance consistently across all four tasks.
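As a rough illustration of the interpolation-based initialization mentioned above, the Python sketch below blends the latents of the first and last frames across time and perturbs the result with noise before diffusion sampling. It is a minimal, hypothetical sketch: the function name, the linear blending schedule, and the noise handling are assumptions for illustration and are not taken from the paper.

import torch

def interpolation_based_init(z_first, z_last, num_frames, noise_scale=0.5):
    """Hypothetical sketch of interpolation-based initialization.

    z_first, z_last: latent encodings of the first / last key frames,
    each of shape (C, H, W). Returns an initial video latent of shape
    (num_frames, C, H, W) built by linearly blending the two endpoint
    latents over time and perturbing the blend with Gaussian noise.
    """
    # Per-frame blending weights from 0 (first frame) to 1 (last frame).
    weights = torch.linspace(0.0, 1.0, num_frames).view(-1, 1, 1, 1)
    blended = (1.0 - weights) * z_first.unsqueeze(0) + weights * z_last.unsqueeze(0)
    # Perturb the blend so the diffusion sampler still starts from a noisy state.
    return blended + noise_scale * torch.randn_like(blended)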

Cite

Text

Yang et al. "Versatile Transition Generation with Image-to-Video Diffusion." International Conference on Computer Vision, 2025.

Markdown

[Yang et al. "Versatile Transition Generation with Image-to-Video Diffusion." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/yang2025iccv-versatile/)

BibTeX

@inproceedings{yang2025iccv-versatile,
  title     = {{Versatile Transition Generation with Image-to-Video Diffusion}},
  author    = {Yang, Zuhao and Zhang, Jiahui and Yu, Yingchen and Lu, Shijian and Bai, Song},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {16981--16990},
  url       = {https://mlanthology.org/iccv/2025/yang2025iccv-versatile/}
}