Towards Online Real-Time Memory-Based Video Inpainting Transformers

Abstract

Video inpainting has seen significant improvements in recent years with the rise of deep neural networks and, in particular, vision transformers. Although these models show promising reconstruction quality and temporal consistency, they remain unsuitable for live videos, one of the last obstacles to making them fully convincing and usable. The main limitations are that state-of-the-art models inpaint using the whole video (offline processing) and run at insufficient frame rates. We propose a framework that adapts existing inpainting transformers to these constraints by memorizing and refining redundant computations while maintaining decent inpainting quality. Applying this framework to some of the most recent inpainting models, we achieve strong online results with a consistent throughput above 20 frames per second.
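The memorization idea sketched in the abstract — caching computations for frames already seen so that, as the temporal window slides forward, only the newly arrived frame requires a fresh encoder pass — can be illustrated roughly as follows. This is a minimal sketch under assumptions, not the authors' implementation; the class and method names (`FrameFeatureCache`, `features_for_window`, `encode_fn`) are hypothetical.

```python
class FrameFeatureCache:
    """Illustrative per-frame feature cache for online video inpainting.

    Assumes an expensive per-frame encoder whose output for a given frame
    does not change between windows, so it can be memorized and reused.
    """

    def __init__(self, window_size, encode_fn):
        self.window_size = window_size  # temporal window fed to the transformer
        self.encode_fn = encode_fn      # expensive per-frame encoder (assumed)
        self.cache = {}                 # frame index -> cached feature
        self.encode_calls = 0           # counts actual encoder invocations

    def features_for_window(self, start_idx, frames):
        """Return features for frames[start_idx : start_idx + window_size]."""
        feats = []
        window = frames[start_idx:start_idx + self.window_size]
        for offset, frame in enumerate(window):
            idx = start_idx + offset
            if idx not in self.cache:   # only unseen frames are encoded
                self.cache[idx] = self.encode_fn(frame)
                self.encode_calls += 1
            feats.append(self.cache[idx])
        # Evict features that fell behind the window to bound memory use.
        for idx in list(self.cache):
            if idx < start_idx:
                del self.cache[idx]
        return feats
```

With a window of size `k`, advancing by one frame costs a single encoder call instead of `k`, which is the kind of redundancy elimination that makes real-time throughput plausible.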

Cite

Text

Thiry et al. "Towards Online Real-Time Memory-Based Video Inpainting Transformers." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00610

Markdown

[Thiry et al. "Towards Online Real-Time Memory-Based Video Inpainting Transformers." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/thiry2024cvprw-online/) doi:10.1109/CVPRW63382.2024.00610

BibTeX

@inproceedings{thiry2024cvprw-online,
  title     = {{Towards Online Real-Time Memory-Based Video Inpainting Transformers}},
  author    = {Thiry, Guillaume and Tang, Hao and Timofte, Radu and Van Gool, Luc},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {6035-6044},
  doi       = {10.1109/CVPRW63382.2024.00610},
  url       = {https://mlanthology.org/cvprw/2024/thiry2024cvprw-online/}
}