Multi-Entity Video Transformers for Fine-Grained Video Representation Learning

Cite

Text

Walmer et al. "Multi-Entity Video Transformers for Fine-Grained Video Representation Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Walmer et al. "Multi-Entity Video Transformers for Fine-Grained Video Representation Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/walmer2025cvprw-multientity/)

BibTeX

@inproceedings{walmer2025cvprw-multientity,
  title     = {{Multi-Entity Video Transformers for Fine-Grained Video Representation Learning}},
  author    = {Walmer, Matthew and Kanjirathinkal, Rose Catherine and Tai, Kai Sheng and Muzumdar, Keyur and Tian, Tai-Peng and Shrivastava, Abhinav},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {2110--2120},
  url       = {https://mlanthology.org/cvprw/2025/walmer2025cvprw-multientity/}
}