Building a Multi-Modal Spatiotemporal Expert for Zero-Shot Action Recognition with CLIP

Cite

Text

Yu et al. "Building a Multi-Modal Spatiotemporal Expert for Zero-Shot Action Recognition with CLIP." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I9.33050

Markdown

[Yu et al. "Building a Multi-Modal Spatiotemporal Expert for Zero-Shot Action Recognition with CLIP." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/yu2025aaai-building/) doi:10.1609/AAAI.V39I9.33050

BibTeX

@inproceedings{yu2025aaai-building,
  title     = {{Building a Multi-Modal Spatiotemporal Expert for Zero-Shot Action Recognition with CLIP}},
  author    = {Yu, Yating and Cao, Congqi and Zhang, Yueran and Lv, Qinyi and Min, Lingtong and Zhang, Yanning},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {9689--9697},
  doi       = {10.1609/AAAI.V39I9.33050},
  url       = {https://mlanthology.org/aaai/2025/yu2025aaai-building/}
}