Entity-Aware and Motion-Aware Transformers for Language-Driven Action Localization
Abstract
Language-driven action localization in videos is a challenging task that involves not only visual-linguistic matching but also action boundary prediction. Recent progress has been achieved by aligning language queries to video segments, but estimating precise boundaries remains under-explored. In this paper, we propose entity-aware and motion-aware Transformers that progressively localize actions in videos by first coarsely locating clips with entity queries and then finely predicting exact boundaries in a shrunken temporal region with motion queries. The entity-aware Transformer incorporates the textual entities into visual representation learning via cross-modal and cross-frame attention to help the model attend to action-related video clips. The motion-aware Transformer captures fine-grained motion changes at multiple temporal scales by integrating long short-term memory into the self-attention module, further improving the precision of action boundary prediction. Extensive experiments on the Charades-STA and TACoS datasets demonstrate that our method achieves better performance than existing methods.
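The coarse-to-fine procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the per-clip score lists stand in for the outputs of the entity-aware and motion-aware Transformers, and the function names (`coarse_locate`, `refine_boundaries`, `localize`) and the fixed `window` parameter are hypothetical, introduced only to show how a coarse stage can shrink the region in which boundaries are then searched.

```python
from typing import List, Tuple


def coarse_locate(entity_scores: List[float], window: int) -> Tuple[int, int]:
    """Coarse stage (assumed form): return the [start, end) window of clips
    whose summed entity-relevance score is highest."""
    if not 0 < window <= len(entity_scores):
        raise ValueError("window must be between 1 and the number of clips")
    best_start = max(
        range(len(entity_scores) - window + 1),
        key=lambda s: sum(entity_scores[s:s + window]),
    )
    return best_start, best_start + window


def refine_boundaries(
    start_scores: List[float],
    end_scores: List[float],
    region: Tuple[int, int],
) -> Tuple[int, int]:
    """Fine stage (assumed form): inside the coarse region only, pick the
    (start, end) clip pair with start <= end that maximizes the summed
    boundary scores."""
    lo, hi = region
    return max(
        ((s, e) for s in range(lo, hi) for e in range(s, hi)),
        key=lambda pair: start_scores[pair[0]] + end_scores[pair[1]],
    )


def localize(
    entity_scores: List[float],
    start_scores: List[float],
    end_scores: List[float],
    window: int,
) -> Tuple[int, int]:
    """Run the coarse stage, then search boundaries in the shrunken region."""
    region = coarse_locate(entity_scores, window)
    return refine_boundaries(start_scores, end_scores, region)


# Hypothetical scores for a six-clip video (illustrative values only).
entity_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
start_scores = [0.9, 0.1, 0.2, 0.8, 0.1, 0.0]
end_scores = [0.0, 0.0, 0.1, 0.2, 0.9, 0.95]

region = coarse_locate(entity_scores, window=3)
boundaries = localize(entity_scores, start_scores, end_scores, window=3)
```

In this toy input the highest start and end scores overall sit at clips 0 and 5, outside the coarse region, so restricting the boundary search to the region changes the answer. That is the intuition behind predicting boundaries only within a shrunken temporal region.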
Cite
Text
Yang and Wu. "Entity-Aware and Motion-Aware Transformers for Language-Driven Action Localization." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/216
Markdown
[Yang and Wu. "Entity-Aware and Motion-Aware Transformers for Language-Driven Action Localization." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/yang2022ijcai-entity/) doi:10.24963/IJCAI.2022/216
BibTeX
@inproceedings{yang2022ijcai-entity,
title = {{Entity-Aware and Motion-Aware Transformers for Language-Driven Action Localization}},
author = {Yang, Shuo and Wu, Xinxiao},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {1552-1558},
doi = {10.24963/IJCAI.2022/216},
url = {https://mlanthology.org/ijcai/2022/yang2022ijcai-entity/}
}