Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition

Abstract

While the majority of Few-Shot Learning (FSL) models focus on image classification, extending them to action recognition is challenging due to the additional temporal dimension in videos. To address this issue, we propose an end-to-end Motion-modulated Temporal Fragment Alignment Network (MTFAN) that jointly explores task-specific motion modulation and multi-level temporal fragment alignment for Few-Shot Action Recognition (FSAR). The proposed MTFAN model has several merits. First, we design a motion modulator conditioned on learned task-specific motion embeddings, which activates the channels related to the task-shared motion patterns in each frame. Second, a segment attention mechanism automatically discovers higher-level segments for multi-level temporal fragment alignment, encompassing frame-to-frame, segment-to-segment, and segment-to-frame alignments. To the best of our knowledge, this is the first work to exploit task-specific motion modulation for FSAR. Extensive experimental results on four standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art FSAR methods.
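The motion modulator described above gates per-frame feature channels using a task-level motion embedding. The paper's exact architecture is not reproduced here; the following is a minimal FiLM-style sketch of channel-wise gating in NumPy, where `motion_modulate`, the projection `W`, and the bias `b` are all hypothetical names introduced for illustration.

```python
import numpy as np

def motion_modulate(frame_feats, motion_embed, W, b):
    """Hypothetical channel-wise modulation of per-frame features.

    frame_feats:  (T, C) per-frame feature vectors
    motion_embed: (D,)   task-specific motion embedding
    W: (D, C), b: (C,)   project the embedding to per-channel gates
    """
    # A sigmoid gate in (0, 1) softly activates the channels
    # associated with the task's shared motion patterns.
    gates = 1.0 / (1.0 + np.exp(-(motion_embed @ W + b)))  # (C,)
    # The same gate vector is broadcast across all T frames.
    return frame_feats * gates

rng = np.random.default_rng(0)
T, C, D = 8, 16, 4
feats = rng.standard_normal((T, C))
embed = rng.standard_normal(D)
W = rng.standard_normal((D, C))
b = np.zeros(C)
out = motion_modulate(feats, embed, W, b)
```

Because the gates lie in (0, 1), modulation can only attenuate channels, never amplify them; learning `W` and `b` per task is what makes the gating task-specific.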

Cite

Text

Wu et al. "Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00894

Markdown

[Wu et al. "Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/wu2022cvpr-motionmodulated/) doi:10.1109/CVPR52688.2022.00894

BibTeX

@inproceedings{wu2022cvpr-motionmodulated,
  title     = {{Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition}},
  author    = {Wu, Jiamin and Zhang, Tianzhu and Zhang, Zhe and Wu, Feng and Zhang, Yongdong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {9151-9160},
  doi       = {10.1109/CVPR52688.2022.00894},
  url       = {https://mlanthology.org/cvpr/2022/wu2022cvpr-motionmodulated/}
}