Embedding Adaptation Network with Transformer for Few-Shot Action Recognition
Abstract
Few-shot action recognition aims to classify novel action categories using only a few training samples. Most current few-shot action recognition methods, trained with an episodic strategy, apply the same normalization to all feature embeddings, which limits performance when the batch size is small. Moreover, some methods learn feature embeddings individually without considering the whole task, neglecting important interactive information between videos in the current episode. To address these problems, we propose a novel embedding adaptation network with Transformer (EANT) for few-shot action recognition. Specifically, we first propose an improved self-guided instance normalization (SGIN) module to adaptively learn class-specific feature embeddings in an input-dependent manner. Built upon the learned feature embeddings, we design a Transformer-based embedding learning (TEL) module to learn task-specific feature embeddings by fully capturing rich information across videos in each episodic task. Furthermore, we utilize semantic knowledge shared among all sampled training classes as additional supervisory information to improve the generalization ability of the network. By this means, the proposed EANT can be highly effective and informative for few-shot action recognition. Extensive experiments conducted on several challenging few-shot action recognition benchmarks show that the proposed EANT outperforms several state-of-the-art methods by a large margin.
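To make the two modules more concrete, below is a minimal PyTorch sketch of how an input-dependent instance normalization (SGIN) and a Transformer over episode videos (TEL) could be wired together. The abstract does not give the exact formulations, so the affine-parameter predictor, layer sizes, and tensor shapes here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming frame-level video features of shape
# (batch, frames, dim). Details below are hypothetical.
import torch
import torch.nn as nn


class SGIN(nn.Module):
    """Self-guided instance normalization (sketch): instance-normalize each
    video's features, then re-scale/shift with affine parameters predicted
    from the input itself, making the normalization input-dependent."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm1d(dim, affine=False)
        # Hypothetical predictor of per-instance gamma/beta from the input.
        self.affine = nn.Linear(dim, 2 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim)
        gamma, beta = self.affine(x.mean(dim=1)).chunk(2, dim=-1)
        x = self.norm(x.transpose(1, 2)).transpose(1, 2)
        return gamma.unsqueeze(1) * x + beta.unsqueeze(1)


class TEL(nn.Module):
    """Transformer-based embedding learning (sketch): a Transformer encoder
    attends across all video embeddings of the current episode, so each
    embedding is adapted with task-wide context."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, episode: torch.Tensor) -> torch.Tensor:
        # episode: (1, num_videos, dim) — all support/query embeddings of
        # one episodic task, treated as one sequence.
        return self.encoder(episode)


if __name__ == "__main__":
    videos = torch.randn(10, 8, 64)                  # 10 videos, 8 frames
    frame_feats = SGIN(64)(videos)                   # class-specific features
    video_feats = frame_feats.mean(dim=1)            # temporal pooling
    task_feats = TEL(64)(video_feats.unsqueeze(0))   # task-specific features
    print(task_feats.shape)                          # torch.Size([1, 10, 64])
```

In this reading, SGIN adapts each video's embedding to its own statistics, while TEL lets every embedding in the episode attend to all others, which matches the abstract's distinction between class-specific and task-specific feature learning.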
Cite

Text

Jin et al. "Embedding Adaptation Network with Transformer for Few-Shot Action Recognition." Proceedings of The 14th Asian Conference on Machine Learning, 2022.

Markdown

[Jin et al. "Embedding Adaptation Network with Transformer for Few-Shot Action Recognition." Proceedings of The 14th Asian Conference on Machine Learning, 2022.](https://mlanthology.org/acml/2022/jin2022acml-embedding/)

BibTeX
@inproceedings{jin2022acml-embedding,
title = {{Embedding Adaptation Network with Transformer for Few-Shot Action Recognition}},
author = {Jin, Rongrong and Wang, Xiao and Wang, Guangge and Lu, Yang and Hu, Hai-Miao and Wang, Hanzi},
booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
year = {2022},
pages = {515--530},
volume = {189},
url = {https://mlanthology.org/acml/2022/jin2022acml-embedding/}
}