Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Abstract
Fine-grained action recognition datasets exhibit environmental bias, where even the largest datasets contain sequences from a limited number of environments due to the challenges of large-scale data collection. We show that multi-modal action recognition models suffer from changes in environment, due to the differing levels of robustness of each modality. Inspired by successes in adversarial training for unsupervised domain adaptation, we propose a multi-modal approach for adapting action recognition models to novel environments. We employ late fusion of the two modalities commonly used in action recognition (RGB and Flow), with multiple domain discriminators, so that alignment of modalities is jointly optimised with recognition. We test our approach on EPIC Kitchens, proposing the first benchmark for domain adaptation of fine-grained actions. Our multi-modal method outperforms single-modality alignment as well as other alignment methods by up to 3%.
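The core mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's released code: the names, shapes, and the logistic discriminator are all assumptions. It shows one domain discriminator per modality (RGB and Flow), each preceded by a gradient reversal layer (GRL), so the gradient that reaches each feature extractor pushes its features toward domain confusion while the recognition head is trained normally.

```python
# Hedged sketch (illustrative names/shapes, NOT the paper's implementation):
# per-modality domain discriminators with gradient reversal, as used in
# adversarial unsupervised domain adaptation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GradientReversal:
    """Identity on the forward pass; flips (and scales) the gradient on backward."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad):
        return -self.lam * grad

class DomainDiscriminator:
    """Logistic classifier predicting source (0) vs. target (1) from features."""
    def __init__(self, dim):
        self.w = rng.normal(scale=0.1, size=dim)

    def forward(self, f):
        return sigmoid(f @ self.w)

    def grad_wrt_features(self, f, y):
        # Gradient of binary cross-entropy w.r.t. the features: (p - y) * w.
        p = self.forward(f)
        return (p - y) * self.w

dim = 8  # illustrative feature dimension
grl = GradientReversal(lam=1.0)
disc_rgb = DomainDiscriminator(dim)   # one discriminator per modality
disc_flow = DomainDiscriminator(dim)

# One target-domain feature vector per modality (stand-ins for stream outputs).
f_rgb, f_flow = rng.normal(size=dim), rng.normal(size=dim)

# Gradient reaching each feature extractor: the discriminator's gradient,
# sign-flipped by the GRL, so each modality's features are aligned across
# domains jointly with (not instead of) recognition training.
g_rgb = grl.backward(disc_rgb.grad_wrt_features(grl.forward(f_rgb), y=1.0))
g_flow = grl.backward(disc_flow.grad_wrt_features(grl.forward(f_flow), y=1.0))
```

Keeping a separate discriminator per modality (rather than one on the fused feature) lets each stream be aligned at the level where domain shift affects it, which is the motivation for the multi-discriminator design.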
Cite
Text
Munro and Damen. "Multi-Modal Domain Adaptation for Fine-Grained Action Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00461
Markdown
[Munro and Damen. "Multi-Modal Domain Adaptation for Fine-Grained Action Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/munro2019iccvw-multimodal/) doi:10.1109/ICCVW.2019.00461
BibTeX
@inproceedings{munro2019iccvw-multimodal,
title = {{Multi-Modal Domain Adaptation for Fine-Grained Action Recognition}},
author = {Munro, Jonathan and Damen, Dima},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {3723-3726},
doi = {10.1109/ICCVW.2019.00461},
url = {https://mlanthology.org/iccvw/2019/munro2019iccvw-multimodal/}
}