How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Abstract
We aim to understand how actions are performed and identify subtle differences, such as 'fold firmly' vs. 'fold gently'. To this end, we propose a method which recognizes adverbs across different actions. However, such fine-grained annotations are difficult to obtain and their long-tailed nature makes it challenging to recognize adverbs in rare action-adverb compositions. Our approach therefore uses semi-supervised learning with multiple adverb pseudo-labels to leverage videos with only action labels. Combined with adaptive thresholding of these pseudo-adverbs, we are able to make efficient use of the available data while tackling the long-tailed distribution. Additionally, we gather adverb annotations for three existing video retrieval datasets, which allows us to introduce the new tasks of recognizing adverbs in unseen action-adverb compositions and unseen domains. Experiments demonstrate the effectiveness of our method, which outperforms prior work in recognizing adverbs and semi-supervised works adapted for adverb recognition. We also show how adverbs can relate fine-grained actions.
Cite
Text
Doughty and Snoek. "How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01346
Markdown
[Doughty and Snoek. "How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/doughty2022cvpr-you/) doi:10.1109/CVPR52688.2022.01346
BibTeX
@inproceedings{doughty2022cvpr-you,
title = {{How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs}},
author = {Doughty, Hazel and Snoek, Cees G. M.},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {13832--13842},
doi = {10.1109/CVPR52688.2022.01346},
url = {https://mlanthology.org/cvpr/2022/doughty2022cvpr-you/}
}