Video-Text Compliance: Activity Verification Based on Natural Language Instructions

Mayoore Jaiswal, H. Peter Hofstee, Valerie Chen, Suvadip Paul, Rogério Feris, Frank Liu, Anupama Jagannathan, Anne Gattiker, Inseok Hwang, Jinho Lee, Matthew Tong, Sahil Dureja, Soham Shah

ICCVW 2019 pp. 1503-1512

doi:10.1109/ICCVW.2019.00188

Abstract

We define a new multi-modal compliance problem, which is to determine if the human activity in a given video is in compliance with an associated text instruction. Solutions to the compliance problem could enable automatic compliance checking and efficient feedback in many real-world settings. To this end, we introduce the Video-Text Compliance (VTC) dataset, which contains videos of atomic activities, along with text instructions and compliance labels. The VTC dataset is constructed by an auto-augmentation technique, preserves privacy, and contains over 1.2 million frames. Finally, we present ComplianceNet, a novel end-to-end trainable compliance network that improves the baseline accuracy by 27.5% on average when trained on the VTC dataset. We plan to release the VTC dataset to the community for future research.


Cite

Text

Jaiswal et al. "Video-Text Compliance: Activity Verification Based on Natural Language Instructions." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00188

Markdown

[Jaiswal et al. "Video-Text Compliance: Activity Verification Based on Natural Language Instructions." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/jaiswal2019iccvw-videotext/) doi:10.1109/ICCVW.2019.00188

BibTeX

@inproceedings{jaiswal2019iccvw-videotext,
  title     = {{Video-Text Compliance: Activity Verification Based on Natural Language Instructions}},
  author    = {Jaiswal, Mayoore and Hofstee, H. Peter and Chen, Valerie and Paul, Suvadip and Feris, Rogério and Liu, Frank and Jagannathan, Anupama and Gattiker, Anne and Hwang, Inseok and Lee, Jinho and Tong, Matthew and Dureja, Sahil and Shah, Soham},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {1503--1512},
  doi       = {10.1109/ICCVW.2019.00188},
  url       = {https://mlanthology.org/iccvw/2019/jaiswal2019iccvw-videotext/}
}