Oops! Predicting Unintentional Action in Video

Abstract

From just a short glance at a video, we can often tell whether a person's action is intentional or not. Can we train a model to recognize this? We introduce a dataset of in-the-wild videos of unintentional action, as well as a suite of tasks for recognizing, localizing, and anticipating its onset. We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks. We also investigate self-supervised representations that leverage natural signals in our dataset, and show the effectiveness of an approach that uses the intrinsic speed of video to perform competitively with highly-supervised pretraining. However, a significant gap between machine and human performance remains.
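The abstract does not spell out how the "intrinsic speed of video" is turned into a training signal. One common way to use playback speed for self-supervision is to sample clips at different frame strides and train a classifier to predict the stride. The following is a minimal sketch of that clip-sampling step only, not the authors' implementation; the function name `sample_speed_clip`, the speed set `(1, 2, 4, 8)`, and the clip length of 16 are all illustrative assumptions.

```python
import random


def sample_speed_clip(num_frames, clip_len=16, speeds=(1, 2, 4, 8), rng=None):
    """Return (frame_indices, speed_label) for one pretext-task example.

    Hypothetical helper: the speed set and clip length are assumptions,
    not values taken from the paper.
    """
    rng = rng or random.Random()
    # Keep only speeds for which a full clip fits inside the video.
    usable = [s for s in speeds if (clip_len - 1) * s < num_frames]
    if not usable:
        raise ValueError("video too short for the requested clip length")
    speed = rng.choice(usable)
    # Distance between the first and last sampled frame at this speed.
    span = (clip_len - 1) * speed
    start = rng.randint(0, num_frames - 1 - span)
    indices = [start + i * speed for i in range(clip_len)]
    # The label is the position of the chosen speed in `speeds`.
    return indices, speeds.index(speed)


if __name__ == "__main__":
    idx, label = sample_speed_clip(300, rng=random.Random(0))
    print(idx, label)
```

In a sketch like this, the frames at `frame_indices` would be fed to a video network trained to predict `speed_label`, so the labels come from the sampling procedure rather than from human annotation.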

Cite

Text

Epstein et al. "Oops! Predicting Unintentional Action in Video." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00100

Markdown

[Epstein et al. "Oops! Predicting Unintentional Action in Video." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/epstein2020cvpr-oops/) doi:10.1109/CVPR42600.2020.00100

BibTeX

@inproceedings{epstein2020cvpr-oops,
  title     = {{Oops! Predicting Unintentional Action in Video}},
  author    = {Epstein, Dave and Chen, Boyuan and Vondrick, Carl},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00100},
  url       = {https://mlanthology.org/cvpr/2020/epstein2020cvpr-oops/}
}