Zero-Shot Anticipation for Instructional Activities

Sener, Fadime; Yao, Angela

doi:10.1109/ICCV.2019.00095

ICCV 2019

Abstract

How can we teach a robot to predict what will happen next for an activity it has never seen before? We address the problem of zero-shot anticipation by presenting a hierarchical model that generalizes instructional knowledge from large-scale text corpora and transfers the knowledge to the visual domain. Given a portion of an instructional video, our model predicts coherent and plausible actions multiple steps into the future, all in rich natural language. To demonstrate the anticipation capabilities of our model, we introduce the Tasty Videos dataset, a collection of 2511 recipes for zero-shot learning, recognition, and anticipation.

Cite

Text

Sener and Yao. "Zero-Shot Anticipation for Instructional Activities." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00095

Markdown

[Sener and Yao. "Zero-Shot Anticipation for Instructional Activities." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/sener2019iccv-zeroshot/) doi:10.1109/ICCV.2019.00095

BibTeX

@inproceedings{sener2019iccv-zeroshot,
  title     = {{Zero-Shot Anticipation for Instructional Activities}},
  author    = {Sener, Fadime and Yao, Angela},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00095},
  url       = {https://mlanthology.org/iccv/2019/sener2019iccv-zeroshot/}
}