Video in Sentences Out

Abstract

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
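
To make the mapping described in the abstract concrete, the sketch below shows, in schematic form, how an action class, role-assigned participants with their properties, a spatial relation, and an event characteristic might be composed into a sentence. This is a minimal illustration under assumed names: the Participant and EventDescription classes, the render_sentence function, and the example values are hypothetical and do not reflect the paper's actual representation or generation pipeline.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Participant:
    # A participant object rendered as a noun, with its properties
    # rendered as adjectival modifiers.
    noun: str
    adjectives: List[str] = field(default_factory=list)

    def noun_phrase(self) -> str:
        return "the " + " ".join(self.adjectives + [self.noun])

@dataclass
class EventDescription:
    # Action class rendered as a verb, plus role-assigned participants
    # and event characteristics.
    verb: str
    agent: Participant                      # who
    patient: Optional[Participant] = None   # whom
    spatial_pp: Optional[str] = None        # spatial relation as a prepositional phrase
    manner_adverb: Optional[str] = None     # event characteristic as an adverbial modifier

def render_sentence(e: EventDescription) -> str:
    """Compose a sentence from the recovered roles (illustrative only)."""
    parts = [e.agent.noun_phrase(), e.verb]
    if e.patient:
        parts.append(e.patient.noun_phrase())
    if e.spatial_pp:
        parts.append(e.spatial_pp)
    if e.manner_adverb:
        parts.append(e.manner_adverb)
    return " ".join(parts).capitalize() + "."

# Example output: "The person picked up the red ball near the chair quickly."
print(render_sentence(EventDescription(
    verb="picked up",
    agent=Participant("person"),
    patient=Participant("ball", ["red"]),
    spatial_pp="near the chair",
    manner_adverb="quickly",
)))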

Cite

Text

Barbu et al. "Video in Sentences Out." Conference on Uncertainty in Artificial Intelligence, 2012.

Markdown

[Barbu et al. "Video in Sentences Out." Conference on Uncertainty in Artificial Intelligence, 2012.](https://mlanthology.org/uai/2012/barbu2012uai-video/)

BibTeX

@inproceedings{barbu2012uai-video,
  title     = {{Video in Sentences Out}},
  author    = {Barbu, Andrei and Bridge, Alexander and Burchill, Zachary and Coroian, Dan and Dickinson, Sven J. and Fidler, Sanja and Michaux, Aaron and Mussman, Sam and Narayanaswamy, Siddharth and Salvi, Dhaval and Schmidt, Lara and Shangguan, Jiangnan and Siskind, Jeffrey Mark and Waggoner, Jarrell W. and Wang, Song and Wei, Jinlian and Yin, Yifan and Zhang, Zhiqi},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2012},
  pages     = {102--112},
  url       = {https://mlanthology.org/uai/2012/barbu2012uai-video/}
}