Unsupervised Alignment of Actions in Video with Text Descriptions

Song, Young Chol; Naim, Iftekhar; Al Mamun, Abdullah; Kulkarni, Kaustubh; Singla, Parag; Luo, Jiebo; Gildea, Daniel; Kautz, Henry A.

IJCAI 2016 pp. 2025-2031

Abstract

Advances in video technology and data storage have made large-scale video data collections of complex activities readily accessible. An increasingly popular approach for automatically inferring the details of a video is to associate the spatio-temporal segments in a video with its natural language descriptions. Most algorithms for connecting natural language with video rely on pre-aligned supervised training data. Recently, several models have been shown to be effective for unsupervised alignment of objects in video with language. However, it remains difficult to generate good spatio-temporal video segments for actions that align well with language. This paper presents a framework that extracts higher-level representations of low-level action features through hyperfeature coding from video and aligns them with language. We propose a two-step process that creates a high-level action feature codebook with temporally consistent motions, and then applies an unsupervised alignment algorithm over the action codewords and verbs in the language to identify individual activities. We show an improvement over previous alignment models of objects and nouns on videos of biological experiments, and also evaluate our system on a larger-scale collection of videos involving kitchen activities.

Cite

Text

Song et al. "Unsupervised Alignment of Actions in Video with Text Descriptions." International Joint Conference on Artificial Intelligence, 2016.

Markdown

[Song et al. "Unsupervised Alignment of Actions in Video with Text Descriptions." International Joint Conference on Artificial Intelligence, 2016.](https://mlanthology.org/ijcai/2016/song2016ijcai-unsupervised/)

BibTeX

@inproceedings{song2016ijcai-unsupervised,
  title     = {{Unsupervised Alignment of Actions in Video with Text Descriptions}},
  author    = {Song, Young Chol and Naim, Iftekhar and Al Mamun, Abdullah and Kulkarni, Kaustubh and Singla, Parag and Luo, Jiebo and Gildea, Daniel and Kautz, Henry A.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {2025--2031},
  url       = {https://mlanthology.org/ijcai/2016/song2016ijcai-unsupervised/}
}