Semantic Aware Video Transcription Using Random Forest Classifiers
Abstract
This paper focuses on transcription generation in the form of subject, verb, object (SVO) triplets for videos in the wild, given off-the-shelf visual concept detectors. This problem is challenging due to the availability of sentence-only annotations, the unreliability of concept detectors, and the lack of training samples for many words. Facing these challenges, we propose a Semantic Aware Transcription (SAT) framework based on Random Forest classifiers. It takes concept detection results as input and outputs a distribution over English words. SAT uses video-sentence pairs for training. It hierarchically learns node splits by grouping semantically similar words, with similarity measured by a continuous skip-gram language model. This not only addresses the sparsity of training samples per word, but also yields semantically reasonable errors during transcription. SAT provides a systematic way to measure the relatedness of a concept detector to real words, which helps us understand the relationship between current visual detectors and words in a semantic space. Experiments on a large video dataset with 1,970 clips and 85,550 sentences demonstrate our approach.
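The node-split idea above can be illustrated with a minimal sketch: given word vectors (the paper learns these with a continuous skip-gram model; the toy embeddings and the two-seed assignment rule below are illustrative assumptions, not the paper's exact algorithm), each candidate word is routed to the semantically closer branch, so similar words stay together.

```python
import math

# Toy 3-d word vectors standing in for skip-gram embeddings
# (hypothetical values chosen so verbs cluster apart from animal nouns).
EMBED = {
    "run":   [0.90, 0.10, 0.00],
    "jump":  [0.80, 0.20, 0.10],
    "walk":  [0.85, 0.15, 0.05],
    "dog":   [0.10, 0.90, 0.20],
    "cat":   [0.05, 0.95, 0.15],
    "horse": [0.20, 0.80, 0.10],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def split_words(words, seed_a, seed_b):
    """Assign each word to the branch whose seed word is semantically
    closer, mimicking a node split that groups similar words together."""
    left, right = [], []
    for w in words:
        sim_a = cosine(EMBED[w], EMBED[seed_a])
        sim_b = cosine(EMBED[w], EMBED[seed_b])
        (left if sim_a >= sim_b else right).append(w)
    return left, right

left, right = split_words(list(EMBED), "run", "dog")
print(left)   # verbs end up on one branch
print(right)  # animal nouns on the other
```

With these toy vectors the split sends `run`, `jump`, `walk` one way and `dog`, `cat`, `horse` the other, so even when the classifier errs at a node, the resulting confusion stays within a semantically related group.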
Cite

Text

Sun and Nevatia. "Semantic Aware Video Transcription Using Random Forest Classifiers." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10590-1_50

Markdown

[Sun and Nevatia. "Semantic Aware Video Transcription Using Random Forest Classifiers." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/sun2014eccv-semantic/) doi:10.1007/978-3-319-10590-1_50

BibTeX
@inproceedings{sun2014eccv-semantic,
title = {{Semantic Aware Video Transcription Using Random Forest Classifiers}},
author = {Sun, Chen and Nevatia, Ram},
booktitle = {European Conference on Computer Vision},
year = {2014},
pages = {772-786},
doi = {10.1007/978-3-319-10590-1_50},
url = {https://mlanthology.org/eccv/2014/sun2014eccv-semantic/}
}