Probabilistic Semantic Video Indexing

Abstract

We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance.
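To make the inference step concrete, the following is a minimal sketch of sum-product message passing on the smallest possible factor graph: two binary concepts joined by one co-occurrence factor, each with its own local detector factor. It is an illustration under stated assumptions, not the authors' implementation. The function names, the two-concept setup, and all numbers are hypothetical, and the temporal (inter-frame) dependencies described above are left out. Because this graph is a tree, sum-product is exact here, which the brute-force function can be used to check.

```python
import numpy as np


def normalize(v):
    """Scale a non-negative vector so that it sums to 1."""
    return v / v.sum()


def two_concept_marginals(local_a, local_b, cooccur):
    """Sum-product on a two-variable factor graph (hypothetical toy setup).

    local_a[i], local_b[j]: isolated detector evidence for concepts A and B
                            (index 0 = absent, 1 = present).
    cooccur[i, j]:          compatibility of A = i with B = j.
    Returns the posterior marginals of A and B.
    """
    local_a = np.asarray(local_a, dtype=float)
    local_b = np.asarray(local_b, dtype=float)
    cooccur = np.asarray(cooccur, dtype=float)

    # Message from the co-occurrence factor to A: sum B out of the factor,
    # weighted by the evidence that B's local factor sends in.
    msg_to_a = cooccur @ local_b
    # Message from the co-occurrence factor to B, symmetric to the above.
    msg_to_b = cooccur.T @ local_a

    # Belief at each variable = product of all incoming messages.
    return normalize(local_a * msg_to_a), normalize(local_b * msg_to_b)


def brute_force_marginals(local_a, local_b, cooccur):
    """Reference answer: build the full joint and marginalize directly."""
    joint = np.outer(local_a, local_b) * np.asarray(cooccur, dtype=float)
    joint = joint / joint.sum()
    return joint.sum(axis=1), joint.sum(axis=0)


# Hypothetical numbers: the isolated detector for A leans "absent",
# the detector for B is confident "present", and A and B tend to co-occur.
local_a = [0.6, 0.4]
local_b = [0.1, 0.9]
cooccur = [[0.8, 0.2],
           [0.2, 0.8]]

post_a, post_b = two_concept_marginals(local_a, local_b, cooccur)
```

In this toy case the isolated detector for A favours "absent", but once the message from the co-occurrence factor is multiplied in, the posterior for A favours "present". This is the kind of context-driven correction of isolated detections that the abstract describes. On graphs with cycles the same message updates would be iterated and the result would only be approximate.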

Cite

Text

Naphade et al. "Probabilistic Semantic Video Indexing." Neural Information Processing Systems, 2000.

Markdown

[Naphade et al. "Probabilistic Semantic Video Indexing." Neural Information Processing Systems, 2000.](https://mlanthology.org/neurips/2000/naphade2000neurips-probabilistic/)

BibTeX

@inproceedings{naphade2000neurips-probabilistic,
  title     = {{Probabilistic Semantic Video Indexing}},
  author    = {Naphade, Milind R. and Kozintsev, Igor and Huang, Thomas S.},
  booktitle = {Neural Information Processing Systems},
  year      = {2000},
  pages     = {967-973},
  url       = {https://mlanthology.org/neurips/2000/naphade2000neurips-probabilistic/}
}