Unified Embedding and Metric Learning for Zero-Exemplar Event Detection
Abstract
Event detection in unconstrained videos is conceived as content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. The task is zero-exemplar: no video examples of the novel event are given. Related works train a bank of concept detectors on external data sources; these detectors predict confidence scores for test videos, by which the videos are ranked and retrieved. In contrast, we learn a joint space in which the visual and textual representations are embedded. The space casts a novel event as a probability distribution over pre-defined events, and it learns to measure the distance between an event and its related videos. Our model is trained end-to-end on the publicly available EventNet dataset. When applied to the TRECVID Multimedia Event Detection dataset, it outperforms the state of the art by a considerable margin.
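To make the embedding-and-metric idea above concrete, here is a minimal PyTorch sketch, not the authors' implementation: the encoder names, feature dimensions, number of pre-defined events, and the triplet loss are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Projects precomputed text and video features into one shared space."""
    def __init__(self, text_dim=300, video_dim=2048, embed_dim=512, num_events=500):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.video_proj = nn.Linear(video_dim, embed_dim)
        # Hypothetical head: expresses a novel event as a probability
        # distribution over pre-defined events.
        self.event_probs = nn.Linear(embed_dim, num_events)

    def embed_text(self, t):
        return F.normalize(self.text_proj(t), dim=-1)

    def embed_video(self, v):
        return F.normalize(self.video_proj(v), dim=-1)

def triplet_loss(event_text, pos_video, neg_video, margin=0.2):
    # Metric learning: pull an event description toward its related videos,
    # push unrelated videos away by at least `margin` (loss choice is assumed).
    d_pos = 1 - F.cosine_similarity(event_text, pos_video)
    d_neg = 1 - F.cosine_similarity(event_text, neg_video)
    return F.relu(d_pos - d_neg + margin).mean()

# Usage at test time: rank videos by similarity to the novel-event text.
model = JointEmbedding()
query = model.embed_text(torch.randn(1, 300))        # novel event description
videos = model.embed_video(torch.randn(10, 2048))    # candidate test videos
scores = videos @ query.t()                          # higher = more related
ranking = scores.squeeze(1).argsort(descending=True)
mixture = F.softmax(model.event_probs(query), dim=-1)  # over known events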
Cite
Text
Hussein et al. "Unified Embedding and Metric Learning for Zero-Exemplar Event Detection." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.225
Markdown
[Hussein et al. "Unified Embedding and Metric Learning for Zero-Exemplar Event Detection." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/hussein2017cvpr-unified/) doi:10.1109/CVPR.2017.225
BibTeX
@inproceedings{hussein2017cvpr-unified,
title = {{Unified Embedding and Metric Learning for Zero-Exemplar Event Detection}},
author = {Hussein, Noureldien and Gavves, Efstratios and Smeulders, Arnold W.M.},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.225},
url = {https://mlanthology.org/cvpr/2017/hussein2017cvpr-unified/}
}