Compact Video Description for Copy Detection with Precise Temporal Alignment
Abstract
This paper introduces a very compact yet discriminative video description, which allows example-based search in a large number of frames corresponding to thousands of hours of video. Our description extracts one descriptor per indexed video frame by aggregating a set of local descriptors. These frame descriptors are encoded using a time-aware hierarchical indexing structure. A modified temporal Hough voting scheme is used to rank the retrieved database videos and estimate segments in them that match the query. If we use a dense temporal description of the videos, matched video segments are localized with excellent precision. Experimental results on the Trecvid 2008 copy detection task and a set of 38000 videos from YouTube show that our method offers an excellent trade-off between search accuracy, efficiency and memory usage.
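To illustrate the temporal voting idea mentioned in the abstract, the following is a minimal sketch of plain temporal Hough voting, not the modified scheme the paper proposes. It assumes frame matches are already available as hypothetical `(query_time, db_video_id, db_time)` tuples; the function name `temporal_hough_vote` and the `bin_width` parameter are illustrative and do not come from the paper.

```python
from collections import Counter


def temporal_hough_vote(matches, bin_width=1.0):
    """Rank (video, time offset) hypotheses by the number of supporting frame matches.

    matches: iterable of (query_time, db_video_id, db_time) tuples (assumed format).
    bin_width: width of an offset bin, in the same time unit as the timestamps.
    Returns a list of (db_video_id, offset, votes), highest vote count first.
    """
    votes = Counter()
    for query_time, video_id, db_time in matches:
        # A true copy keeps a roughly constant offset between database and query time,
        # so each frame match votes for its quantized offset.
        offset_bin = round((db_time - query_time) / bin_width)
        votes[(video_id, offset_bin)] += 1

    # Sort by decreasing vote count; break ties on (video_id, offset_bin) for determinism.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(video_id, offset_bin * bin_width, count)
            for (video_id, offset_bin), count in ranked]


example_matches = [
    (0, "a", 10), (1, "a", 11), (2, "a", 12),  # consistent offset of 10 in video "a"
    (3, "b", 50),                              # isolated match in video "b"
    (4, "a", 30),                              # outlier match in video "a"
]
ranking = temporal_hough_vote(example_matches)
```

In this toy input the three matches sharing an offset of 10 in video "a" accumulate in a single bin and rank first, while the isolated matches each receive one vote. The query timestamps of the matches in the winning bin then give a rough estimate of the matching segment.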
Cite
Text
Douze et al. "Compact Video Description for Copy Detection with Precise Temporal Alignment." European Conference on Computer Vision, 2010. doi:10.1007/978-3-642-15549-9_38
Markdown
[Douze et al. "Compact Video Description for Copy Detection with Precise Temporal Alignment." European Conference on Computer Vision, 2010.](https://mlanthology.org/eccv/2010/douze2010eccv-compact/) doi:10.1007/978-3-642-15549-9_38
BibTeX
@inproceedings{douze2010eccv-compact,
title = {{Compact Video Description for Copy Detection with Precise Temporal Alignment}},
author = {Douze, Matthijs and Jégou, Hervé and Schmid, Cordelia and Pérez, Patrick},
booktitle = {European Conference on Computer Vision},
year = {2010},
pages = {522--535},
doi = {10.1007/978-3-642-15549-9_38},
url = {https://mlanthology.org/eccv/2010/douze2010eccv-compact/}
}