Fast Computation of Min-Hash Signatures for Image Collections
Abstract
A new method for highly efficient min-Hash generation for document collections is proposed. It exploits the inverted file structure, which is available in many applications based on a bag-of-words or set-of-words representation. Fast min-Hash generation is important in applications such as image clustering, where good recall and precision require a large number of min-Hash signatures. Using the set-of-words representation, the novel exact min-Hash generation algorithm achieves approximately a 50-fold speed-up on two datasets with 10^5 and 10^6 images respectively. We also propose an approximate min-Hash assignment process which reaches a more than 200-fold speed-up at the cost of missing about 2–3% of matches. Finally, we show experimentally that the method generalizes to other modalities with significantly different statistics.
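The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one way an inverted file can be used to generate exact min-Hashes. It assumes documents are sets of integer word ids and that a hash function is represented by a random ordering of the vocabulary. The function names (`build_inverted_file`, `minhash_direct`, `minhash_inverted`) and the toy data are hypothetical and not taken from the paper. The idea illustrated: visit words in permutation order and, through each word's posting list, assign that word to every document that has no min-Hash yet, stopping as soon as all documents are covered.

```python
import random


def build_inverted_file(docs):
    """Map each word id to the list of ids of the documents that contain it."""
    inverted = {}
    for doc_id, words in enumerate(docs):
        for w in words:
            inverted.setdefault(w, []).append(doc_id)
    return inverted


def minhash_direct(docs, order):
    """Baseline: scan every word of every document and keep the one ranked first."""
    rank = {w: r for r, w in enumerate(order)}
    return [min(words, key=rank.__getitem__) for words in docs]


def minhash_inverted(docs, inverted, order):
    """Visit words in permutation order; the first word to reach a document is its min-Hash."""
    sig = [None] * len(docs)
    remaining = len(docs)
    for w in order:
        for doc_id in inverted.get(w, ()):
            if sig[doc_id] is None:  # document has not been assigned a min-Hash yet
                sig[doc_id] = w
                remaining -= 1
        if remaining == 0:
            break  # all documents are covered; the rest of the vocabulary is never touched
    return sig


# Toy collection (assumed data): four non-empty documents over word ids 0..7.
docs = [{0, 3, 5}, {1, 3}, {2, 5, 7}, {0, 1, 2}]
inverted = build_inverted_file(docs)

# One "hash function" = one random ordering of the vocabulary.
order = sorted(inverted)
random.Random(0).shuffle(order)

# Both routes give the same exact signature for this permutation.
assert minhash_inverted(docs, inverted, order) == minhash_direct(docs, order)
```

In this sketch the early exit is where the saving would come from: frequent words cover most documents after only a few posting lists have been read, whereas the baseline always touches every word of every document. Whether this matches the paper's procedure in detail is an assumption.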
Cite
Text
Chum and Matas. "Fast Computation of Min-Hash Signatures for Image Collections." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012. doi:10.1109/CVPR.2012.6248039
Markdown
[Chum and Matas. "Fast Computation of Min-Hash Signatures for Image Collections." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012.](https://mlanthology.org/cvpr/2012/chum2012cvpr-fast/) doi:10.1109/CVPR.2012.6248039
BibTeX
@inproceedings{chum2012cvpr-fast,
title = {{Fast Computation of Min-Hash Signatures for Image Collections}},
author = {Chum, Ondrej and Matas, Jiri},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2012},
pages = {3077-3084},
doi = {10.1109/CVPR.2012.6248039},
url = {https://mlanthology.org/cvpr/2012/chum2012cvpr-fast/}
}