Fast Computation of Min-Hash Signatures for Image Collections

Abstract

A new method for highly efficient min-Hash generation for document collections is proposed. It exploits the inverted file structure, which is available in many applications based on a bag or a set of words. Fast min-Hash generation is important in applications such as image clustering, where good recall and precision require a large number of min-Hash signatures. Using the set-of-words representation, the novel exact min-Hash generation algorithm achieves approximately a 50-fold speed-up on two datasets with 10^5 and 10^6 images respectively. We also propose an approximate min-Hash assignment process which reaches a more than 200-fold speed-up at the cost of missing about 2–3% of matches. We further show experimentally that the method generalizes to other modalities with significantly different statistics.
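The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way an inverted file might be used to compute a single exact min-Hash per document; it should not be read as the authors' implementation. It assumes each document is a non-empty set of integer word ids and that one random permutation of the vocabulary defines the hash. The function names `build_inverted_file`, `minhash_naive`, and `minhash_inverted` are hypothetical and introduced here for illustration.

```python
import random


def build_inverted_file(docs):
    """Map each word id to the list of document ids that contain it."""
    inverted = {}
    for doc_id, words in enumerate(docs):
        for w in words:
            inverted.setdefault(w, []).append(doc_id)
    return inverted


def minhash_naive(docs, rank):
    """Baseline: for every document, scan all of its words and keep the
    one with the smallest rank under the permutation."""
    return [min(words, key=lambda w: rank[w]) for words in docs]


def minhash_inverted(docs, inverted, order):
    """Assumed inverted-file variant: visit words in permutation order.
    The first visited word that occurs in a document is, by construction,
    that document's min-Hash, so the loop can stop as soon as every
    document has been assigned."""
    result = [None] * len(docs)
    remaining = len(docs)
    for w in order:
        for doc_id in inverted.get(w, ()):
            if result[doc_id] is None:
                result[doc_id] = w
                remaining -= 1
        if remaining == 0:
            break
    return result


if __name__ == "__main__":
    # Toy collection: three documents over a six-word vocabulary.
    docs = [{1, 2, 3}, {2, 4}, {0, 4, 5}]
    order = list(range(6))
    random.Random(0).shuffle(order)           # one random permutation
    rank = {w: i for i, w in enumerate(order)}
    inverted = build_inverted_file(docs)
    print(minhash_inverted(docs, inverted, order))
    print(minhash_naive(docs, rank))          # same signature as above
```

Under these assumptions, the inverted-file variant touches only the posting lists of the first few words in the permutation, whereas the baseline reads every word of every document. Repeating the procedure with independent permutations would yield the multiple signatures that the abstract says clustering needs.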

Cite

Text

Chum and Matas. "Fast Computation of Min-Hash Signatures for Image Collections." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012. doi:10.1109/CVPR.2012.6248039

Markdown

[Chum and Matas. "Fast Computation of Min-Hash Signatures for Image Collections." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012.](https://mlanthology.org/cvpr/2012/chum2012cvpr-fast/) doi:10.1109/CVPR.2012.6248039

BibTeX

@inproceedings{chum2012cvpr-fast,
  title     = {{Fast Computation of Min-Hash Signatures for Image Collections}},
  author    = {Chum, Ondrej and Matas, Jiri},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2012},
  pages     = {3077-3084},
  doi       = {10.1109/CVPR.2012.6248039},
  url       = {https://mlanthology.org/cvpr/2012/chum2012cvpr-fast/}
}