Near Duplicate Image Discovery on One Billion Images

Kim, Saehoon; Wang, Xin-Jing; Zhang, Lei; Choi, Seungjin

doi:10.1109/WACV.2015.130

WACV 2015 pp. 943-950

Abstract

Near-duplicate image discovery is the task of detecting all clusters of images that share a significant duplicated region. Previous work generally takes a divide-and-conquer approach composed of two steps: generating cluster seeds using min-hashing, and growing the seeds by searching the entire image space with the seeds as queries. Since the computational complexity of the seed growing step is generally O(NL), where N and L are the number of images and seeds respectively, existing methods can hardly scale to a billion-scale dataset, because L typically runs into the millions. In this paper, we study a feasible solution for near-duplicate image discovery on one billion images, which is easily implemented on the MapReduce framework. The major contribution of this work is a seed growing step designed to efficiently reduce the number of false positives among cluster seeds with O(cNL) time complexity, where c is small enough for a billion-scale dataset. The basic component of the seed growing step is a bottom-k min-hash, which generates different signatures within a sketch in order to remove all candidate images that share only one common visual word with a cluster seed. Our evaluations suggest that the proposed method can discover near-duplicate clusters with high precision and recall, and they reveal some interesting properties of our one-billion-image dataset.
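The bottom-k filtering idea described in the abstract can be illustrated with a small Python sketch. It assumes each image is represented as a set of integer visual-word ids; the helper names (bottom_k_sketch, is_candidate), the salted BLAKE2 hash, and the parameters k and num_sketches are hypothetical choices made for illustration, not the authors' implementation. Because a bottom-k sketch holds k distinct words (the k smallest under one hash function), two images whose sketches match exactly must both contain all k of those words, so with k = 2 a candidate that shares only one visual word with a seed can never match.

```python
import hashlib
from typing import Iterable, List, Set, Tuple


def _hash(word: int, salt: int) -> int:
    """Deterministic 64-bit hash of a visual-word id; the salt selects the hash function."""
    digest = hashlib.blake2b(f"{salt}:{word}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(words: Iterable[int], k: int, salt: int) -> Tuple[int, ...]:
    """Return the k distinct words with the smallest hash values under one hash function.

    Images with fewer than k distinct words get an empty sketch (hypothetical policy).
    """
    unique = set(words)
    if len(unique) < k:
        return ()
    # Tie-break on the word id so the ordering is fully deterministic.
    ranked = sorted(unique, key=lambda w: (_hash(w, salt), w))
    return tuple(ranked[:k])


def sketches(words: Set[int], k: int, num_sketches: int) -> List[Tuple[int, ...]]:
    """One bottom-k sketch per salt (i.e. per independent hash function)."""
    return [bottom_k_sketch(words, k, salt) for salt in range(num_sketches)]


def is_candidate(seed_words: Set[int], image_words: Set[int],
                 k: int = 2, num_sketches: int = 20) -> bool:
    """True if any same-salt pair of non-empty sketches matches exactly.

    A match implies the two images share at least k distinct visual words,
    so with k >= 2 images sharing a single word are always rejected.
    """
    for a, b in zip(sketches(seed_words, k, num_sketches),
                    sketches(image_words, k, num_sketches)):
        if a and a == b:
            return True
    return False


if __name__ == "__main__":
    seed = set(range(1, 51))
    shares_one_word = {1} | set(range(100, 150))
    print(is_candidate(seed, shares_one_word))  # False: a 2-word sketch can never match
    print(is_candidate(seed, set(seed)))        # True: identical word sets give identical sketches
```

In a MapReduce setting one would more plausibly emit each sketch as a grouping key and compare only images that land in the same bucket, rather than testing pairs as done here. By contrast, k independent min-hash values can all select the same single word, so an exact match of such a sketch would not by itself rule out a one-word overlap.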

Cite

Text

Kim et al. "Near Duplicate Image Discovery on One Billion Images." IEEE/CVF Winter Conference on Applications of Computer Vision, 2015. doi:10.1109/WACV.2015.130

Markdown

[Kim et al. "Near Duplicate Image Discovery on One Billion Images." IEEE/CVF Winter Conference on Applications of Computer Vision, 2015.](https://mlanthology.org/wacv/2015/kim2015wacv-near/) doi:10.1109/WACV.2015.130

BibTeX

@inproceedings{kim2015wacv-near,
  title     = {{Near Duplicate Image Discovery on One Billion Images}},
  author    = {Kim, Saehoon and Wang, Xin-Jing and Zhang, Lei and Choi, Seungjin},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2015},
  pages     = {943--950},
  doi       = {10.1109/WACV.2015.130},
  url       = {https://mlanthology.org/wacv/2015/kim2015wacv-near/}
}