S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions

Abstract

To compare the similarity of probability distributions, the information-theoretically motivated metrics like Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JSD) are often more reasonable compared with metrics for vectors like Euclidean and angular distance. However, existing locality-sensitive hashing (LSH) algorithms cannot support the information-theoretically motivated metrics for probability distributions. In this paper, we first introduce a new approximation formula for S2JSD-distance, and then propose a novel LSH scheme adapted to S2JSD-distance for approximate nearest neighbors search in high-dimensional probability distributions. We define the specific hashing functions, and prove their local-sensitivity. Furthermore, extensive empirical evaluations well illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve.

Cite

Text

Mao et al. "S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10989

Markdown

[Mao et al. "S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/mao2017aaai-s/) doi:10.1609/AAAI.V31I1.10989

BibTeX

@inproceedings{mao2017aaai-s,
  title     = {{S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions}},
  author    = {Mao, Xianling and Feng, Bo-Si and Hao, Yi-Jing and Nie, Liqiang and Huang, Heyan and Wen, Guihua},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {3244-3251},
  doi       = {10.1609/AAAI.V31I1.10989},
  url       = {https://mlanthology.org/aaai/2017/mao2017aaai-s/}
}