Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick
Abstract
Hashing algorithms are widely used for large-scale learning and similarity search, with computationally cheaper and more accurate algorithms proposed every year. In this paper we focus on hashing algorithms that estimate a distance measure $d(\vec{x}_i,\vec{x}_j)$ between two vectors $\vec{x}_i, \vec{x}_j$. Such algorithms require generating random variables, and we propose two approaches to reduce the variance of the hashed estimates: control variates and maximum likelihood estimates. We explain how these approaches can be applied immediately to a wide subset of hashing algorithms, evaluate their impact on various datasets, and run empirical simulations to verify our results.
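To make the control variates idea concrete, the sketch below applies it to one simple case: estimating an inner product from Gaussian random projections. This is an illustration under stated assumptions, not the estimator from the paper. It assumes the squared norms of both vectors are stored, so that the control variate $v^2 + w^2$ has a known mean, and it estimates the control variate coefficient from the sample itself. All function names (`control_variate_mean`, `projected_pairs`, `compare_estimators`) are hypothetical.

```python
import random
import statistics


def control_variate_mean(samples, controls, control_mean):
    """Estimate E[samples] using a control variate whose true mean is known.

    The coefficient is the empirical Cov(samples, controls) / Var(controls),
    so it is itself an estimate (an assumption of this sketch).
    """
    n = len(samples)
    s_bar = sum(samples) / n
    c_bar = sum(controls) / n
    cov = sum((s - s_bar) * (c - c_bar) for s, c in zip(samples, controls))
    var = sum((c - c_bar) ** 2 for c in controls)
    coeff = cov / var if var > 0 else 0.0
    # Correct the plain mean by how far the control's mean is from its known value.
    return s_bar - coeff * (c_bar - control_mean)


def projected_pairs(x, y, k, rng):
    """Project x and y onto k shared standard Gaussian directions."""
    pairs = []
    for _ in range(k):
        r = [rng.gauss(0.0, 1.0) for _ in x]
        v = sum(ri * xi for ri, xi in zip(r, x))
        w = sum(ri * yi for ri, yi in zip(r, y))
        pairs.append((v, w))
    return pairs


def compare_estimators(x, y, k=50, trials=500, seed=0):
    """Return (variance of plain estimate, variance of control variate estimate)."""
    rng = random.Random(seed)
    # E[v^2 + w^2] = ||x||^2 + ||y||^2, assumed to be stored alongside the hashes.
    known_mean = sum(a * a for a in x) + sum(b * b for b in y)
    plain, corrected = [], []
    for _ in range(trials):
        pairs = projected_pairs(x, y, k, rng)
        products = [v * w for v, w in pairs]          # each has mean <x, y>
        controls = [v * v + w * w for v, w in pairs]  # each has mean known_mean
        plain.append(sum(products) / k)
        corrected.append(control_variate_mean(products, controls, known_mean))
    return statistics.pvariance(plain), statistics.pvariance(corrected)
```

Calling `compare_estimators([0.6, 0.8], [0.8, 0.6])` returns the empirical variance of both estimators over repeated trials. For highly similar vectors such as these, the control variate estimate should show a noticeably smaller variance, because the products $vw$ and the control $v^2 + w^2$ are strongly correlated.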
Cite
Text
Kang et al. "Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick." Proceedings of The 13th Asian Conference on Machine Learning, 2021.
Markdown
[Kang et al. "Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick." Proceedings of The 13th Asian Conference on Machine Learning, 2021.](https://mlanthology.org/acml/2021/kang2021acml-improving/)
BibTeX
@inproceedings{kang2021acml-improving,
title = {{Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick}},
author = {Kang, Keegan and Kushnarev, Sergey and Wong, Wei Pin and Pratap, Rameshwar and Yeo, Haikal and Yijia, Chen},
booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
year = {2021},
pages = {814--829},
volume = {157},
url = {https://mlanthology.org/acml/2021/kang2021acml-improving/}
}