Practical Data-Dependent Metric Compression with Provable Guarantees

Abstract

We introduce a new distance-preserving compact representation of multi-dimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O( d log(d B/epsilon) +log n) bits per point from which one can approximate the distances up to a factor of 1 + epsilon. Our algorithm almost matches the recent bound of Indyk et al, 2017} while being much simpler. We compare our algorithm to Product Quantization (PQ) (Jegou et al, 2011) a state of the art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT, MNIST, New York City taxi time series and a synthetic one-dimensional data set embedded in a high-dimensional space. Our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.

Cite

Text

Indyk et al. "Practical Data-Dependent Metric Compression with Provable Guarantees." Neural Information Processing Systems, 2017.

Markdown

[Indyk et al. "Practical Data-Dependent Metric Compression with Provable Guarantees." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/indyk2017neurips-practical/)

BibTeX

@inproceedings{indyk2017neurips-practical,
  title     = {{Practical Data-Dependent Metric Compression with Provable Guarantees}},
  author    = {Indyk, Piotr and Razenshteyn, Ilya and Wagner, Tal},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {2617-2626},
  url       = {https://mlanthology.org/neurips/2017/indyk2017neurips-practical/}
}