Learning Distance Function by Coding Similarity

Abstract

We consider the problem of learning a similarity function from a set of positive equivalence constraints, i.e. "similar" point pairs. We define the similarity in information theoretic terms, as the gain in coding length when shifting from independent encoding of the pair to joint encoding. Under simple Gaussian assumptions, this formulation leads to a non-Mahalanobis similarity function which is efficient and simple to learn. This function can be viewed as a likelihood ratio test, and we show that the optimal similarity-preserving projection of the data is a variant of Fisher Linear Discriminant. We also show that under some naturally occurring sampling conditions of equivalence constraints, this function converges to a known Mahalanobis distance (RCA). The suggested similarity function exhibits superior performance over alternative Mahalanobis distances learnt from the same data. Its superiority is demonstrated in the context of image retrieval and graph-based clustering, using a large number of data sets.

Cite

Text

Bar-Hillel and Weinshall. "Learning Distance Function by Coding Similarity." International Conference on Machine Learning, 2007. doi:10.1145/1273496.1273505

Markdown

[Bar-Hillel and Weinshall. "Learning Distance Function by Coding Similarity." International Conference on Machine Learning, 2007.](https://mlanthology.org/icml/2007/barhillel2007icml-learning/) doi:10.1145/1273496.1273505

BibTeX

@inproceedings{barhillel2007icml-learning,
  title     = {{Learning Distance Function by Coding Similarity}},
  author    = {Bar-Hillel, Aharon and Weinshall, Daphna},
  booktitle = {International Conference on Machine Learning},
  year      = {2007},
  pages     = {65--72},
  doi       = {10.1145/1273496.1273505},
  url       = {https://mlanthology.org/icml/2007/barhillel2007icml-learning/}
}