Going Metric: Denoising Pairwise Data

Abstract

Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods.
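The abstract does not spell out the embedding itself, so the following is only a minimal sketch of one way such a shift-based embedding could look. It assumes a symmetric dissimilarity matrix with zero diagonal and adds a constant to all off-diagonal entries until the centered Gram matrix is positive semidefinite. The function name `constant_shift_embedding`, the toy matrix `D`, and the choice of shift are illustrative assumptions, not the authors' code.

```python
import numpy as np

def constant_shift_embedding(D):
    """Hypothetical helper: shift off-diagonal dissimilarities by a constant
    so that they become squared Euclidean distances, then embed them."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    ones = np.ones((n, n))
    Q = np.eye(n) - ones / n                  # centering projection
    S_c = -0.5 * Q @ D @ Q                    # centered similarity matrix
    lam_min = np.linalg.eigvalsh(S_c)[0]      # smallest eigenvalue (<= 0 here)
    # Assumed shift: subtract 2 * lam_min from every off-diagonal entry.
    D_shift = D - 2.0 * lam_min * (ones - np.eye(n))
    S = -0.5 * Q @ D_shift @ Q                # positive semidefinite up to round-off
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)                 # remove tiny negative round-off
    X = V * np.sqrt(w)                        # rows are the embedded objects
    return X, D_shift

# Toy dissimilarities that violate the triangle inequality (5 > 1 + 1).
D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])
X, D_shift = constant_shift_embedding(D)
# Squared Euclidean distances between the rows of X.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
```

Under these assumptions, `sq_dists` reproduces `D_shift`, and `D_shift` differs from `D` by the same constant on every off-diagonal entry, which is the kind of change a shift-invariant grouping cost would not notice. A denoising step in this sketch could then keep only the columns of `X` with the largest eigenvalues, though that choice is also an assumption here.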

Cite

Text

Roth et al. "Going Metric: Denoising Pairwise Data." Neural Information Processing Systems, 2002.

Markdown

[Roth et al. "Going Metric: Denoising Pairwise Data." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/roth2002neurips-going/)

BibTeX

@inproceedings{roth2002neurips-going,
  title     = {{Going Metric: Denoising Pairwise Data}},
  author    = {Roth, Volker and Laub, Julian and Müller, Klaus-Robert and Buhmann, Joachim M.},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {841-848},
  url       = {https://mlanthology.org/neurips/2002/roth2002neurips-going/}
}