Large-Scale Submodular Greedy Exemplar Selection with Structured Similarity Matrices

Abstract

Exemplar clustering attempts to find a subset of data-points that summarizes the entire data-set in thesense of minimizing the sum of distances from each point to its closest exemplar. It has many importantapplications in machine learning including document and video summarization, data compression, scalability of kernel methods and Gaussian processes, active learning and feature selection. A key challenge in the adoption of exemplar clustering to large-scale applicationshas been the availability of accurate and scalable algorithms. We propose an approach that combines structured similarity matrix representations with submodular greedy maximization that can dramatically increase the scalability of exemplar clustering and still enjoy good approximation guarantees. Exploiting structured similarity matrices within the context of submodular greedy algorithms is by no means trivial, as naive approaches still require computing all the entries of the matrix. We propose a randomized approach based on sampling sign-patterns of columns of the similarity matrix and establish accuracy guarantees. We demonstratesignificant computational speed-ups while still achieving highly accurate solutions, and solve a problem with millions of data-points in about a minute on a single commodity computer.

Cite

Text

Malioutov et al. "Large-Scale Submodular Greedy Exemplar Selection with Structured Similarity Matrices." Conference on Uncertainty in Artificial Intelligence, 2016.

Markdown

[Malioutov et al. "Large-Scale Submodular Greedy Exemplar Selection with Structured Similarity Matrices." Conference on Uncertainty in Artificial Intelligence, 2016.](https://mlanthology.org/uai/2016/malioutov2016uai-large/)

BibTeX

@inproceedings{malioutov2016uai-large,
  title     = {{Large-Scale Submodular Greedy Exemplar Selection with Structured Similarity Matrices}},
  author    = {Malioutov, Dmitry and Kumar, Abhishek and Yen, Ian En-Hsu},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2016},
  url       = {https://mlanthology.org/uai/2016/malioutov2016uai-large/}
}