Bayesian Sets

Abstract

Inspired by “Google™ Sets”, we consider the problem of retrieving items from a concept or cluster, given a query consisting of a few items from that cluster. We formulate this as a Bayesian inference problem and de- scribe a very simple algorithm for solving it. Our algorithm uses a model- based concept of a cluster and ranks items using a score which evaluates the marginal probability that each item belongs to a cluster containing the query items. For exponential family models with conjugate priors this marginal probability is a simple function of sufficient statistics. We focus on sparse binary data and show that our score can be evaluated ex- actly using a single sparse matrix multiplication, making it possible to apply our algorithm to very large datasets. We evaluate our algorithm on three datasets: retrieving movies from EachMovie, finding completions of author sets from the NIPS dataset, and finding completions of sets of words appearing in the Grolier encyclopedia. We compare to Google™ Sets and show that Bayesian Sets gives very reasonable set completions.

Cite

Text

Ghahramani and Heller. "Bayesian Sets." Neural Information Processing Systems, 2005.

Markdown

[Ghahramani and Heller. "Bayesian Sets." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/ghahramani2005neurips-bayesian/)

BibTeX

@inproceedings{ghahramani2005neurips-bayesian,
  title     = {{Bayesian Sets}},
  author    = {Ghahramani, Zoubin and Heller, Katherine A.},
  booktitle = {Neural Information Processing Systems},
  year      = {2005},
  pages     = {435-442},
  url       = {https://mlanthology.org/neurips/2005/ghahramani2005neurips-bayesian/}
}