Information Bottleneck for Non Co-Occurrence Data

Abstract

We present a general model-independent approach to the analysis of data that do not appear in the form of co-occurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; in movie ratings data, the rating Z is a function of viewer X and movie Y. The approach is a consistent extension of the Information Bottleneck method, which has previously relied on the availability of co-occurrence statistics. By altering the relevance variable, we eliminate the need for a sample of the joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering), it improves on the currently best known results.

Cite

Text

Seldin et al. "Information Bottleneck for Non Co-Occurrence Data." Neural Information Processing Systems, 2006.

Markdown

[Seldin et al. "Information Bottleneck for Non Co-Occurrence Data." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/seldin2006neurips-information/)

BibTeX

@inproceedings{seldin2006neurips-information,
  title     = {{Information Bottleneck for Non Co-Occurrence Data}},
  author    = {Seldin, Yevgeny and Slonim, Noam and Tishby, Naftali},
  booktitle = {Neural Information Processing Systems},
  year      = {2006},
  pages     = {1241--1248},
  url       = {https://mlanthology.org/neurips/2006/seldin2006neurips-information/}
}