Summarising Data by Clustering Items

Mampaey, Michael; Vreeken, Jilles

doi:10.1007/978-3-642-15883-4_21

ECML-PKDD 2010 pp. 321-336

Abstract

For a book, the title and abstract provide a good first impression of what to expect from it. For a database, getting a first impression is not so straightforward. While low-order statistics only provide limited insight, mining the data quickly provides too much detail. In this paper we propose a middle ground, and introduce a parameter-free method for constructing high-quality summaries for binary data. Our method builds a summary by grouping items that strongly correlate, and uses the Minimum Description Length principle to identify the best grouping, without requiring a distance measure between items. Besides offering a practical overview of which attributes interact most strongly, these summaries are also easily queried surrogates for the data. Experiments show that our method discovers high-quality results: correlated attributes are correctly grouped and the supports of frequent itemsets are closely approximated.
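To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified score that counts only the bits needed to encode the data with one optimal code per item group (the cost of describing the model itself is left out), and it assumes groups are treated as independent when estimating an itemset's support. All function names (`cluster_counts`, `data_cost_bits`, `estimate_support`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def cluster_counts(rows, cluster):
    """Count how often each value combination of the items in `cluster` occurs."""
    return Counter(tuple(row[i] for i in cluster) for row in rows)


def data_cost_bits(rows, clustering):
    """Bits to encode the data with one optimal prefix code per cluster.

    Simplification (an assumption of this sketch): the cost of describing
    the code tables themselves is not included.
    """
    n = len(rows)
    total = 0.0
    for cluster in clustering:
        for count in cluster_counts(rows, cluster).values():
            total += -count * log2(count / n)
    return total


def estimate_support(rows, clustering, itemset):
    """Estimate the relative support of `itemset` from per-cluster counts,
    assuming the clusters are independent of each other."""
    n = len(rows)
    estimate = 1.0
    for cluster in clustering:
        # Positions within this cluster that the query itemset constrains to 1.
        wanted = [pos for pos, item in enumerate(cluster) if item in itemset]
        if not wanted:
            continue
        counts = cluster_counts(rows, cluster)
        matching = sum(
            c for combo, c in counts.items() if all(combo[p] == 1 for p in wanted)
        )
        estimate *= matching / n
    return estimate


# Toy binary data: items 0 and 1 always co-occur, item 2 is independent.
rows = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)]
grouped = [(0, 1), (2,)]
singletons = [(0,), (1,), (2,)]

print(data_cost_bits(rows, grouped), data_cost_bits(rows, singletons))
print(estimate_support(rows, grouped, {0, 1}),
      estimate_support(rows, singletons, {0, 1}))
```

On this toy data, grouping the two correlated items gives a shorter encoding than keeping every item separate, and it also recovers the true support of the itemset {0, 1}, which the all-singletons summary underestimates.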

Cite

Text

Mampaey and Vreeken. "Summarising Data by Clustering Items." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010. doi:10.1007/978-3-642-15883-4_21

Markdown

[Mampaey and Vreeken. "Summarising Data by Clustering Items." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010.](https://mlanthology.org/ecmlpkdd/2010/mampaey2010ecmlpkdd-summarising/) doi:10.1007/978-3-642-15883-4_21

BibTeX

@inproceedings{mampaey2010ecmlpkdd-summarising,
  title     = {{Summarising Data by Clustering Items}},
  author    = {Mampaey, Michael and Vreeken, Jilles},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2010},
  pages     = {321--336},
  doi       = {10.1007/978-3-642-15883-4_21},
  url       = {https://mlanthology.org/ecmlpkdd/2010/mampaey2010ecmlpkdd-summarising/}
}