Sets of Robust Rules, and How to Find Them

Abstract

Association rules are among the most important concepts in data mining. Rules of the form $X \rightarrow Y$ are simple to understand, simple to act upon, yet can model important local dependencies in data. The problem is, however, that there are so many of them. Both traditional and state-of-the-art frameworks typically yield millions of rules, rather than identifying a small set of rules that capture the most important dependencies of the data. In this paper, we define the problem of association rule mining in terms of the Minimum Description Length principle. That is, we identify the best set of rules as the one that most succinctly describes the data. We show that the resulting optimization problem does not lend itself for exact search, and hence propose Grab , a greedy heuristic to efficiently discover good sets of noise-resistant rules directly from data. Through extensive experiments we show that, unlike the state-of-the-art, Grab does reliably recover the ground truth. On real world data we show it finds reasonable numbers of rules, that upon close inspection give clear insight in the local distribution of the data.

Cite

Text

Fischer and Vreeken. "Sets of Robust Rules, and How to Find Them." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019. doi:10.1007/978-3-030-46150-8_3

Markdown

[Fischer and Vreeken. "Sets of Robust Rules, and How to Find Them." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019.](https://mlanthology.org/ecmlpkdd/2019/fischer2019ecmlpkdd-sets/) doi:10.1007/978-3-030-46150-8_3

BibTeX

@inproceedings{fischer2019ecmlpkdd-sets,
  title     = {{Sets of Robust Rules, and How to Find Them}},
  author    = {Fischer, Jonas and Vreeken, Jilles},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2019},
  pages     = {38-54},
  doi       = {10.1007/978-3-030-46150-8_3},
  url       = {https://mlanthology.org/ecmlpkdd/2019/fischer2019ecmlpkdd-sets/}
}