Bayesian Clustering for Email Campaign Detection

Abstract

We discuss the problem of clustering elements according to the sources that generated them. For elements characterized by independent binary attributes, a closed-form Bayesian solution exists. For the case of dependent attributes, we derive a solution based on transforming the instances into a space of independent feature functions. To this end, we formulate an optimization problem whose solution is a mapping into a space of independent binary feature vectors; these features can reflect arbitrary dependencies in the input space. The problem setting is motivated by spam filtering at email service providers: spam traps deliver a real-time stream of messages known to be spam, and if elements of the same campaign can be recognized reliably, entire spam and phishing campaigns can be contained. We present a case study that evaluates Bayesian clustering for this application.
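To illustrate what a closed-form Bayesian solution for independent binary attributes can look like, here is a minimal sketch. It assumes a Beta-Bernoulli model per attribute, so the marginal likelihood of a cluster is a product of Beta-function ratios. It also assumes a Chinese-restaurant-process-style prior, used to assign a stream of messages greedily to existing or new clusters. The function names, the hyperparameters `alpha`, `beta` and `concentration`, and the greedy online assignment are all illustrative assumptions rather than the authors' method. The sketch also does not cover the paper's mapping of dependent attributes into independent feature functions.

```python
import math
from typing import List, Sequence


def log_marginal_likelihood(vectors: Sequence[Sequence[int]],
                            alpha: float = 1.0, beta: float = 1.0) -> float:
    """Log P(vectors | all from one source), assuming each binary attribute is an
    independent Bernoulli variable with a Beta(alpha, beta) prior (an assumption)."""
    n = len(vectors)
    if n == 0:
        return 0.0
    log_beta_prior = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    total = 0.0
    for j in range(len(vectors[0])):
        k = sum(v[j] for v in vectors)  # number of elements with attribute j set
        # log B(alpha + k, beta + n - k) - log B(alpha, beta)
        total += (math.lgamma(alpha + k) + math.lgamma(beta + n - k)
                  - math.lgamma(alpha + beta + n) - log_beta_prior)
    return total


def assign_greedy(stream: Sequence[Sequence[int]], concentration: float = 1.0,
                  alpha: float = 1.0, beta: float = 1.0) -> List[int]:
    """Hypothetical online clustering: each incoming vector joins the cluster
    (or a new one) with the highest log prior + log predictive likelihood."""
    clusters: List[List[Sequence[int]]] = []
    labels: List[int] = []
    for x in stream:
        n_seen = sum(len(c) for c in clusters)
        denom = n_seen + concentration
        scores = []
        for c in clusters:
            # Predictive likelihood: log P(c + [x]) - log P(c)
            pred = (log_marginal_likelihood(c + [x], alpha, beta)
                    - log_marginal_likelihood(c, alpha, beta))
            scores.append(math.log(len(c) / denom) + pred)
        # Score for opening a new cluster containing only x
        scores.append(math.log(concentration / denom)
                      + log_marginal_likelihood([x], alpha, beta))
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == len(clusters):
            clusters.append([x])
        else:
            clusters[best].append(x)
        labels.append(best)
    return labels


if __name__ == "__main__":
    # Two toy "campaigns" with disjoint binary feature patterns
    stream = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(assign_greedy(stream))
```

On this toy stream the sketch places the first two vectors in one cluster and the last two in another. Greedy assignment is a design choice made for streaming input. It never revisits earlier decisions, which keeps the per-message cost low but can give a different partition than exact posterior inference.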

Cite

Text

Haider and Scheffer. "Bayesian Clustering for Email Campaign Detection." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553424

Markdown

[Haider and Scheffer. "Bayesian Clustering for Email Campaign Detection." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/haider2009icml-bayesian/) doi:10.1145/1553374.1553424

BibTeX

@inproceedings{haider2009icml-bayesian,
  title     = {{Bayesian Clustering for Email Campaign Detection}},
  author    = {Haider, Peter and Scheffer, Tobias},
  booktitle = {International Conference on Machine Learning},
  year      = {2009},
  pages     = {385--392},
  doi       = {10.1145/1553374.1553424},
  url       = {https://mlanthology.org/icml/2009/haider2009icml-bayesian/}
}