Learning to Identify Concise Regular Expressions That Describe Email Campaigns

Prasse, Paul; Sawade, Christoph; Landwehr, Niels; Scheffer, Tobias

JMLR 2015 pp. 3687-3720

Abstract

This paper addresses the problem of inferring a regular expression from a given set of strings that resembles, as closely as possible, the regular expression that a human expert would have written to identify the language. This is motivated by our goal of automating the task of postmasters who use regular expressions to describe and blacklist email spam campaigns. The training data consist of batches of messages paired with the regular expressions that an expert postmaster felt confident enough to blacklist. We model this task as a two-stage learning problem with structured output spaces and appropriate loss functions. We derive decoders and the resulting optimization problems, which can be solved using standard cutting plane methods. We report on a case study conducted with an email service provider.
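
To make the problem setting concrete, the following is a minimal sketch of what it means for a regular expression to describe a batch of messages. It does not reproduce the paper's learning method. The messages, the `candidate` and `too_general` patterns, and the `matches_all` helper are all invented for illustration.

```python
import re

# Hypothetical batch of messages from a single campaign (invented data).
batch = [
    "Dear Alice, claim your prize at http://win-123.example.com today",
    "Dear Bob, claim your prize at http://win-987.example.com today",
    "Dear Carol, claim your prize at http://win-45.example.com today",
]

# A hand-written expression of the kind a postmaster might write:
# fixed campaign text, with wildcards only where the messages vary.
candidate = re.compile(
    r"Dear [A-Z][a-z]+, claim your prize at "
    r"http://win-\d+\.example\.com today"
)

# An overly general expression that also covers the batch.
too_general = re.compile(r".*")


def matches_all(pattern, messages):
    """Return True if the pattern matches every message in full."""
    return all(pattern.fullmatch(m) is not None for m in messages)


# A legitimate message (also invented) that must not be blacklisted.
legitimate = "Dear Alice, your invoice is attached"

covers_batch = matches_all(candidate, batch)
blocks_legitimate = candidate.fullmatch(legitimate) is not None
general_blocks_legitimate = too_general.fullmatch(legitimate) is not None
```

Both expressions cover the batch, but only the specific one leaves the legitimate message alone. This is why the choice among the many expressions that match a batch matters.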

Cite

Text

Prasse et al. "Learning to Identify Concise Regular Expressions That Describe Email Campaigns." Journal of Machine Learning Research, 2015.

Markdown

[Prasse et al. "Learning to Identify Concise Regular Expressions That Describe Email Campaigns." Journal of Machine Learning Research, 2015.](https://mlanthology.org/jmlr/2015/prasse2015jmlr-learning/)

BibTeX

@article{prasse2015jmlr-learning,
  title     = {{Learning to Identify Concise Regular Expressions That Describe Email Campaigns}},
  author    = {Prasse, Paul and Sawade, Christoph and Landwehr, Niels and Scheffer, Tobias},
  journal   = {Journal of Machine Learning Research},
  year      = {2015},
  pages     = {3687-3720},
  volume    = {16},
  url       = {https://mlanthology.org/jmlr/2015/prasse2015jmlr-learning/}
}