Unsupervised Signature Extraction from Forensic Logs

Abstract

Signature extraction is a key part of forensic log analysis. It involves recognizing patterns in log lines so that log lines that originated from the same line of code are grouped together. A log signature consists of immutable parts and mutable parts: the immutable parts define the signature, and the mutable parts are typically variable parameter values. In practice, the number of log lines and signatures can be large, which makes detecting and aligning the immutable parts of the logs to extract the signatures a significant challenge. We propose a novel method based on a neural language model that outperforms the current state of the art on signature extraction. We use an RNN auto-encoder to create an embedding of the log lines. Log lines embedded in this way can be clustered to extract the signatures in an unsupervised manner.
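The abstract does not say how a signature is read off a cluster once the log lines have been grouped. The sketch below shows one simple possibility and is not the authors' procedure. It assumes whitespace tokenization, and it assumes that every line in a cluster has the same number of tokens. Under those assumptions, a token position that is identical across the cluster is treated as immutable, and any position that varies is replaced by a wildcard. The RNN auto-encoder embedding and the clustering step are not shown. The names `extract_signature` and `WILDCARD` are hypothetical.

```python
from typing import List

# Hypothetical placeholder marking a mutable (parameter) position.
WILDCARD = "*"


def extract_signature(lines: List[str]) -> str:
    """Derive a signature from log lines assumed to share one log statement.

    Sketch only: assumes whitespace tokenization and equal token counts
    within the cluster; real clusters may need alignment first.
    """
    if not lines:
        raise ValueError("cannot extract a signature from an empty cluster")
    token_rows = [line.split() for line in lines]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("sketch assumes all lines in a cluster have the same token count")
    signature = []
    for column in zip(*token_rows):
        # Same token in every line -> immutable part of the signature;
        # otherwise the position is treated as a mutable parameter value.
        signature.append(column[0] if len(set(column)) == 1 else WILDCARD)
    return " ".join(signature)


cluster = ["Connection from 10.0.0.1 closed", "Connection from 10.0.0.7 closed"]
print(extract_signature(cluster))  # prints "Connection from * closed"
```

Position-wise comparison is the simplest way to separate immutable from mutable parts. It fails when parameters contain spaces or vary in token count. In those cases, a sequence alignment step would be needed before comparing columns.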

Cite

Text

Thaler et al. "Unsupervised Signature Extraction from Forensic Logs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017. doi:10.1007/978-3-319-71273-4_25

Markdown

[Thaler et al. "Unsupervised Signature Extraction from Forensic Logs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017.](https://mlanthology.org/ecmlpkdd/2017/thaler2017ecmlpkdd-unsupervised/) doi:10.1007/978-3-319-71273-4_25

BibTeX

@inproceedings{thaler2017ecmlpkdd-unsupervised,
  title     = {{Unsupervised Signature Extraction from Forensic Logs}},
  author    = {Thaler, Stefan and Menkovski, Vlado and Petkovic, Milan},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2017},
  pages     = {305--316},
  doi       = {10.1007/978-3-319-71273-4_25},
  url       = {https://mlanthology.org/ecmlpkdd/2017/thaler2017ecmlpkdd-unsupervised/}
}