Detecting Large-Scale System Problems by Mining Console Logs
Abstract
Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We use a combination of program analysis and information retrieval techniques to transform free-text console logs into numerical features, which capture sequences of events in the system. We then analyze these features using machine learning to detect operational problems. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. In addition, we extend our methods to online problem detection where the sequences of events are continuously generated as data streams.
Cite
Text
Xu et al. "Detecting Large-Scale System Problems by Mining Console Logs." International Conference on Machine Learning, 2010. doi:10.1145/1629575.1629587
Markdown
[Xu et al. "Detecting Large-Scale System Problems by Mining Console Logs." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/xu2010icml-detecting/) doi:10.1145/1629575.1629587
BibTeX
@inproceedings{xu2010icml-detecting,
title = {{Detecting Large-Scale System Problems by Mining Console Logs}},
author = {Xu, Wei and Huang, Ling and Fox, Armando and Patterson, David A. and Jordan, Michael I.},
booktitle = {International Conference on Machine Learning},
year = {2010},
pages = {37--46},
doi = {10.1145/1629575.1629587},
url = {https://mlanthology.org/icml/2010/xu2010icml-detecting/}
}