Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them

Abstract

Data stream classification plays an important role in modern data analysis, where data arrives as a stream and must be mined in real time. In the data stream setting, the underlying distribution that generates the data may change and evolve over time, so classifiers that can update themselves during operation are becoming the state of the art. In this paper we show that data streams may have an important temporal component, which is currently not considered in the evaluation and benchmarking of data stream classifiers. We demonstrate that a naive classifier that considers only the temporal component outperforms many current state-of-the-art classifiers on real data streams with temporal dependence, i.e., where the data is autocorrelated. We propose evaluating data stream classifiers in a way that takes temporal dependence into account, and we introduce a new evaluation measure that provides a more accurate gauge of data stream classifier performance. To address the temporal dependence issue, we propose a generic wrapper for data stream classifiers that incorporates the temporal component into the attribute space.
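The abstract's three ideas can be illustrated together in a small Python sketch: a naive classifier that uses only the temporal component, an evaluation score that accounts for temporal dependence, and a wrapper that adds temporal information to the attribute space. This is not the authors' implementation. It assumes prequential (test-then-train) evaluation, and the names `NoChangeClassifier`, `LookupClassifier`, `TemporallyAugmented`, `prequential_accuracy`, and `kappa_temporal` are hypothetical. The kappa-style formula, which measures improvement over the No-Change baseline as (p0 - p_nc) / (1 - p_nc), reflects our reading of the idea rather than a quoted definition from the paper.

```python
from collections import Counter, defaultdict, deque


class NoChangeClassifier:
    """Naive temporal baseline: always predicts the most recent true label."""

    def __init__(self):
        self.last_label = None

    def predict(self, x):
        return self.last_label

    def learn(self, x, y):
        self.last_label = y


class LookupClassifier:
    """Toy attribute-based learner (hypothetical): majority label per exact
    attribute vector, falling back to the overall majority for unseen vectors."""

    def __init__(self):
        self.table = defaultdict(Counter)
        self.overall = Counter()

    def predict(self, x):
        counts = self.table.get(tuple(x)) or self.overall
        return counts.most_common(1)[0][0] if counts else None

    def learn(self, x, y):
        self.table[tuple(x)][y] += 1
        self.overall[y] += 1


class TemporallyAugmented:
    """Hypothetical wrapper: appends the previous `lag` true labels to each
    attribute vector before passing it to the base learner."""

    def __init__(self, base, lag=1, pad=None):
        self.base = base
        self.history = deque([pad] * lag, maxlen=lag)

    def _augment(self, x):
        return tuple(x) + tuple(self.history)

    def predict(self, x):
        return self.base.predict(self._augment(x))

    def learn(self, x, y):
        self.base.learn(self._augment(x), y)
        self.history.append(y)  # the true label becomes the next lag feature


def prequential_accuracy(model, stream):
    """Test-then-train: each instance is predicted before it is learned."""
    correct = 0
    for x, y in stream:
        correct += model.predict(x) == y
        model.learn(x, y)
    return correct / len(stream)


def kappa_temporal(p0, p_nochange):
    """Kappa-style score relative to the No-Change baseline (assumed definition):
    0 means no better than repeating the last label, negative means worse."""
    if p_nochange >= 1.0:
        raise ValueError("undefined when the No-Change baseline is perfect")
    return (p0 - p_nochange) / (1.0 - p_nochange)


# Toy autocorrelated stream: long runs of the same label, uninformative attribute.
labels = [0] * 50 + [1] * 50 + [0] * 50 + [1] * 50
stream = [((0,), y) for y in labels]

acc_nochange = prequential_accuracy(NoChangeClassifier(), stream)
acc_plain = prequential_accuracy(LookupClassifier(), stream)
acc_aug = prequential_accuracy(TemporallyAugmented(LookupClassifier(), lag=1), stream)

for name, acc in [("no-change", acc_nochange), ("plain", acc_plain), ("augmented", acc_aug)]:
    print(f"{name:>10}: accuracy={acc:.3f} kappa_temporal={kappa_temporal(acc, acc_nochange):+.2f}")
```

On this toy stream, the attribute carries no information and the labels arrive in long runs. The No-Change baseline therefore beats the plain lookup learner by a wide margin, and the temporally augmented wrapper closes most of that gap. The negative kappa-style scores show that neither attribute-based model improves on simply repeating the last label, a failure that plain accuracy alone would not reveal.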

Cite

Bifet et al. "Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013. doi:10.1007/978-3-642-40988-2_30

BibTeX

@inproceedings{bifet2013ecmlpkdd-pitfalls,
  title     = {{Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them}},
  author    = {Bifet, Albert and Read, Jesse and Zliobaite, Indre and Pfahringer, Bernhard and Holmes, Geoff},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2013},
  pages     = {465--479},
  doi       = {10.1007/978-3-642-40988-2_30},
  url       = {https://mlanthology.org/ecmlpkdd/2013/bifet2013ecmlpkdd-pitfalls/}
}