Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions
Abstract
The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the estimates, it enables the density estimation of higher-dimensional data and approaches the true density with increasing dimensionality of the vector space. Moreover, it is not only designed to estimate homogeneous data, i.e., where all variables are nominal or all variables are numeric, but it can also estimate heterogeneous data. The evaluation is conducted on synthetic and real-world data. The software related to this paper is available at https://github.com/geilke/mideo .
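The idea named in the abstract (project the stream into a vector space, then estimate from a set of representatives) can be illustrated with a toy sketch. This is not the authors' method, which is in the paper and the linked repository. Everything below is an assumption made for illustration: the class `ProjectedStreamDensity`, the helper `encode`, the random Gaussian projection, the distance-threshold rule for creating representatives, and the isotropic Gaussian kernel. The estimate is a density over the projected space, not the original one.

```python
import math
import random


def encode(record, nominal_values):
    """Turn a heterogeneous record into a numeric vector (hypothetical helper).

    Nominal fields are one-hot encoded; numeric fields are passed through.
    """
    out = []
    for key in sorted(record):
        value = record[key]
        if key in nominal_values:
            out.extend(1.0 if value == v else 0.0 for v in nominal_values[key])
        else:
            out.append(float(value))
    return out


class ProjectedStreamDensity:
    """Toy online estimator: random projection plus counted representatives."""

    def __init__(self, in_dim, out_dim, radius, bandwidth, seed=0):
        rng = random.Random(seed)
        # Assumption: a fixed random Gaussian projection matrix (out_dim x in_dim).
        self.proj = [
            [rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]
        self.radius = radius        # max distance for joining a representative
        self.bandwidth = bandwidth  # kernel width used by density()
        self.centers = []           # representatives in the projected space
        self.counts = []            # number of instances absorbed by each one
        self.total = 0

    def project(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.proj]

    def update(self, x):
        """Process one stream instance; only representatives are stored."""
        z = self.project(x)
        self.total += 1
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centers):
            d = math.dist(z, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.radius:
            # No representative is close enough: start a new one.
            self.centers.append(z)
            self.counts.append(1)
        else:
            # Absorb the instance by moving the representative's running mean.
            self.counts[best] += 1
            n = self.counts[best]
            c = self.centers[best]
            self.centers[best] = [ci + (zi - ci) / n for ci, zi in zip(c, z)]

    def density(self, x):
        """Count-weighted Gaussian mixture evaluated in the projected space."""
        if self.total == 0:
            return 0.0
        z = self.project(x)
        h = self.bandwidth
        norm = (2.0 * math.pi * h * h) ** (-len(z) / 2.0)
        acc = 0.0
        for c, n in zip(self.centers, self.counts):
            d2 = sum((a - b) ** 2 for a, b in zip(z, c))
            acc += n * norm * math.exp(-d2 / (2.0 * h * h))
        return acc / self.total
```

Only the representatives and their counts are kept, so memory grows with the number of representatives and not with the length of the stream. The `radius` and `bandwidth` values are illustrative knobs and would need tuning for real data.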
Cite
Text
Geilke et al. "Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016. doi:10.1007/978-3-319-46128-1_5
Markdown
[Geilke et al. "Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.](https://mlanthology.org/ecmlpkdd/2016/geilke2016ecmlpkdd-online/) doi:10.1007/978-3-319-46128-1_5
BibTeX
@inproceedings{geilke2016ecmlpkdd-online,
title = {{Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions}},
author = {Geilke, Michael and Karwath, Andreas and Kramer, Stefan},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2016},
pages = {65--80},
doi = {10.1007/978-3-319-46128-1_5},
url = {https://mlanthology.org/ecmlpkdd/2016/geilke2016ecmlpkdd-online/}
}