Coresets for Data Discretization and Sine Wave Fitting
Abstract
In the monitoring problem, the input is an unbounded stream $P={p_1,p_2\cdots}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+\lambda(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2\pi}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $\lambda$ is a given regularization function. For any approximation error $\varepsilon>0$, we prove that every set $P$ of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of $1\pm\varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.
Cite
Text
Maalouf et al. "Coresets for Data Discretization and Sine Wave Fitting." Artificial Intelligence and Statistics, 2022.Markdown
[Maalouf et al. "Coresets for Data Discretization and Sine Wave Fitting." Artificial Intelligence and Statistics, 2022.](https://mlanthology.org/aistats/2022/maalouf2022aistats-coresets/)BibTeX
@inproceedings{maalouf2022aistats-coresets,
title = {{Coresets for Data Discretization and Sine Wave Fitting}},
author = {Maalouf, Alaa and Tukan, Murad and Price, Eric and Kane, Daniel M. and Feldman, Dan},
booktitle = {Artificial Intelligence and Statistics},
year = {2022},
pages = {10622-10639},
volume = {151},
url = {https://mlanthology.org/aistats/2022/maalouf2022aistats-coresets/}
}