Coresets for Data Discretization and Sine Wave Fitting

Abstract

In the monitoring problem, the input is an unbounded stream $P={p_1,p_2\cdots}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+\lambda(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2\pi}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $\lambda$ is a given regularization function. For any approximation error $\varepsilon>0$, we prove that every set $P$ of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of $1\pm\varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.

Cite

Text

Maalouf et al. "Coresets for Data Discretization and Sine Wave Fitting." Artificial Intelligence and Statistics, 2022.

Markdown

[Maalouf et al. "Coresets for Data Discretization and Sine Wave Fitting." Artificial Intelligence and Statistics, 2022.](https://mlanthology.org/aistats/2022/maalouf2022aistats-coresets/)

BibTeX

@inproceedings{maalouf2022aistats-coresets,
  title     = {{Coresets for Data Discretization and Sine Wave Fitting}},
  author    = {Maalouf, Alaa and Tukan, Murad and Price, Eric and Kane, Daniel M. and Feldman, Dan},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2022},
  pages     = {10622-10639},
  volume    = {151},
  url       = {https://mlanthology.org/aistats/2022/maalouf2022aistats-coresets/}
}