Dimensionality Reduction for the Sum-of-Distances Metric

Abstract

We give a dimensionality reduction procedure to approximate the sum of distances of a given set of $n$ points in $R^d$ to any “shape” that lies in a $k$-dimensional subspace. Here, by “shape” we mean any set of points in $R^d$. Our algorithm takes an input in the form of an $n \times d$ matrix $A$, where each row of $A$ denotes a data point, and outputs a subspace $P$ of dimension $O(k^{3}/\epsilon^6)$ such that the projections of each of the $n$ points onto the subspace $P$ and the distances of each of the points to the subspace $P$ are sufficient to obtain an $\epsilon$-approximation to the sum of distances to any arbitrary shape that lies in a $k$-dimensional subspace of $R^d$. These include important problems such as $k$-median, $k$-subspace approximation, and $(j,l)$ subspace clustering with $j \cdot l \leq k$. Dimensionality reduction reduces the data storage requirement to $(n+d)k^{3}/\epsilon^6$ from nnz$(A)$. Here nnz$(A)$ could potentially be as large as $nd$. Our algorithm runs in time nnz$(A)/\epsilon^2 + (n+d)$poly$(k/\epsilon)$, up to logarithmic factors. For dense matrices, where nnz$(A) \approx nd$, we give a faster algorithm, that runs in time $nd + (n+d)$poly$(k/\epsilon)$ up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly$(k/\epsilon)$ size coresets for $k$-median and $(k,1)$-subspace approximation problems in polynomial time.

Cite

Text

Feng et al. "Dimensionality Reduction for the Sum-of-Distances Metric." International Conference on Machine Learning, 2021.

Markdown

[Feng et al. "Dimensionality Reduction for the Sum-of-Distances Metric." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/feng2021icml-dimensionality/)

BibTeX

@inproceedings{feng2021icml-dimensionality,
  title     = {{Dimensionality Reduction for the Sum-of-Distances Metric}},
  author    = {Feng, Zhili and Kacham, Praneeth and Woodruff, David},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {3220-3229},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/feng2021icml-dimensionality/}
}