Parallelizing Support Vector Machines on Distributed Computers

Abstract

Support Vector Machines (SVMs) suffer from a widely recognized scalability problem in both memory use and computational time. To improve scalability, we have developed a parallel SVM algorithm (PSVM), which reduces memory use by performing a row-based, approximate matrix factorization, and which loads only essential data onto each machine to perform parallel computation. Let $n$ denote the number of training instances, $p$ the reduced matrix dimension after factorization ($p$ is significantly smaller than $n$), and $m$ the number of machines. PSVM reduces the memory requirement from $O(n^2)$ to $O(np/m)$, and improves computation time to $O(np^2/m)$. Empirical studies on up to $500$ computers show PSVM to be effective.
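The row-based, approximate factorization the abstract refers to can be realized as an incomplete Cholesky factorization of the kernel matrix: it produces an $n \times p$ factor $H$ with $HH^\top \approx K$ while only ever materializing one kernel column at a time. Below is a minimal single-machine sketch of that idea in Python; the RBF kernel, the `gamma` value, and the greedy pivoting rule are illustrative assumptions, not the paper's exact parallel implementation. Storing $H$ costs $O(np)$, which PSVM further splits row-wise across $m$ machines to reach the stated $O(np/m)$ per machine.

```python
import numpy as np

def rbf_column(X, i, gamma):
    """One column of the RBF kernel matrix, computed on demand (no n x n storage)."""
    diff = X - X[i]
    return np.exp(-gamma * np.einsum("ij,ij->i", diff, diff))

def incomplete_cholesky(X, p, gamma=0.5):
    """Row-based approximate factorization: returns H (n x p) with H @ H.T ~ K.

    Memory is O(n p) for H plus O(n) scratch, versus O(n^2) for the full kernel.
    """
    n = X.shape[0]
    H = np.zeros((n, p))
    d = np.ones(n)                        # residual diagonal; k(x, x) = 1 for RBF
    for k in range(p):
        i = int(np.argmax(d))             # greedy pivot: largest residual diagonal
        pivot = np.sqrt(d[i])
        col = rbf_column(X, i, gamma)     # only one kernel column materialized
        H[:, k] = (col - H[:, :k] @ H[i, :k]) / pivot
        d = np.maximum(d - H[:, k] ** 2, 0.0)  # clip tiny negative round-off
        d[i] = 0.0                        # pivot row is now fully resolved
    return H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    H = incomplete_cholesky(X, p=50)
    # Spot-check: H @ H.T should approximate the kernel on a small sample.
    idx = rng.choice(2000, size=5, replace=False)
    K_sub = np.array([rbf_column(X, i, 0.5)[idx] for i in idx])
    print(np.max(np.abs(K_sub - H[idx] @ H[idx].T)))
```

In PSVM the analogous factorization is itself distributed, with each machine holding roughly $n/m$ rows of $H$; the sketch above only illustrates why the factor-based representation cuts memory from $O(n^2)$ to $O(np)$ before that split.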

Cite

Text

Zhu et al. "Parallelizing Support Vector Machines on Distributed Computers." Neural Information Processing Systems, 2007.

Markdown

[Zhu et al. "Parallelizing Support Vector Machines on Distributed Computers." Neural Information Processing Systems, 2007.](https://mlanthology.org/neurips/2007/zhu2007neurips-parallelizing/)

BibTeX

@inproceedings{zhu2007neurips-parallelizing,
  title     = {{Parallelizing Support Vector Machines on Distributed Computers}},
  author    = {Zhu, Kaihua and Wang, Hao and Bai, Hongjie and Li, Jian and Qiu, Zhihuan and Cui, Hang and Chang, Edward Y.},
  booktitle = {Neural Information Processing Systems},
  year      = {2007},
  pages     = {257--264},
  url       = {https://mlanthology.org/neurips/2007/zhu2007neurips-parallelizing/}
}