Chebyshev Polynomial Codes: Task Entanglement-Based Coding for Distributed Matrix Multiplication
Abstract
Distributed computing has been a prominent solution to efficiently process massive datasets in parallel. However, the existence of stragglers is one of the major concerns that slows down the overall speed of distributed computing. To deal with this problem, we consider a distributed matrix multiplication scenario where a master assigns multiple tasks to each worker to exploit stragglers’ computing ability (which is typically wasted in conventional distributed computing). We propose Chebyshev polynomial codes, which can achieve order-wise improvement in encoding complexity at the master and communication load in distributed matrix multiplication using task entanglement. The key idea of task entanglement is to reduce the number of encoded matrices for multiple tasks assigned to each worker by intertwining encoded matrices. We experimentally demonstrate that, in cloud environments, Chebyshev polynomial codes can provide significant reduction in overall processing time in distributed computing for matrix multiplication, which is a key computational component in modern deep learning.
Cite
Text
Hong et al. "Chebyshev Polynomial Codes: Task Entanglement-Based Coding for Distributed Matrix Multiplication." International Conference on Machine Learning, 2021.Markdown
[Hong et al. "Chebyshev Polynomial Codes: Task Entanglement-Based Coding for Distributed Matrix Multiplication." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/hong2021icml-chebyshev/)BibTeX
@inproceedings{hong2021icml-chebyshev,
title = {{Chebyshev Polynomial Codes: Task Entanglement-Based Coding for Distributed Matrix Multiplication}},
author = {Hong, Sangwoo and Yang, Heecheol and Yoon, Youngseok and Cho, Taehyun and Lee, Jungwoo},
booktitle = {International Conference on Machine Learning},
year = {2021},
pages = {4319-4327},
volume = {139},
url = {https://mlanthology.org/icml/2021/hong2021icml-chebyshev/}
}