Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training
Abstract
Modern machine learning applications increasingly demand greater computational resources for training large models. Decentralized training has emerged as an effective means to democratize this technology. However, the potential threats associated with this approach remain inadequately discussed, posing a hurdle to the development of decentralized training infrastructures. This paper aims to initiate this discussion by exploring the robustness of decentralized training from three primary perspectives. Firstly, we articulate our position on establishing robust decentralized training by outlining potential threats and the corresponding countermeasures. Secondly, we illustrate a nascent poisoning attack targeting decentralized training frameworks, one that is easily executable by malicious pipeline stages. To mitigate this security threat while preserving training efficiency, we propose a robust training framework that integrates a 100% detection strategy with efficient training mechanisms. Finally, we demonstrate the severity of the proposed attack and the effectiveness of our robust training framework. This position paper emphasizes the urgency of exploring the robustness of decentralized training and proposes a feasible solution. The code is available at https://github.com/dcx001016/pipeline_attack.
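To make the threat model concrete: in pipeline-parallel training, each stage holds a slice of the model and forwards intermediate activations to the next stage, so a single compromised stage can silently corrupt everything downstream. The following is a minimal, hypothetical sketch of such activation poisoning, not the paper's actual attack or implementation; names like MaliciousStage and the poison_scale parameter are illustrative assumptions.

```python
# Hypothetical sketch of activation poisoning in a two-stage pipeline.
# Each "stage" owns a slice of the model; a malicious stage perturbs the
# activations it forwards, corrupting all downstream computation and the
# gradients that flow back through honest stages.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HonestStage(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.layer = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.layer(x))

class MaliciousStage(HonestStage):
    """Computes the honest output, then adds noise before forwarding it."""
    def __init__(self, dim: int, poison_scale: float = 5.0):  # illustrative knob
        super().__init__(dim)
        self.poison_scale = poison_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        clean = super().forward(x)
        return clean + self.poison_scale * torch.randn_like(clean)

dim, batch = 16, 8
stage0, stage1 = MaliciousStage(dim), HonestStage(dim)  # stage0 is compromised
x, target = torch.randn(batch, dim), torch.randn(batch, dim)

# Activations crossing the stage boundary are already poisoned here.
loss = F.mse_loss(stage1(stage0(x)), target)
loss.backward()  # honest stage1 now receives gradients computed from garbage
print(f"loss under activation poisoning: {loss.item():.3f}")
```

In a real deployment the stages run on separate machines and exchange tensors over the network, which is why such tampering is hard for honest participants to observe without a verification mechanism such as the detection strategy the paper proposes.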
Cite
Text
Lu et al. "Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training." International Conference on Machine Learning, 2024.Markdown
[Lu et al. "Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/lu2024icml-position/)BibTeX
@inproceedings{lu2024icml-position,
  title = {{Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training}},
  author = {Lu, Lin and Dai, Chenxi and Tao, Wangcheng and Yuan, Binhang and Sun, Yanan and Zhou, Pan},
  booktitle = {International Conference on Machine Learning},
  year = {2024},
  pages = {32978--32989},
  volume = {235},
  url = {https://mlanthology.org/icml/2024/lu2024icml-position/}
}