On Efficient Constructions of Checkpoints

Abstract

Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint construction (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store only the information most crucial for SGD to recover, and then applies Huffman coding to leverage the non-uniform distribution of the gradient scales. Our extensive experiments show that LC-Checkpoint achieves a compression rate of up to 28× and a recovery speedup of up to 5.77× over a state-of-the-art algorithm (SCAR).
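
The pipeline described in the abstract can be illustrated with a small sketch. The Python code below is not the paper's implementation: the function names (`lc_compress`, `huffman_codes`, `lc_decompress`), the choice of the bucket mean as each bucket's representative value, and the `keep_buckets=7` default (roughly a 3-bit quantization budget) are illustrative assumptions. It quantizes a checkpoint delta into (sign, exponent) buckets, keeps only the largest-magnitude buckets (the priority-promotion step), and Huffman-encodes the per-element bucket IDs to exploit their skewed distribution.

```python
# Hypothetical sketch of the LC-Checkpoint idea; not the paper's code.
import heapq
from collections import Counter

import numpy as np


def huffman_codes(freq):
    """Build a Huffman code table from a {symbol: count} mapping."""
    if len(freq) == 1:                      # degenerate one-symbol case
        return {next(iter(freq)): "0"}
    heap = [(count, n, {sym: ""}) for n, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, n, merged))
        n += 1
    return heap[0][2]


def lc_compress(delta, keep_buckets=7):
    """Quantize a 1-D checkpoint delta into (sign, exponent) buckets,
    keep only the keep_buckets largest-magnitude buckets ("priority
    promotion"), and Huffman-encode the per-element bucket IDs."""
    sign = np.sign(delta).astype(int)
    _, exp = np.frexp(delta)                # |x| lies in [2**(e-1), 2**e)
    keys = list(zip(sign.tolist(), exp.tolist()))

    # Priority promotion: buckets with the largest exponents carry the
    # largest delta entries; every other entry collapses to zero.
    nonzero = {k for k in set(keys) if k[0] != 0}
    kept = set(sorted(nonzero, key=lambda k: k[1], reverse=True)[:keep_buckets])
    keys = [k if k in kept else (0, 0) for k in keys]

    # Represent each kept bucket by the mean of its members (assumption).
    reps = {(0, 0): 0.0}
    for k in kept:
        reps[k] = float(delta[[kk == k for kk in keys]].mean())

    codes = huffman_codes(Counter(keys))
    bits = "".join(codes[k] for k in keys)
    return bits, codes, reps


def lc_decompress(bits, codes, reps, n):
    """Decode the bitstream back into an approximate delta array."""
    inverse = {code: sym for sym, code in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(reps[inverse[cur]])
            cur = ""
    assert len(out) == n
    return np.array(out)


# Example: compress a synthetic delta and measure the bit rate.
delta = np.random.randn(10_000).astype(np.float32)
bits, codes, reps = lc_compress(delta)
approx = lc_decompress(bits, codes, reps, delta.size)
print(len(bits) / (delta.size * 32))        # fraction of the original bits
```

Because most delta entries fall into low-magnitude buckets, the Huffman step assigns them short codewords, which is where the compression gain beyond plain quantization comes from in this sketch.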

Cite

Text

Chen et al. "On Efficient Constructions of Checkpoints." International Conference on Machine Learning, 2020.

Markdown

[Chen et al. "On Efficient Constructions of Checkpoints." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/chen2020icml-efficient/)

BibTeX

@inproceedings{chen2020icml-efficient,
  title     = {{On Efficient Constructions of Checkpoints}},
  author    = {Chen, Yu and Liu, Zhenming and Ren, Bin and Jin, Xin},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {1627--1636},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/chen2020icml-efficient/}
}