Simultaneous Inference for Massive Data: Distributed Bootstrap

Abstract

In this paper, we propose a bootstrap method for massive data distributed across a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without the over-resampling typically required by existing methods (Kleiner et al., 2014; Sengupta et al., 2016), while provably achieving optimal statistical efficiency with minimal communication. Our method does not require repeatedly re-fitting the model; it only applies a multiplier bootstrap on the master machine to the gradients received from the worker machines. Simulations validate our theory.
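
To make the idea concrete, here is a minimal sketch of a master-side multiplier bootstrap on worker gradients for a linear model fitted by one-shot averaging. The setup (number of workers `k`, local sample size `n`, number of bootstrap draws `B`) and the exact bootstrap statistic are illustrative assumptions, not the paper's precise k-grad / n+k-grad procedures; no raw data are resampled and no model is re-fit.

```python
# Illustrative sketch only: a Gaussian multiplier bootstrap applied on the
# master machine to worker gradients, used to build simultaneous confidence
# intervals. All constants and the statistic below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Simulated setup: k workers, n samples each, d-dimensional linear regression.
k, n, d, B = 20, 500, 10, 1000
theta_star = np.zeros(d)
X = rng.normal(size=(k, n, d))
y = X @ theta_star + rng.normal(size=(k, n))

# One-shot distributed estimate: average of the workers' local OLS solutions.
theta_hat = np.mean(
    [np.linalg.lstsq(X[j], y[j], rcond=None)[0] for j in range(k)], axis=0
)

# Each worker sends its local gradient of the squared loss at theta_hat.
local_grads = np.stack(
    [X[j].T @ (X[j] @ theta_hat - y[j]) / n for j in range(k)]
)  # shape (k, d)
grad_bar = local_grads.mean(axis=0)

# Master approximates the inverse Hessian using only its own local data.
H_inv = np.linalg.inv(X[0].T @ X[0] / n)

# Multiplier bootstrap on the master: perturb the centered worker gradients
# with Gaussian multipliers; no re-fitting, no resampling of raw data.
sup_stats = np.empty(B)
for b in range(B):
    eps = rng.normal(size=k)  # one multiplier per worker
    boot_grad = (eps[:, None] * (local_grads - grad_bar)).mean(axis=0)
    sup_stats[b] = np.max(np.abs(np.sqrt(n * k) * (H_inv @ boot_grad)))

# Quantile of the sup-statistic gives simultaneous 95% confidence intervals.
c_alpha = np.quantile(sup_stats, 0.95)
half_width = c_alpha / np.sqrt(n * k)
ci_lower, ci_upper = theta_hat - half_width, theta_hat + half_width
print("simultaneous 95% CI half-width:", half_width)
```

The only communication in this sketch is each worker sending its d-dimensional local gradient to the master; all B bootstrap draws are then pure master-side computation on those k gradient vectors.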

Cite

Text

Yu et al. "Simultaneous Inference for Massive Data: Distributed Bootstrap." International Conference on Machine Learning, 2020.

Markdown

[Yu et al. "Simultaneous Inference for Massive Data: Distributed Bootstrap." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/yu2020icml-simultaneous/)

BibTeX

@inproceedings{yu2020icml-simultaneous,
  title     = {{Simultaneous Inference for Massive Data: Distributed Bootstrap}},
  author    = {Yu, Yang and Chao, Shih-Kang and Cheng, Guang},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {10892--10901},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/yu2020icml-simultaneous/}
}