Distributed Bootstrap for Simultaneous Inference Under High Dimensionality
Abstract
We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed across many machines. The method produces an $\ell_\infty$-norm confidence region based on a communication-efficient de-biased lasso, and we develop an efficient cross-validation approach to tune the method at each iteration. We theoretically derive a lower bound $\tau_{\min}$ on the number of communication rounds that guarantees statistical accuracy and efficiency. Moreover, $\tau_{\min}$ grows only logarithmically with the number of workers and the intrinsic dimensionality, while remaining nearly invariant to the nominal dimensionality. We validate our theory through extensive simulation studies and a variable-screening task on a semi-synthetic dataset derived from the US Airline On-Time Performance dataset. The code to reproduce the numerical results is available in the Supplementary Material.
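For intuition, below is a minimal single-machine sketch of the kind of procedure the abstract describes: a de-biased lasso combined with a Gaussian multiplier bootstrap to form an $\ell_\infty$-norm simultaneous confidence region. It is illustrative only and is not the paper's communication-efficient distributed algorithm; the penalty level, the ridge-regularized inverse covariance standing in for a nodewise-lasso estimate, and all problem dimensions are assumptions made for the example.

# Minimal single-machine sketch (not the paper's distributed algorithm):
# de-biased lasso + Gaussian multiplier bootstrap for an l_inf-norm
# simultaneous confidence region. The regularized inverse covariance
# below is a simplified stand-in for the nodewise-lasso construction;
# tuning constants are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, B = 500, 50, 1000                           # samples, dimension, bootstrap draws
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                       # sparse ground truth
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

# Lasso fit (penalty level is an illustrative choice)
lam = 2.0 * np.sqrt(np.log(p) / n)
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# De-biasing step: beta_d = beta_hat + Theta_hat @ X^T (y - X beta_hat) / n,
# with Theta_hat a regularized inverse of the sample covariance
Sigma_hat = X.T @ X / n
Theta_hat = np.linalg.inv(Sigma_hat + 0.1 * np.eye(p))   # symmetric
resid = y - X @ beta_hat
beta_d = beta_hat + Theta_hat @ X.T @ resid / n

# Multiplier bootstrap of the sup-norm statistic
# max_j |n^{-1/2} sum_i e_i (Theta_hat x_i)_j resid_i|, with e_i ~ N(0,1)
scores = (X @ Theta_hat) * resid[:, None]         # n x p influence terms
W = np.abs(rng.standard_normal((B, n)) @ scores) / np.sqrt(n)
c_alpha = np.quantile(W.max(axis=1), 0.95)        # simultaneous 95% quantile

# l_inf confidence region: beta_d_j +/- c_alpha / sqrt(n) for every j
lower = beta_d - c_alpha / np.sqrt(n)
upper = beta_d + c_alpha / np.sqrt(n)
print("simultaneous coverage of truth:",
      bool(np.all((beta >= lower) & (beta <= upper))))

Running the sketch prints whether the true coefficient vector lies inside the simultaneous region; in the paper's setting, the same bootstrap quantile would instead be computed communication-efficiently across workers.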
Cite
Yu et al. "Distributed Bootstrap for Simultaneous Inference Under High Dimensionality." Journal of Machine Learning Research, 2022.

BibTeX
@article{yu2022jmlr-distributed,
title = {{Distributed Bootstrap for Simultaneous Inference Under High Dimensionality}},
author = {Yu, Yang and Chao, Shih-Kang and Cheng, Guang},
journal = {Journal of Machine Learning Research},
year = {2022},
pages = {1--77},
volume = {23},
url = {https://mlanthology.org/jmlr/2022/yu2022jmlr-distributed/}
}