The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters

Abstract

In this paper, we study the non-IID learning setting where samples exhibit dependency within latent clusters. Our goal is to estimate a learner's loss on new clusters, an extension of the out-of-bag error. Previously developed cross-validation estimators are well suited to the case where the clustering of observed data is known a priori. However, as is often the case in real-world problems, we are given only a noisy approximation of this clustering, likely the output of some clustering algorithm. This subtle yet potentially significant issue afflicts domains ranging from image classification to medical diagnostics, where naive cross-validation is an optimistically biased estimator. We present a novel bootstrap technique and corresponding cross-validation method that, somewhat counterintuitively, injects additional dependency to asymptotically recover the loss in the independent setting.
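
As a rough illustration of the known-clustering baseline mentioned above (not the binomial block bootstrap itself, whose details the abstract does not give), the following minimal sketch builds cross-validation folds at the cluster level, so that no cluster contributes samples to both the training and the held-out side. The function name `cluster_kfold`, the toy labels, and the round-robin fold assignment are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def cluster_kfold(cluster_ids, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs in which no cluster is split.

    Hypothetical helper: assumes cluster labels are given and sortable.
    """
    # Group sample indices by their cluster label.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    # Shuffle whole clusters, then deal them round-robin into folds.
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)
    folds = [clusters[i::n_folds] for i in range(n_folds)]

    for held_out in folds:
        held = set(held_out)
        test_idx = [i for c in held_out for i in members[c]]
        train_idx = [i for c in clusters if c not in held for i in members[c]]
        yield train_idx, test_idx


# Toy labels (illustrative only): six clusters of varying size.
labels = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]
splits = list(cluster_kfold(labels, n_folds=3))
```

Naive cross-validation would instead shuffle individual samples, letting members of one cluster land on both sides of a split. When the labels themselves are only a noisy approximation of the true clustering, which is the setting the abstract targets, a cluster-level split of this kind can still leak dependent samples across folds.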

Cite

Text

Barnes and Dubrawski. "The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters." Conference on Uncertainty in Artificial Intelligence, 2017.

Markdown

[Barnes and Dubrawski. "The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters." Conference on Uncertainty in Artificial Intelligence, 2017.](https://mlanthology.org/uai/2017/barnes2017uai-binomial/)

BibTeX

@inproceedings{barnes2017uai-binomial,
  title     = {{The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters}},
  author    = {Barnes, Matt and Dubrawski, Artur},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2017},
  url       = {https://mlanthology.org/uai/2017/barnes2017uai-binomial/}
}