Estimating Decision Tree Learnability with Polylogarithmic Sample Complexity

Abstract

We show that top-down decision tree learning heuristics (such as ID3, C4.5, and CART) are amenable to highly efficient {\sl learnability estimation}: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with {\sl polylogarithmically} many labeled examples, exponentially fewer than the number necessary to run these heuristics, and indeed, exponentially fewer than the information-theoretic minimum required to learn a good decision tree. This adds to a small but growing list of fundamental learning algorithms that have been shown to be amenable to learnability estimation. En route to this result, we design and analyze sample-efficient {\sl minibatch} versions of top-down decision tree learning heuristics and show that they achieve the same provable guarantees as the full-batch versions. We further give ``active local'' versions of these heuristics: given a test point $x^\star$, we show how the label $T(x^\star)$ of the decision tree hypothesis $T$ can be computed with polylogarithmically many labeled examples, exponentially fewer than the number necessary to learn~$T$.
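The paper gives the precise minibatch and "active local" procedures; as a rough illustration only, the sketch below shows the generic idea behind one minibatch top-down step: the splitting criterion at a node is estimated from a small, fresh batch of labeled examples rather than from the full training set. The Gini-based gain, the sampling interface `sample_fn`, and all parameter names here are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch (not the authors' exact algorithm): one split-selection step
# of a "minibatch" top-down decision tree heuristic over binary features.
import numpy as np

def gini(y):
    """Gini impurity of a vector of 0/1 labels (0.0 for an empty vector)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def minibatch_split(sample_fn, n_features, batch_size=256):
    """Pick the coordinate whose split most reduces estimated impurity,
    using only `batch_size` labeled examples drawn via `sample_fn`."""
    X, y = sample_fn(batch_size)          # fresh minibatch, not the full sample
    base = gini(y)
    best_j, best_gain = None, -np.inf
    for j in range(n_features):
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        w_left, w_right = len(left) / len(y), len(right) / len(y)
        gain = base - (w_left * gini(left) + w_right * gini(right))
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

# Toy usage: a monotone target f(x) = majority of the first 3 bits.
rng = np.random.default_rng(0)
def sample_fn(m, n_features=10):
    X = rng.integers(0, 2, size=(m, n_features))
    y = (X[:, :3].sum(axis=1) >= 2).astype(int)
    return X, y

print(minibatch_split(sample_fn, n_features=10))
```

The same recursion applied to the two induced subtrees yields a full (minibatch) top-down learner; the paper's contribution is showing how few samples such estimates need while preserving the full-batch guarantees.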

Cite

Text

Blanc et al. "Estimating Decision Tree Learnability with Polylogarithmic Sample Complexity." Neural Information Processing Systems, 2020.

Markdown

[Blanc et al. "Estimating Decision Tree Learnability with Polylogarithmic Sample Complexity." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/blanc2020neurips-estimating/)

BibTeX

@inproceedings{blanc2020neurips-estimating,
  title     = {{Estimating Decision Tree Learnability with Polylogarithmic Sample Complexity}},
  author    = {Blanc, Guy and Gupta, Neha and Lange, Jane and Tan, Li-Yang},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/blanc2020neurips-estimating/}
}