Tree-Values: Selective Inference for Regression Trees

Abstract

We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
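A rough sketch of the selective inference criterion described in the abstract may help fix ideas. The notation below (the contrast vector nu, the regions R_A and R_B, and the generic conditioning event E(y)) is our shorthand for illustration, not taken verbatim from the paper; the paper defines the precise conditioning set and shows how to compute it efficiently.

  % Sketch of the selective inference setup (notation is illustrative, not verbatim from the paper).
  % Model: Gaussian response with fixed covariates.
  \[
    Y \sim \mathcal{N}(\mu, \sigma^2 I_n).
  \]
  % For two terminal nodes (regions) R_A and R_B of the fitted CART tree, the difference
  % in mean response is a linear contrast of the mean vector:
  \[
    \nu_i \;=\; \frac{\mathbb{1}\{x_i \in R_A\}}{\#\{j : x_j \in R_A\}}
          \;-\; \frac{\mathbb{1}\{x_i \in R_B\}}{\#\{j : x_j \in R_B\}},
    \qquad H_0 : \nu^\top \mu = 0 .
  \]
  % Selective Type 1 error control: the p-value must be valid conditional on the selection
  % event E(y) that led us to test this particular pair of nodes (e.g., that CART, applied
  % to the observed data, produced a tree containing R_A and R_B):
  \[
    \mathbb{P}_{H_0}\!\left( p(Y) \le \alpha \,\middle|\, Y \in E(y) \right) \;\le\; \alpha
    \qquad \text{for all } \alpha \in (0,1).
  \]

Computing the conditional distribution of the contrast requires characterizing the conditioning set E(y); the efficient algorithms mentioned in the abstract serve exactly this purpose.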

Cite

Text

Neufeld et al. "Tree-Values: Selective Inference for Regression Trees." Journal of Machine Learning Research, 2022.

Markdown

[Neufeld et al. "Tree-Values: Selective Inference for Regression Trees." Journal of Machine Learning Research, 2022.](https://mlanthology.org/jmlr/2022/neufeld2022jmlr-treevalues/)

BibTeX

@article{neufeld2022jmlr-treevalues,
  title     = {{Tree-Values: Selective Inference for Regression Trees}},
  author    = {Neufeld, Anna C. and Gao, Lucy L. and Witten, Daniela M.},
  journal   = {Journal of Machine Learning Research},
  year      = {2022},
  pages     = {1-43},
  volume    = {23},
  url       = {https://mlanthology.org/jmlr/2022/neufeld2022jmlr-treevalues/}
}