Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees
Abstract
Influence estimation analyzes how changes to the training data can lead to different model predictions; this analysis can help us better understand these predictions, the models making them, and the data sets they are trained on. However, most influence-estimation techniques are designed for deep learning models with continuous parameters. Gradient-boosted decision trees (GBDTs) are a powerful and widely used class of models, but they are black boxes whose decision-making processes are opaque. To better understand GBDT predictions and improve these models, we adapt recent and popular influence-estimation methods designed for deep learning models to GBDTs. Specifically, we adapt representer-point methods and TracIn, denoting our new methods TREX and BoostIn, respectively; source code is available at https://github.com/jjbrophy47/treeinfluence. We compare these methods to LeafInfluence and other baselines using 5 different evaluation measures on 22 real-world data sets with 4 popular GBDT implementations. These experiments give a comprehensive overview of how different approaches to influence estimation work in GBDT models. We find that BoostIn is an efficient influence-estimation method for GBDTs that performs as well as or better than existing methods while being four orders of magnitude faster. Our evaluation also suggests that the gold-standard approach of leave-one-out (LOO) retraining consistently identifies the single most influential training example but performs poorly at finding the most influential set of training examples for a given target prediction.
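To make the leave-one-out (LOO) baseline concrete, here is a minimal sketch, not the paper's code or the treeinfluence library's API. It assumes a toy GBDT built from depth-1 regression stumps with squared loss on a single numeric feature, written in pure Python so it runs without a GBDT library; all function names (`fit_gbdt`, `loo_influence`, and so on) are hypothetical. The influence of training example i on a target is taken to be the change in the target's squared loss after removing i and retraining from scratch.

```python
from typing import List, Tuple

# Hypothetical toy model: each tree is a depth-1 stump (threshold, left_value, right_value).
Stump = Tuple[float, float, float]
Model = Tuple[float, List[Stump], float]  # (base prediction, stumps, learning rate)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a depth-1 regression tree that minimizes squared error on 1-D inputs."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1], best[2], best[3]


def predict_stump(stump: Stump, x: float) -> float:
    thr, lv, rv = stump
    return lv if x <= thr else rv


def fit_gbdt(xs: List[float], ys: List[float], n_trees: int = 20, lr: float = 0.3) -> Model:
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of 0.5 * (y - p)^2 with respect to p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, lr


def predict_gbdt(model: Model, x: float) -> float:
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)


def loo_influence(xs: List[float], ys: List[float], x_target: float, y_target: float,
                  **fit_kwargs) -> List[float]:
    """LOO influence: target loss after removing example i minus loss of the full model.

    A positive score means removing i hurts the target prediction (i was helpful);
    a negative score means i was harmful. Each example is scored in isolation.
    """
    if len(xs) < 2:
        raise ValueError("LOO retraining needs at least two training examples")
    full = fit_gbdt(xs, ys, **fit_kwargs)
    base_loss = (predict_gbdt(full, x_target) - y_target) ** 2
    scores = []
    for i in range(len(xs)):
        # Retrain from scratch without example i (one full retraining per example).
        model_i = fit_gbdt(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], **fit_kwargs)
        loss_i = (predict_gbdt(model_i, x_target) - y_target) ** 2
        scores.append(loss_i - base_loss)
    return scores


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
    scores = loo_influence(xs, ys, x_target=5.5, y_target=5.0)
    # Rank training examples from most helpful to most harmful for this target.
    print(sorted(range(len(xs)), key=lambda i: scores[i], reverse=True))
```

This baseline requires one full retraining per training example (n + 1 fits in total), which becomes expensive for large data sets and large ensembles. Because it removes each example on its own, it also does not capture what happens when several examples are removed together.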
Cite
Brophy et al. "Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees." Journal of Machine Learning Research, 2023.
[Brophy et al. "Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees." Journal of Machine Learning Research, 2023.](https://mlanthology.org/jmlr/2023/brophy2023jmlr-adapting/)
@article{brophy2023jmlr-adapting,
title = {{Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees}},
author = {Brophy, Jonathan and Hammoudeh, Zayd and Lowd, Daniel},
journal = {Journal of Machine Learning Research},
year = {2023},
pages = {1-48},
volume = {24},
url = {https://mlanthology.org/jmlr/2023/brophy2023jmlr-adapting/}
}