No-Regret Bandit Exploration Based on Soft Tree Ensemble Model

Abstract

We propose a novel stochastic bandit algorithm that employs reward estimates based on a tree ensemble model. Specifically, we focus on the soft tree model, a variant of the conventional decision tree that has received both practical and theoretical attention in recent years. By deriving several non-trivial properties of soft trees, we extend the analytical techniques developed for neural bandit algorithms to our soft tree-based algorithm. We show that our algorithm achieves smaller cumulative regret than existing ReLU-based neural bandit algorithms. We also show that this advantage comes with a trade-off: the hypothesis space of the soft tree ensemble model is more constrained than that of a ReLU-based neural network.

Cite

Text

Iwazaki and Suzumura. "No-Regret Bandit Exploration Based on Soft Tree Ensemble Model." Neural Information Processing Systems, 2024. doi:10.52202/079017-0661

Markdown

[Iwazaki and Suzumura. "No-Regret Bandit Exploration Based on Soft Tree Ensemble Model." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/iwazaki2024neurips-noregret/) doi:10.52202/079017-0661

BibTeX

@inproceedings{iwazaki2024neurips-noregret,
  title     = {{No-Regret Bandit Exploration Based on Soft Tree Ensemble Model}},
  author    = {Iwazaki, Shogo and Suzumura, Shinya},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0661},
  url       = {https://mlanthology.org/neurips/2024/iwazaki2024neurips-noregret/}
}