Large-Scale Bandit Problems and KWIK Learning
Abstract
We show that parametric multi-armed bandit (MAB) problems with large state and action spaces can be algorithmically reduced to the supervised learning model known as Knows What It Knows or KWIK learning. We give matching impossibility results showing that the KWIK learnability requirement cannot be replaced by weaker supervised learning assumptions. We provide such results in both the standard parametric MAB setting, as well as for a new model in which the action space is finite but growing with time.
Cite
Text
Abernethy et al. "Large-Scale Bandit Problems and KWIK Learning." International Conference on Machine Learning, 2013.Markdown
[Abernethy et al. "Large-Scale Bandit Problems and KWIK Learning." International Conference on Machine Learning, 2013.](https://mlanthology.org/icml/2013/abernethy2013icml-largescale/)BibTeX
@inproceedings{abernethy2013icml-largescale,
title = {{Large-Scale Bandit Problems and KWIK Learning}},
author = {Abernethy, Jacob and Amin, Kareem and Kearns, Michael and Draief, Moez},
booktitle = {International Conference on Machine Learning},
year = {2013},
pages = {588-596},
volume = {28},
url = {https://mlanthology.org/icml/2013/abernethy2013icml-largescale/}
}