Toward Optimal Solution for the Context-Attentive Bandit Problem
Abstract
In various recommender system applications, from medical diagnosis to dialog systems, only a small subset of a potentially large number of context variables can be observed at each iteration due to observation costs; however, the agent is free to choose which variables to observe. In this paper, we analyze and extend an online learning framework known as the Context-Attentive Bandit. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
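The abstract describes the setting at a high level: Linear Thompson Sampling adapted so that only a small number of context variables are observed per round, with the agent choosing which ones. The sketch below is an illustrative approximation of that idea, not the authors' exact CATS algorithm; the feature-selection rule (Thompson sampling over per-feature Beta relevance scores), the assumption of rewards in [0, 1], and all class and parameter names are assumptions made for this example.

# Minimal sketch (illustrative, not the published CATS algorithm):
# Linear Thompson Sampling where only k context features may be observed
# each round and the agent selects which features to observe.
import numpy as np

class ContextAttentiveLinTS:
    def __init__(self, n_arms, dim, n_observed, v=1.0):
        self.n_arms, self.dim, self.k, self.v = n_arms, dim, n_observed, v
        # Per-arm Bayesian linear regression state for Linear Thompson Sampling.
        self.B = [np.eye(dim) for _ in range(n_arms)]    # precision matrices
        self.f = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted context sums
        # Beta posteriors over each feature's usefulness (heuristic assumption).
        self.feat_a = np.ones(dim)
        self.feat_b = np.ones(dim)

    def select_features(self):
        # Thompson-sample a relevance score per feature and keep the top k.
        scores = np.random.beta(self.feat_a, self.feat_b)
        return np.argsort(scores)[-self.k:]

    def select_arm(self, partial_context):
        # partial_context: full-length vector with unobserved entries set to 0.
        samples = []
        for a in range(self.n_arms):
            B_inv = np.linalg.inv(self.B[a])
            mu = B_inv @ self.f[a]
            theta = np.random.multivariate_normal(mu, self.v ** 2 * B_inv)
            samples.append(partial_context @ theta)
        return int(np.argmax(samples))

    def update(self, arm, partial_context, reward, observed_idx):
        # Standard LinTS posterior update on the observed (zero-padded) context.
        self.B[arm] += np.outer(partial_context, partial_context)
        self.f[arm] += reward * partial_context
        # Heuristic relevance update for the observed features (reward in [0, 1]).
        self.feat_a[observed_idx] += reward
        self.feat_b[observed_idx] += 1.0 - reward

A caller would, at each round, call select_features(), observe only those entries of the context, zero out the rest, then call select_arm() and update(); the paper's own selection rule and regret analysis should be consulted for the actual method.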
Cite
Text
Bouneffouf et al. "Toward Optimal Solution for the Context-Attentive Bandit Problem." International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/481
Markdown
[Bouneffouf et al. "Toward Optimal Solution for the Context-Attentive Bandit Problem." International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/bouneffouf2021ijcai-optimal/) doi:10.24963/IJCAI.2021/481
BibTeX
@inproceedings{bouneffouf2021ijcai-optimal,
title = {{Toward Optimal Solution for the Context-Attentive Bandit Problem}},
author = {Bouneffouf, Djallel and Féraud, Raphaël and Upadhyay, Sohini and Rish, Irina and Khazaeni, Yasaman},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2021},
pages = {3493--3500},
doi = {10.24963/IJCAI.2021/481},
url = {https://mlanthology.org/ijcai/2021/bouneffouf2021ijcai-optimal/}
}