Dissimilarity Bandits
Abstract
We study a novel sequential decision-making setting, namely the dissimilarity bandits. At each round, the learner pulls an arm that provides a stochastic d-dimensional observation vector. The learner aims to identify the pair of arms with the maximum dissimilarity, where such an index is computed over pairs of expected observation vectors. We propose Successive Elimination for Dissimilarity (SED), a fixed-confidence best-pair identification algorithm based on sequential elimination. SED discards individual arms when there is statistical evidence that they cannot belong to a pair of most dissimilar arms and, thus, effectively exploits the structure of the setting by reusing the estimates of the expected observation vectors. We provide results on the sample complexity of SED, depending on HP, a novel index characterizing the complexity of identifying the pair of the most dissimilar arms. Then, we provide a sample complexity lower bound, highlighting the challenges of the identification problem for dissimilarity bandits, which is almost matched by our SED. Finally, we compare our approach over synthetically generated data and a realistic environmental monitoring domain against classical and combinatorial best-arm identification algorithms for the cases $d=1$ and $d>1$.
Cite
Text
Battellani et al. "Dissimilarity Bandits." Artificial Intelligence and Statistics, 2024.Markdown
[Battellani et al. "Dissimilarity Bandits." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/battellani2024aistats-dissimilarity/)BibTeX
@inproceedings{battellani2024aistats-dissimilarity,
title = {{Dissimilarity Bandits}},
author = {Battellani, Paolo and Maria Metelli, Alberto and Trovò, Francesco},
booktitle = {Artificial Intelligence and Statistics},
year = {2024},
pages = {3637-3645},
volume = {238},
url = {https://mlanthology.org/aistats/2024/battellani2024aistats-dissimilarity/}
}