Vector Optimization with Stochastic Bandit Feedback

Abstract

We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of Pareto set beyond multi-objective optimization. We characterize the sample complexity of ($\epsilon,\delta$)-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst-case, the sample complexity scales with the square of ordering complexity. Furthermore, we investigate the sample complexity of the na{ı̈}ve elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.

Cite

Text

Ararat and Tekin. "Vector Optimization with Stochastic Bandit Feedback." Artificial Intelligence and Statistics, 2023.

Markdown

[Ararat and Tekin. "Vector Optimization with Stochastic Bandit Feedback." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/ararat2023aistats-vector/)

BibTeX

@inproceedings{ararat2023aistats-vector,
  title     = {{Vector Optimization with Stochastic Bandit Feedback}},
  author    = {Ararat, Cagin and Tekin, Cem},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {2165-2190},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/ararat2023aistats-vector/}
}