Learning the Pareto Set Under Incomplete Preferences: Pure Exploration in Vector Bandits

Abstract

We study pure exploration in bandit problems with vector-valued rewards, where the goal is to (approximately) identify the Pareto set of arms given incomplete preferences induced by a polyhedral convex cone. We address the open problem of designing sample-efficient learning algorithms for such problems. We propose Pareto Vector Bandits (PaVeBa), an adaptive elimination algorithm that nearly matches the gap-dependent and worst-case lower bounds on the sample complexity of $(\epsilon, \delta)$-PAC Pareto set identification. Finally, we provide an in-depth numerical investigation of PaVeBa and its heuristic variants by comparing them with the state-of-the-art multi-objective and vector optimization algorithms on several real-world datasets with conflicting objectives.

Cite

Text

Karagözlü et al. "Learning the Pareto Set Under Incomplete Preferences: Pure Exploration in Vector Bandits." Artificial Intelligence and Statistics, 2024.

Markdown

[Karagözlü et al. "Learning the Pareto Set Under Incomplete Preferences: Pure Exploration in Vector Bandits." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/karagozlu2024aistats-learning/)

BibTeX

@inproceedings{karagozlu2024aistats-learning,
  title     = {{Learning the Pareto Set Under Incomplete Preferences: Pure Exploration in Vector Bandits}},
  author    = {Karagözlü, Efe Mert and Yıldırım, Yaşar Cahit and Ararat, Cağın and Tekin, Cem},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2024},
  pages     = {3070-3078},
  volume    = {238},
  url       = {https://mlanthology.org/aistats/2024/karagozlu2024aistats-learning/}
}