Multiple Trade-Offs: An Improved Approach for Lexicographic Linear Bandits
Abstract
This paper studies lexicographic online learning within the framework of multiobjective stochastic linear bandits (MOSLB), where the agent aims to maximize multiple objectives simultaneously in a hierarchical order. Previous literature investigated lexicographic online learning in multiobjective multi-armed bandits, a special case of MOSLB, and provided a suboptimal algorithm whose regret bound is approximately O(T^(2/3)) under a priority-based regret metric. In this paper, we propose an algorithm for lexicographic online learning in the MOSLB model that achieves an almost optimal regret bound of approximately O(dT^(1/2)) when evaluated by the general regret metric, where d is the dimension of the arm vectors and T is the time horizon. Our method introduces a new arm filter and a multiple trade-offs approach to effectively balance exploration and exploitation across the different objectives. Experiments confirm the merits of our algorithm and provide compelling evidence to support our analysis.
Cite
Text
Xue et al. "Multiple Trade-Offs: An Improved Approach for Lexicographic Linear Bandits." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I20.35491
Markdown
[Xue et al. "Multiple Trade-Offs: An Improved Approach for Lexicographic Linear Bandits." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/xue2025aaai-multiple/) doi:10.1609/AAAI.V39I20.35491
BibTeX
@inproceedings{xue2025aaai-multiple,
title = {{Multiple Trade-Offs: An Improved Approach for Lexicographic Linear Bandits}},
author = {Xue, Bo and Lin, Xi and Zhang, Xiaoyuan and Zhang, Qingfu},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {21850-21858},
doi = {10.1609/AAAI.V39I20.35491},
url = {https://mlanthology.org/aaai/2025/xue2025aaai-multiple/}
}