Multi-Objective Neural Bandits with Random Scalarization

Abstract

Multi-objective multi-armed bandit (MOMAB) problems are crucial for complex decision-making scenarios in which multiple conflicting objectives must be optimized simultaneously. However, most existing works assume that the reward feedback is linear, which significantly limits their ability to capture the intricate dynamics of real-world environments. This paper explores a multi-objective neural bandit (MONB) framework that integrates neural networks, as universal approximators, into classical MOMABs. We adopt random scalarization to accommodate a practitioner's specific preferences by placing an appropriate distribution over the regions of interest. Building on the exploration-exploitation trade-offs of the upper confidence bound (UCB) and Thompson sampling (TS) strategies, we propose two novel algorithms, MONeural-UCB and MONeural-TS. Theoretical and empirical analyses demonstrate the superiority of our methods on multi-objective and multi-task bandit problems, with substantial improvements over classical linear MOMABs.
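To illustrate how random scalarization combines with UCB- and TS-style arm selection, here is a minimal sketch, not the authors' implementation. It assumes that per-arm, per-objective reward estimates and confidence widths are already available (in MONeural-UCB/TS these would come from the neural network; here they are plain lists). It also assumes Dirichlet-distributed weights as one possible "distribution over the regions of interest", and it uses linear and Chebyshev scalarizations as two common choices. All function names (`sample_dirichlet_weights`, `select_arm_ucb`, `select_arm_ts`, etc.) are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Scalarizer = Callable[[Sequence[float], Sequence[float]], float]


def sample_dirichlet_weights(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """Draw a weight vector on the simplex from Dirichlet(alphas).

    Assumption: a larger alpha_k concentrates weight on objective k, which is one
    way to encode a practitioner's region of interest (not the paper's exact choice).
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def linear_scalarize(values: Sequence[float], weights: Sequence[float]) -> float:
    # Weighted sum of the objectives.
    return sum(w * v for w, v in zip(weights, values))


def chebyshev_scalarize(values: Sequence[float], weights: Sequence[float]) -> float:
    # Worst weighted objective; favours balanced trade-offs (assumes values >= 0).
    return min(w * v for w, v in zip(weights, values))


def select_arm_ucb(means, widths, weights, scalarize: Scalarizer) -> int:
    # UCB-style: add the confidence width to each objective's estimate,
    # scalarize the optimistic vector, and pick the highest-scoring arm.
    scores = [
        scalarize([m + c for m, c in zip(mu, cw)], weights)
        for mu, cw in zip(means, widths)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def select_arm_ts(means, widths, weights, scalarize: Scalarizer, rng: random.Random) -> int:
    # TS-style (simplified): sample each objective from a Gaussian centred on the
    # estimate with std equal to the width, scalarize, and pick the best arm.
    scores = [
        scalarize([rng.gauss(m, s) for m, s in zip(mu, sd)], weights)
        for mu, sd in zip(means, widths)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    means = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]  # 3 arms x 2 objectives (made-up values)
    widths = [[0.05, 0.05]] * 3                   # made-up confidence widths
    for t in range(3):
        w = sample_dirichlet_weights([2.0, 2.0], rng)  # fresh random weights each round
        print(t, w,
              select_arm_ucb(means, widths, w, chebyshev_scalarize),
              select_arm_ts(means, widths, w, linear_scalarize, rng))
```

Drawing a fresh weight vector each round is what makes the scalarization "random": over many rounds the selected arms spread across the part of the Pareto front that the weight distribution emphasizes, rather than collapsing onto a single fixed trade-off.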

Cite

Text

Cheng et al. "Multi-Objective Neural Bandits with Random Scalarization." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/547

Markdown

[Cheng et al. "Multi-Objective Neural Bandits with Random Scalarization." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/cheng2025ijcai-multi/) doi:10.24963/IJCAI.2025/547

BibTeX

@inproceedings{cheng2025ijcai-multi,
  title     = {{Multi-Objective Neural Bandits with Random Scalarization}},
  author    = {Cheng, Ji and Xue, Bo and Lu, Chengyu and Cui, Ziqiang and Zhang, Qingfu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {4914--4922},
  doi       = {10.24963/IJCAI.2025/547},
  url       = {https://mlanthology.org/ijcai/2025/cheng2025ijcai-multi/}
}