Multi-Objective Neural Bandits with Random Scalarization
Abstract
Multi-objective multi-armed bandit (MOMAB) problems are crucial for complex decision-making scenarios where multiple conflicting objectives must be optimized simultaneously. However, most existing works assume that the feedback rewards are linear, which significantly limits their applicability and their ability to capture the intricate dynamics of real-world environments. This paper explores a multi-objective neural bandit (MONB) framework that integrates neural networks, as universal function approximators, with classical MOMABs. We adopt random scalarization to accommodate a practitioner's particular needs by placing an appropriate distribution over the regions of interest. Leveraging the exploration-exploitation trade-offs of upper confidence bound (UCB) and Thompson sampling (TS) strategies, we propose two novel algorithms, MONeural-UCB and MONeural-TS. Theoretical and empirical analyses demonstrate the superiority of our methods on multi-objective and multi-task bandit problems, with substantial improvements over classical linear MOMABs.
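As a rough illustration of the random-scalarization selection rule described in the abstract, the sketch below draws a preference vector from a Dirichlet distribution over the objective simplex and selects the arm maximizing the scalarized UCB score. The `predict_rewards` and `uncertainty_bonus` functions are hypothetical stand-ins for the paper's neural estimators, not the authors' implementation.

```python
# Minimal sketch of random scalarization with a UCB-style bonus.
# `predict_rewards` and `uncertainty_bonus` are assumed placeholders
# for per-objective neural estimates; they are NOT the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def predict_rewards(contexts):
    # Placeholder for a neural network's per-objective reward estimates:
    # returns an (n_arms, n_objectives) array.
    return rng.random((len(contexts), 2))

def uncertainty_bonus(contexts):
    # Placeholder for a per-arm, per-objective confidence width.
    return 0.1 * rng.random((len(contexts), 2))

def select_arm(contexts, alpha=np.ones(2)):
    # Random scalarization: draw a preference vector on the simplex
    # (the distribution encodes the practitioner's regions of interest),
    # then maximize the scalarized upper confidence bound.
    lam = rng.dirichlet(alpha)
    ucb = predict_rewards(contexts) + uncertainty_bonus(contexts)
    scores = ucb @ lam  # scalarized UCB per arm
    return int(np.argmax(scores))

contexts = [np.zeros(4) for _ in range(5)]  # five candidate arms
print(select_arm(contexts))
```

A TS variant would follow the same pattern, replacing the UCB bonus with a posterior sample of the per-objective rewards before scalarizing.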
Cite
Text
Cheng et al. "Multi-Objective Neural Bandits with Random Scalarization." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/547

Markdown
[Cheng et al. "Multi-Objective Neural Bandits with Random Scalarization." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/cheng2025ijcai-multi/) doi:10.24963/IJCAI.2025/547

BibTeX
@inproceedings{cheng2025ijcai-multi,
title = {{Multi-Objective Neural Bandits with Random Scalarization}},
author = {Cheng, Ji and Xue, Bo and Lu, Chengyu and Cui, Ziqiang and Zhang, Qingfu},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {4914--4922},
doi = {10.24963/IJCAI.2025/547},
url = {https://mlanthology.org/ijcai/2025/cheng2025ijcai-multi/}
}