IMO3: Interactive Multi-Objective Off-Policy Optimization
Abstract
Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. However, we consider a more practical but challenging setting of unknown objective functions. In industry, optimization under this setting is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose Interactive Multi-Objective Off-policy Optimization (IMO^3). The key idea of IMO^3 is to interact with a system designer using policies evaluated in an off-policy fashion to uncover which policy maximizes her unknown utility function. We theoretically show that IMO^3 identifies a near-optimal policy with high probability, depending on the amount of designer's feedback and training data for off-policy estimation. We demonstrate its effectiveness empirically on several multi-objective optimization problems.
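As a rough illustration of what "policies evaluated in an off-policy fashion" could look like, the sketch below estimates a target policy's value on each objective from logged data with inverse propensity scoring (IPS). The abstract does not say which estimator is used, so IPS is an assumption here. The names `ips_value` and `scalarize`, the log format, and the linear form of the designer's utility are also hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# One logged interaction: (context, action, logging propensity, reward vector).
# This tuple layout is an assumption made for the example.
LoggedSample = Tuple[int, int, float, Sequence[float]]


def ips_value(
    logs: List[LoggedSample],
    target_prob: Callable[[int, int], float],
    num_objectives: int,
) -> List[float]:
    """IPS estimate of the target policy's expected reward on each objective."""
    totals = [0.0] * num_objectives
    for context, action, propensity, rewards in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        weight = target_prob(context, action) / propensity
        for i in range(num_objectives):
            totals[i] += weight * rewards[i]
    return [total / len(logs) for total in totals]


def scalarize(values: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical linear utility: weighted sum of per-objective values."""
    return sum(w * v for w, v in zip(weights, values))
```

With such estimates, candidate policies can be compared by their estimated objective vectors without deploying them online; in the interactive setting the utility weights are unknown and would have to be inferred from the designer's feedback rather than supplied as they are in `scalarize`.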
Cite
Text
Wang et al. "IMO3: Interactive Multi-Objective Off-Policy Optimization." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/489
Markdown
[Wang et al. "IMO3: Interactive Multi-Objective Off-Policy Optimization." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/wang2022ijcai-imo/) doi:10.24963/IJCAI.2022/489
BibTeX
@inproceedings{wang2022ijcai-imo,
title = {{IMO3: Interactive Multi-Objective Off-Policy Optimization}},
author = {Wang, Nan and Wang, Hongning and Karimzadehgan, Maryam and Kveton, Branislav and Boutilier, Craig},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {3523-3529},
doi = {10.24963/IJCAI.2022/489},
url = {https://mlanthology.org/ijcai/2022/wang2022ijcai-imo/}
}