Pessimistic Off-Policy Multi-Objective Optimization

Abstract

Multi-objective optimization is a class of optimization problems with multiple conflicting objectives. We study offline optimization of multi-objective policies from data collected by a previously deployed policy. We propose a pessimistic estimator for policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them.
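To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the paper's actual estimator. It assumes a context-free setting with a small discrete action set. It also assumes a simple pessimism term, the standard error scaled by a hypothetical constant `c`, and a two-objective hypervolume measured against a reference point. All function and parameter names are illustrative.

```python
import numpy as np

def ips_values(pi, pi0, actions, rewards):
    """Naive IPS estimate of a policy's value vector (one entry per objective).

    pi, pi0: action probabilities of the target and logging policies, shape (K,)
    actions: logged action indices, shape (n,)
    rewards: logged reward vectors, shape (n, d)
    """
    w = pi[actions] / pi0[actions]            # importance weights, shape (n,)
    return (w[:, None] * rewards).mean(axis=0)

def pessimistic_ips_values(pi, pi0, actions, rewards, c=1.0):
    """IPS estimate minus a confidence width (assumed form: c * standard error)."""
    w = pi[actions] / pi0[actions]
    terms = w[:, None] * rewards              # per-sample IPS terms, shape (n, d)
    width = c * terms.std(axis=0, ddof=1) / np.sqrt(len(actions))
    return terms.mean(axis=0) - width

def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` (maximization) above the reference point `ref`."""
    pts = sorted(
        (p for p in points if p[0] > ref[0] and p[1] > ref[1]),
        key=lambda p: -p[0],
    )
    area, best_y = 0.0, ref[1]
    for x, y in pts:                          # sweep from largest x to smallest
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area
```

Under these assumptions, one would compute the pessimistic value vector for each candidate policy and pass the resulting points to `hypervolume_2d`, so that policies whose estimates rest on few or highly weighted samples contribute less to the objective being optimized.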

Cite

Text

Alizadeh et al. "Pessimistic Off-Policy Multi-Objective Optimization." Artificial Intelligence and Statistics, 2024.

Markdown

[Alizadeh et al. "Pessimistic Off-Policy Multi-Objective Optimization." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/alizadeh2024aistats-pessimistic/)

BibTeX

@inproceedings{alizadeh2024aistats-pessimistic,
  title     = {{Pessimistic Off-Policy Multi-Objective Optimization}},
  author    = {Alizadeh, Shima and Bhargava, Aniruddha and Gopalswamy, Karthick and Jain, Lalit and Kveton, Branislav and Liu, Ge},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2024},
  pages     = {2980--2988},
  volume    = {238},
  url       = {https://mlanthology.org/aistats/2024/alizadeh2024aistats-pessimistic/}
}