Importance Sampling for Fair Policy Selection

Abstract

We consider the problem of off-policy policy selection in reinforcement learning: using historical data generated by running one policy to compare two or more policies. We show that approaches based on importance sampling can be unfair: they can select the worse of two policies more often than not. We then give an example showing that importance sampling is systematically unfair in a practically relevant setting; namely, it unreasonably favors policies that produce shorter trajectories. We present sufficient conditions that theoretically guarantee fairness. Finally, we provide a practical importance sampling-based estimator that helps mitigate the unfairness caused by varying trajectory lengths.
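
For context, below is a minimal sketch (not taken from the paper) of ordinary per-trajectory importance sampling, the baseline estimator whose fairness the abstract discusses. It assumes each trajectory is recorded as a list of (state, action, reward) tuples collected under a known behavior policy, and that behavior_prob and candidate_prob are hypothetical callables giving each policy's action probabilities; the paper's own fairness-motivated estimator is not reproduced here.

# Sketch only: ordinary importance sampling (IS) for off-policy policy selection.
# Assumes trajectories were collected under a known behavior policy and are
# stored as lists of (state, action, reward) tuples.

def is_estimate(trajectories, candidate_prob, behavior_prob):
    """Average per-trajectory IS estimate of the candidate policy's return.

    candidate_prob(s, a) and behavior_prob(s, a) are hypothetical callables
    returning the probability of taking action a in state s under each policy.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0   # product of likelihood ratios over the trajectory
        ret = 0.0      # undiscounted return of the trajectory
        for s, a, r in traj:
            weight *= candidate_prob(s, a) / behavior_prob(s, a)
            ret += r
        total += weight * ret
    return total / len(trajectories)

def select_policy(trajectories, candidates, behavior_prob):
    """Pick the candidate policy with the highest IS estimate of its value."""
    return max(candidates, key=lambda pi: is_estimate(trajectories, pi, behavior_prob))

Note that each trajectory's weight is a product of one likelihood ratio per time step, so longer trajectories accumulate more ratios; this is the aspect of the estimator that the abstract's claim about trajectory-length unfairness concerns.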

Cite

Text

Doroudi et al. "Importance Sampling for Fair Policy Selection." Conference on Uncertainty in Artificial Intelligence, 2017. doi:10.24963/ijcai.2018/729

Markdown

[Doroudi et al. "Importance Sampling for Fair Policy Selection." Conference on Uncertainty in Artificial Intelligence, 2017.](https://mlanthology.org/uai/2017/doroudi2017uai-importance/) doi:10.24963/ijcai.2018/729

BibTeX

@inproceedings{doroudi2017uai-importance,
  title     = {{Importance Sampling for Fair Policy Selection}},
  author    = {Doroudi, Shayan and Thomas, Philip S. and Brunskill, Emma},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2017},
  doi       = {10.24963/ijcai.2018/729},
  url       = {https://mlanthology.org/uai/2017/doroudi2017uai-importance/}
}