Fairness in Reinforcement Learning

Abstract

We initiate the study of fairness in reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. Our fairness constraint requires that an algorithm never prefers one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: despite the fact that fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take time exponential in the number of states to achieve non-trivial approximation to the optimal policy. We then provide a provably fair polynomial time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.
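The exact-fairness constraint described above can be illustrated with a small check at a single state (a minimal sketch, not the paper's algorithm; the function name and the assumption that the optimal discounted action values are given as `q_values` are mine):

```python
def is_fair_at_state(q_values, action_probs, tol=1e-9):
    """Check the exact fairness constraint at one state.

    q_values[a]     -- long-term (discounted) reward of taking action a here,
                       assumed to be known for the purpose of illustration.
    action_probs[a] -- probability the policy chooses action a at this state.

    The policy is fair at this state if it never prefers one action over
    another whose long-term reward is higher, i.e. whenever
    q_values[b] > q_values[a], it must hold that
    action_probs[a] <= action_probs[b].
    """
    n = len(q_values)
    for a in range(n):
        for b in range(n):
            # If b is strictly better than a, the policy may not
            # assign a strictly higher probability to a than to b.
            if q_values[b] > q_values[a] + tol and \
               action_probs[a] > action_probs[b] + tol:
                return False
    return True
```

For example, with action values `[1.0, 2.0]`, the distribution `[0.3, 0.7]` is fair while `[0.7, 0.3]` is not, since the latter prefers the lower-value action. The paper's approximate notion relaxes this comparison by a slack parameter, which is what enables the polynomial-time algorithm.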

Cite

Text

Jabbari et al. "Fairness in Reinforcement Learning." International Conference on Machine Learning, 2017.

Markdown

[Jabbari et al. "Fairness in Reinforcement Learning." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/jabbari2017icml-fairness/)

BibTeX

@inproceedings{jabbari2017icml-fairness,
  title     = {{Fairness in Reinforcement Learning}},
  author    = {Jabbari, Shahin and Joseph, Matthew and Kearns, Michael and Morgenstern, Jamie and Roth, Aaron},
  booktitle = {International Conference on Machine Learning},
  year      = {2017},
  pages     = {1617--1626},
  volume    = {70},
  url       = {https://mlanthology.org/icml/2017/jabbari2017icml-fairness/}
}