Fairness in Learning: Classic and Contextual Bandits

Abstract

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on “chained” confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
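
The abstract's algorithmic idea is compact enough to sketch. Fairness here demands that if arm i has true mean at least that of arm j, the algorithm never plays j with higher probability than i; the "chained" confidence-interval approach satisfies this by randomizing uniformly over every arm whose interval overlaps, directly or through a chain of overlaps, the interval with the highest upper bound, since any such arm could still be the best. The Python sketch below illustrates one round of such a rule; the Hoeffding-style interval width, the function name fair_round, and the uniform tie-handling are illustrative assumptions, not the paper's exact procedure or constants.

import math
import random

def fair_round(means, counts, t):
    """One round of a chained-confidence-interval selection rule
    (an illustrative sketch, not the paper's exact algorithm)."""
    k = len(means)

    # Hoeffding-style confidence interval per arm; this width is a
    # standard choice assumed for illustration, not the paper's constants.
    intervals = []
    for i in range(k):
        if counts[i] == 0:
            lo, hi = 0.0, 1.0  # no observations yet: widest possible interval
        else:
            width = math.sqrt(math.log(2.0 * k * (t + 1) ** 2) / (2.0 * counts[i]))
            lo, hi = means[i] - width, means[i] + width
        intervals.append((lo, hi))

    # Chain outward from the arm with the highest upper bound: an arm joins
    # the chain if its interval overlaps the union of intervals already
    # chained (sorted by upper bound, that union is the span [floor, top]).
    order = sorted(range(k), key=lambda i: intervals[i][1], reverse=True)
    chained = [order[0]]
    floor = intervals[order[0]][0]
    for i in order[1:]:
        lo, hi = intervals[i]
        if hi >= floor:
            chained.append(i)
            floor = min(floor, lo)
        else:
            break  # sorted by upper bound, so no later arm can chain either

    # Uniform play over the chained set never favors an arm the data cannot
    # yet distinguish from a worse one.
    return random.choice(chained)

# Example: wide intervals chain all three arms together, so a fair learner
# must randomize over all of them.
means, counts = [0.9, 0.6, 0.2], [40, 35, 50]
arm = fair_round(means, counts, t=125)

This forced randomization over every not-yet-separated arm is the intuition behind the paper's separation result: a fair algorithm must keep exploring arms it cannot yet distinguish, which is what drives the cubic dependence of regret on the number of arms.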

Cite

Text

Joseph et al. "Fairness in Learning: Classic and Contextual Bandits." Neural Information Processing Systems, 2016.

Markdown

[Joseph et al. "Fairness in Learning: Classic and Contextual Bandits." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/joseph2016neurips-fairness/)

BibTeX

@inproceedings{joseph2016neurips-fairness,
  title     = {{Fairness in Learning: Classic and Contextual Bandits}},
  author    = {Joseph, Matthew and Kearns, Michael and Morgenstern, Jamie H. and Roth, Aaron},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {325--333},
  url       = {https://mlanthology.org/neurips/2016/joseph2016neurips-fairness/}
}