Best-Item Learning in Random Utility Models with Subset Choices

Abstract

We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based on their relative win/loss empirical counts, and can be bounded as a function of the noise distribution alone. We give a learning algorithm for general RUMs, based on pairwise relative counts of items and hierarchical elimination, along with a new PAC sample complexity guarantee of $O(\frac{n}{c^2\epsilon^2} \log \frac{k}{\delta})$ rounds to identify an $\epsilon$-optimal item with confidence $1-\delta$, when the worst case pairwise advantage in the RUM has sensitivity at least $c$ to the parameter gaps of items. Fundamental lower bounds on PAC sample complexity show that this is near-optimal in terms of its dependence on $n,k$ and $c$.

Cite

Text

Saha and Gopalan. "Best-Item Learning in Random Utility Models with Subset Choices." Artificial Intelligence and Statistics, 2020.

Markdown

[Saha and Gopalan. "Best-Item Learning in Random Utility Models with Subset Choices." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/saha2020aistats-bestitem/)

BibTeX

@inproceedings{saha2020aistats-bestitem,
  title     = {{Best-Item Learning in Random Utility Models with Subset Choices}},
  author    = {Saha, Aadirupa and Gopalan, Aditya},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {4281-4291},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/saha2020aistats-bestitem/}
}