Finding the Bandit in a Graph: Sequential Search-and-Stop

Abstract

We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are to both search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it and, at the same time, act near-optimally in order to collect as many hidden objects as possible.
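To make the search model concrete, the following is a minimal illustrative sketch of one search instance, not the authors' algorithm: a DAG with precedence constraints (a vertex is examinable only once all its in-neighbors are examined), a prior distribution over the hidden vertex, per-vertex examination costs, and a stopping rule that abandons the instance when the next examination would exceed the remaining budget. The greedy probability-to-cost ordering is an assumed heuristic chosen for illustration.

```python
def search_and_stop(in_nbrs, probs, costs, hidden, budget):
    """Search one instance of the problem described in the abstract.

    in_nbrs: dict vertex -> set of in-neighbors (precedence constraints)
    probs:   dict vertex -> prior probability that the object hides there
    costs:   dict vertex -> cost of examining that vertex
    hidden:  the vertex actually containing the object
    budget:  remaining time budget for this instance
    Returns (found, cost_spent).
    """
    examined = set()
    spent = 0.0
    while True:
        # Vertices whose in-neighbors have all been examined are examinable.
        frontier = [v for v in probs
                    if v not in examined and in_nbrs[v] <= examined]
        if not frontier:
            return False, spent  # nothing left to examine
        # Illustrative greedy rule: highest probability-to-cost ratio first.
        v = max(frontier, key=lambda u: probs[u] / costs[u])
        if spent + costs[v] > budget:
            # Stop and skip the instance: examining v would exceed the budget.
            return False, spent
        spent += costs[v]
        examined.add(v)
        if v == hidden:
            return True, spent


# Example: a chain a -> b -> c forces examination in topological order.
in_nbrs = {"a": set(), "b": {"a"}, "c": {"b"}}
probs = {"a": 0.2, "b": 0.5, "c": 0.3}
costs = {"a": 1.0, "b": 1.0, "c": 1.0}
found, spent = search_and_stop(in_nbrs, probs, costs, hidden="c", budget=10.0)
```

In the chain example the precedence constraints leave a single feasible order (a, then b, then c), so the object at c is found after a total cost of 3; with a smaller budget the same rule would stop early and skip the instance.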

Cite

Text

Perrault et al. "Finding the Bandit in a Graph: Sequential Search-and-Stop." Artificial Intelligence and Statistics, 2019.

Markdown

[Perrault et al. "Finding the Bandit in a Graph: Sequential Search-and-Stop." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/perrault2019aistats-finding/)

BibTeX

@inproceedings{perrault2019aistats-finding,
  title     = {{Finding the Bandit in a Graph: Sequential Search-and-Stop}},
  author    = {Perrault, Pierre and Perchet, Vianney and Valko, Michal},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2019},
  pages     = {1668--1677},
  volume    = {89},
  url       = {https://mlanthology.org/aistats/2019/perrault2019aistats-finding/}
}