In-Context Learning for Pure Exploration

Abstract

We study the _active sequential hypothesis testing_ problem, also known as _pure exploration_: given a new task, the learner _adaptively collects data_ from the environment to efficiently determine an underlying correct hypothesis. A classical instance is Best-Arm Identification (BAI) in a multi-armed bandit, where actions index hypotheses. Another important case is generalized search, the problem of determining the correct label through a sequence of strategically selected queries that indirectly reveal information about the label. In this work, we introduce _In-Context Pure Explorer_ (ICPE), which meta-trains Transformers to map _observation histories_ to _query actions_ and a _predicted hypothesis_, yielding a model that transfers in-context. At inference time, ICPE actively gathers evidence on new tasks and infers the true hypothesis without parameter updates. Across deterministic, stochastic, and structured benchmarks, including BAI and generalized search, ICPE is competitive with adaptive baselines while requiring no explicit modeling of information structure. Our results support Transformers as practical architectures for _general sequential testing_.
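To make the problem setting concrete, the following is a minimal sketch of the pure-exploration interaction loop for BAI. It is not ICPE and not the authors' code: it uses a simple non-adaptive round-robin query rule with a fixed budget and Bernoulli rewards, all of which are assumptions chosen for illustration. The function name `run_fixed_budget_bai` and its parameters are hypothetical. An adaptive learner such as the one described above would instead choose each query as a function of the observation history.

```python
import random


def run_fixed_budget_bai(arm_means, budget, seed=0):
    """Illustrative baseline: sample arms round-robin, then recommend the empirical best.

    arm_means: assumed Bernoulli success probability of each arm (unknown to the learner).
    budget:    total number of queries the learner may make.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    history = []  # (action, observation) pairs: the observation history

    for t in range(budget):
        # Non-adaptive query rule (assumption); an adaptive policy would use `history` here.
        arm = t % k
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        history.append((arm, reward))

    # Predicted hypothesis: the arm with the highest empirical mean.
    estimates = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    predicted = max(range(k), key=lambda a: estimates[a])
    return predicted, history


if __name__ == "__main__":
    best, hist = run_fixed_budget_bai([0.2, 0.8, 0.5], budget=300)
    print("predicted best arm:", best, "after", len(hist), "queries")
```

The returned pair mirrors the two outputs named in the abstract: a sequence of query actions with their observations, and a final predicted hypothesis.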

Cite

Text

Russo et al. "In-Context Learning for Pure Exploration." International Conference on Learning Representations, 2026.

Markdown

[Russo et al. "In-Context Learning for Pure Exploration." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/russo2026iclr-incontext/)

BibTeX

@inproceedings{russo2026iclr-incontext,
  title     = {{In-Context Learning for Pure Exploration}},
  author    = {Russo, Alessio and Welch, Ryan and Pacchiano, Aldo},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/russo2026iclr-incontext/}
}