Training a Generally Curious Agent

Abstract

Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present **PAPRIKA**, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, **PAPRIKA** teaches models to explore and adapt their behavior based on environment feedback in context, without gradient updates. Experimental results show that models fine-tuned with **PAPRIKA** can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. We also introduce a curriculum learning algorithm that improves **PAPRIKA**'s sample efficiency. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interaction with the external world.
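
To make the "in-context adaptation without gradient updates" idea concrete, here is a minimal sketch (not the authors' code) of a multi-turn exploration episode: the agent's behavior changes only because environment feedback is appended to its context. The guessing-game task and the `query_model` stub are hypothetical stand-ins; a real system would call a fine-tuned language model on the growing transcript.

```python
# Minimal sketch of in-context exploration, assuming a number-guessing
# task as a stand-in environment. No model weights are updated anywhere;
# all adaptation comes from conditioning on the transcript.
import random


def query_model(transcript: str, candidates: list[int]) -> int:
    """Hypothetical stand-in for an LLM call.

    A real system would prompt a fine-tuned model with `transcript` and
    parse its next action; here we guess uniformly among candidates still
    consistent with past feedback so the sketch stays runnable.
    """
    return random.choice(candidates)


def run_episode(secret: int, low: int = 1, high: int = 100, max_turns: int = 10):
    transcript = f"Task: guess a number between {low} and {high}.\n"
    candidates = list(range(low, high + 1))
    for turn in range(1, max_turns + 1):
        guess = query_model(transcript, candidates)
        if guess == secret:
            transcript += f"Turn {turn}: guessed {guess} -> correct!\n"
            return transcript, turn
        feedback = "too low" if guess < secret else "too high"
        # Environment feedback is appended to the context; the next action
        # is conditioned on it in-context, not via a gradient step.
        transcript += f"Turn {turn}: guessed {guess} -> {feedback}.\n"
        if guess < secret:
            candidates = [c for c in candidates if c > guess]
        else:
            candidates = [c for c in candidates if c < guess]
    return transcript, max_turns


if __name__ == "__main__":
    text, turns = run_episode(secret=42)
    print(text)
    print(f"Episode ended after {turns} turns.")
```

In the paper's setting, many such transcripts from diverse tasks would serve as fine-tuning data, so the model itself learns to produce feedback-sensitive actions like the filtering step above.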

Cite

Text

Tajwar et al. "Training a Generally Curious Agent." ICLR 2025 Workshops: SSI-FM, 2025.

Markdown

[Tajwar et al. "Training a Generally Curious Agent." ICLR 2025 Workshops: SSI-FM, 2025.](https://mlanthology.org/iclrw/2025/tajwar2025iclrw-training/)

BibTeX

@inproceedings{tajwar2025iclrw-training,
  title     = {{Training a Generally Curious Agent}},
  author    = {Tajwar, Fahim and Jiang, Yiding and Thankaraj, Abitha and Rahman, Sumaita Sadia and Kolter, J Zico and Schneider, Jeff and Salakhutdinov, Ruslan},
  booktitle = {ICLR 2025 Workshops: SSI-FM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/tajwar2025iclrw-training/}
}