LLMs Are Greedy Agents: Effects of RL Fine-Tuning on Decision-Making Abilities

Schmied, Thomas; Bornschein, Jörg; Grau-Moya, Jordi; Wulfmeier, Markus; Pascanu, Razvan

ICLR 2026

Abstract

The success of LLMs has sparked interest in various agentic applications. A key hypothesis is that LLMs, leveraging common sense and Chain-of-Thought (CoT) reasoning, can effectively explore and efficiently solve complex domains. However, LLM agents have been found to suffer from sub-optimal exploration and from the knowing-doing gap, the inability to act effectively on knowledge present in the model. In this work, we systematically study why LLMs perform sub-optimally in decision-making scenarios. In particular, we closely examine three prevalent failure modes: greediness, frequency bias, and the knowing-doing gap. We propose mitigating these shortcomings by fine-tuning via Reinforcement Learning (RL) on self-generated CoT rationales. Our experiments across multi-armed bandits, contextual bandits, and Tic-tac-toe demonstrate that RL fine-tuning enhances the decision-making abilities of LLMs by increasing exploration and narrowing the knowing-doing gap. Finally, we study both classic exploration mechanisms, such as $\epsilon$-greedy, and LLM-specific approaches, such as self-correction and self-consistency, to enable more effective fine-tuning of LLMs for decision-making.
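To make the terms "greediness" and "$\epsilon$-greedy" concrete, here is a minimal sketch of a tabular $\epsilon$-greedy agent on a multi-armed bandit. It is a textbook illustration only, not the authors' experimental setup: the function name `run_bandit`, its parameters, the Gaussian reward model, and the arm means are all assumptions made for this example, and no LLM is involved.

```python
import random


def run_bandit(means, epsilon, steps=2000, noise=1.0, seed=0):
    """Play a Gaussian multi-armed bandit with an epsilon-greedy policy.

    Hypothetical helper for illustration; not taken from the paper.
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(means[arm], noise)
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts


if __name__ == "__main__":
    arm_means = [0.1, 0.5, 0.9]  # assumed values; arm 2 is the best arm
    for eps in (0.0, 0.1):
        total, counts = run_bandit(arm_means, epsilon=eps)
        print(f"epsilon={eps}: total={total:.1f}, pulls={counts}")
```

With `epsilon=0.0` and noiseless rewards (`noise=0.0`), this agent pulls arm 0 once, sees a positive reward, and never tries another arm. That is the purely greedy failure in its simplest form; any `epsilon > 0` forces occasional exploration of the other arms.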

Cite

Text

Schmied et al. "LLMs Are Greedy Agents: Effects of RL Fine-Tuning on Decision-Making Abilities." International Conference on Learning Representations, 2026.

Markdown

[Schmied et al. "LLMs Are Greedy Agents: Effects of RL Fine-Tuning on Decision-Making Abilities." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/schmied2026iclr-llms/)

BibTeX

@inproceedings{schmied2026iclr-llms,
  title     = {{LLMs Are Greedy Agents: Effects of RL Fine-Tuning on Decision-Making Abilities}},
  author    = {Schmied, Thomas and Bornschein, Jörg and Grau-Moya, Jordi and Wulfmeier, Markus and Pascanu, Razvan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/schmied2026iclr-llms/}
}