TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models

Becker, Philipp; Freymuth, Niklas; Thilges, Serge; Otto, Fabian; Neumann, Gerhard

ICLR 2026

Abstract

Reinforcement Learning (RL) with PPO-like clip objectives has become the standard choice for reward-based fine-tuning of large language models (LLMs). Although recent work has explored improved estimators of advantages and normalization, the clipping mechanism itself has remained untouched. Originally introduced as a proxy for principled KL-based trust regions, clipping is a crude approximation that often causes unstable updates and suboptimal performance. We replace the clip objective with a novel discrete differentiable trust region projection, which provides principled token-level KL constraints. The projection operates on a sparse subset of the model’s most important token logits to balance computational cost and projection effectiveness. Our approach, Trust Region Optimization for Large Language Models (TROLL), serves as a direct replacement for PPO-like clipping during training and does not alter the model’s inference behavior. Across mathematical reasoning and code generation tasks, model families, as well as advantage-estimation methods, TROLL consistently outperforms PPO-like clipping in terms of training speed, stability, and final success rates.
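
To make the contrast in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows the standard PPO clipped surrogate for a single token next to a toy token-level KL trust region on a sparse (top-k plus one "rest" bucket) distribution. Several things here are assumptions for illustration only: the helper names (`ppo_clip_term`, `sparsify`, `project_to_trust_region`), the KL direction KL(new || old), the rest-bucket representation, and the bound value. The projection shown is a simple bisection on an interpolation between the old and new distributions, swapped in for readability; the paper's projection is differentiable and is not reproduced here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one token (to be maximized)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

def sparsify(probs, keep_idx):
    """Keep selected token probabilities and lump the remainder into one bucket.
    Assumption: this rest-bucket layout is illustrative, not taken from the paper."""
    kept = [probs[i] for i in keep_idx]
    rest = max(1.0 - sum(kept), 1e-12)
    return kept + [rest]

def kl(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def project_to_trust_region(p_new, p_old, bound, iters=50):
    """Toy projection: find the largest step from p_old toward p_new whose
    KL to p_old stays within the bound. KL(mix || p_old) is convex in the
    mixing weight and zero at weight 0, so bisection on the weight is valid."""
    if kl(p_new, p_old) <= bound:
        return list(p_new)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mix = [mid * a + (1.0 - mid) * b for a, b in zip(p_new, p_old)]
        if kl(mix, p_old) <= bound:
            lo = mid
        else:
            hi = mid
    return [lo * a + (1.0 - lo) * b for a, b in zip(p_new, p_old)]

# Hypothetical logits for one token position over a 5-token vocabulary.
old_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
new_logits = [0.5, 3.0, 0.5, -1.0, -3.0]
p_old_full, p_new_full = softmax(old_logits), softmax(new_logits)

# Keep the 3 most likely tokens under the old policy (k=3 is an assumption).
top_k = sorted(range(len(p_old_full)), key=lambda i: -p_old_full[i])[:3]
p_old = sparsify(p_old_full, top_k)
p_new = sparsify(p_new_full, top_k)

bound = 0.05  # illustrative KL bound
p_proj = project_to_trust_region(p_new, p_old, bound)
```

In this sketch the clip objective only looks at the probability ratio of the sampled token, whereas the trust region step constrains the whole (sparsified) next-token distribution, which is the distinction the abstract draws.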

Cite

Text

Becker et al. "TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models." International Conference on Learning Representations, 2026.

Markdown

[Becker et al. "TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/becker2026iclr-troll/)

BibTeX

@inproceedings{becker2026iclr-troll,
  title     = {{TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models}},
  author    = {Becker, Philipp and Freymuth, Niklas and Thilges, Serge and Otto, Fabian and Neumann, Gerhard},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/becker2026iclr-troll/}
}