Breaking Barriers: Do Reinforcement Post Training Gains Transfer to Unseen Domains?

Abstract

Reinforcement post training (RPT) has recently shown promise in improving the reasoning abilities of large language models (LLMs). However, it remains unclear how well these improvements generalize to new domains, because prior work evaluates RPT models on data from the same domains used for post-training. To understand the generalizability of RPT, we conduct two studies with a specific focus on Reinforcement Learning with Verifiable Rewards (RLVR). (1) Observational: we compare a wide range of open-weight RPT models against their corresponding base models across multiple domains, including domains both seen and unseen in their fine-tuning data. (2) Interventional: we fine-tune LLMs with RPT on single domains and evaluate their performance across multiple domains. Both studies converge on the same conclusion: although RPT brings substantial gains on tasks similar to the fine-tuning data, the gains generalize inconsistently and can vanish on domains with different reasoning patterns.
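The base-versus-RPT comparison described in the abstract can be pictured with a minimal sketch. It is not the authors' evaluation code: the function names `domain_accuracy` and `transfer_summary`, the `(domain, is_correct)` record format, and the use of plain accuracy as the metric are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def domain_accuracy(records):
    """Map each domain to its accuracy.

    `records` is an iterable of (domain, is_correct) pairs; this record
    format is an assumption, not the paper's data schema.
    """
    by_domain = defaultdict(list)
    for domain, is_correct in records:
        by_domain[domain].append(1.0 if is_correct else 0.0)
    return {domain: mean(scores) for domain, scores in by_domain.items()}


def transfer_summary(base_acc, rpt_acc, seen_domains):
    """Compare an RPT model with its base model, domain by domain.

    A domain counts as "seen" if it appears in `seen_domains`, i.e. it was
    part of the RPT fine-tuning data; every other domain is "unseen".
    """
    # Gain = RPT accuracy minus base accuracy, on domains both models were scored on.
    gains = {d: rpt_acc[d] - base_acc[d] for d in base_acc if d in rpt_acc}
    seen = [g for d, g in gains.items() if d in seen_domains]
    unseen = [g for d, g in gains.items() if d not in seen_domains]
    return {
        "per_domain_gain": gains,
        "mean_seen_gain": mean(seen) if seen else None,
        "mean_unseen_gain": mean(unseen) if unseen else None,
    }
```

Under these assumptions, a large `mean_seen_gain` paired with a `mean_unseen_gain` near zero would correspond to the pattern the abstract reports: gains that hold on domains similar to the fine-tuning data but do not reliably transfer.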

Cite

Text

Hu et al. "Breaking Barriers: Do Reinforcement Post Training Gains Transfer to Unseen Domains?" International Conference on Learning Representations, 2026.

Markdown

[Hu et al. "Breaking Barriers: Do Reinforcement Post Training Gains Transfer to Unseen Domains?" International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hu2026iclr-breaking/)

BibTeX

@inproceedings{hu2026iclr-breaking,
  title     = {{Breaking Barriers: Do Reinforcement Post Training Gains Transfer to Unseen Domains?}},
  author    = {Hu, Chuxuan and Zhu, Yuxuan and Kellermann, Antony and Biddulph, Caleb and Waiwitlikhit, Suppakit and Benn, Jason and Kang, Daniel},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/hu2026iclr-breaking/}
}