Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

Gunjal, Anisha; Wang, Anthony; Lau, Elaine; Nath, Vaskar; He, Yunzhong; Liu, Bing; Hendryx, Sean M.

ICLR 2026

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for complex reasoning tasks with clear correctness signals such as math and coding. However, extending it to real-world reasoning tasks is challenging, as evaluation depends on nuanced, multi-criteria judgments rather than binary correctness. Instance-specific rubrics have recently been used in evaluation benchmarks to capture such judgments, but their potential as reward signals for on-policy post-training remains underexplored. We introduce Rubrics as Rewards (RaR), an on-policy reinforcement learning method that extends RLVR beyond verifiable domains by using rubric-based feedback. Across both medical and science domains, we evaluate multiple strategies for aggregating rubric feedback into rewards. The best RaR variant achieves relative improvements of up to 31% on HealthBench and 7% on GPQA-Diamond over popular LLM-as-judge baselines that rely on direct Likert-based rewards. These results demonstrate that RaR-trained policies adapt well to diverse evaluation formats, performing strongly on both rubric-based and multiple-choice tasks. Moreover, we find that using rubrics as structured reward signals yields better alignment for smaller judges and reduces performance variance across judge scales.
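
The abstract does not say which aggregation strategies were evaluated, so the following is only an illustrative sketch of one plausible way to turn rubric feedback into a scalar reward: a normalized weighted sum of per-criterion pass/fail verdicts. The `Criterion` class, the `rubric_reward` function, the weights, and the example rubric are all hypothetical and are not taken from the paper; in practice the verdicts would come from an LLM judge rather than being hard-coded.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure, not taken from the paper)."""
    description: str
    weight: float  # assumed importance weight, expected to be positive


def rubric_reward(criteria: Sequence[Criterion], satisfied: Sequence[bool]) -> float:
    """Return the weight of satisfied criteria divided by the total weight.

    This is an assumed aggregation rule; the result lies in [0, 1] when
    all weights are positive.
    """
    if len(criteria) != len(satisfied):
        raise ValueError("need exactly one verdict per criterion")
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    earned = sum(c.weight for c, ok in zip(criteria, satisfied) if ok)
    return earned / total


# Made-up instance-specific rubric for a single medical prompt.
rubric = [
    Criterion("States the most likely diagnosis", 3.0),
    Criterion("Recommends an appropriate follow-up test", 2.0),
    Criterion("Avoids unsupported dosage advice", 1.0),
]

# Placeholder verdicts standing in for per-criterion judge outputs.
reward = rubric_reward(rubric, [True, False, True])
print(f"reward = {reward:.3f}")
```

Under these assumptions, a response is scored criterion by criterion instead of receiving a single Likert rating, and the resulting scalar could be fed to an on-policy RL algorithm as the reward for that response.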

Cite

Text

Gunjal et al. "Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains." International Conference on Learning Representations, 2026.

Markdown

[Gunjal et al. "Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/gunjal2026iclr-rubrics/)

BibTeX

@inproceedings{gunjal2026iclr-rubrics,
  title     = {{Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains}},
  author    = {Gunjal, Anisha and Wang, Anthony and Lau, Elaine and Nath, Vaskar and He, Yunzhong and Liu, Bing and Hendryx, Sean M.},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/gunjal2026iclr-rubrics/}
}