Chasing the Tail: Effective Rubric-Based Reward Modeling for Large Language Model Post-Training

Zhang, Junkai; Wang, Zihao; Gui, Lin; Sathyendra, Swarnashree Mysore; Jeong, Jaehwan; Veitch, Victor; Wang, Wei; He, Yunzhong; Liu, Bing; Jin, Lifeng

ICLR 2026

Abstract

Reinforcement fine-tuning (RFT) often suffers from reward over-optimization, where a policy model hacks the reward signals to achieve high scores while producing low-quality outputs. Our theoretical analysis shows that the key lies in reward misspecification at the high-reward tail: the inability to reliably distinguish excellent responses from merely great ones. This motivates us to focus on the high-reward region. However, such tail examples are scarce under the base LLM. While off-policy exemplars (e.g., from stronger models or rewrites) are easier to obtain, naively training on them yields a misspecified reward for the policy we aim to align. To address this, we study rubric-based rewards. By design, rubrics can leverage off-policy examples while remaining insensitive to their artifacts. To elicit rubrics that capture the high-reward tail, we highlight the importance of distinguishing among great and diverse responses, and introduce a workflow to implement this idea. We empirically demonstrate that rubric-based rewards substantially mitigate reward over-optimization and deliver effective LLM post-training improvements.

Cite

Text

Zhang et al. "Chasing the Tail: Effective Rubric-Based Reward Modeling for Large Language Model Post-Training." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Chasing the Tail: Effective Rubric-Based Reward Modeling for Large Language Model Post-Training." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-chasing/)

BibTeX

@inproceedings{zhang2026iclr-chasing,
  title     = {{Chasing the Tail: Effective Rubric-Based Reward Modeling for Large Language Model Post-Training}},
  author    = {Zhang, Junkai and Wang, Zihao and Gui, Lin and Sathyendra, Swarnashree Mysore and Jeong, Jaehwan and Veitch, Victor and Wang, Wei and He, Yunzhong and Liu, Bing and Jin, Lifeng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-chasing/}
}