Group-Normalized Implicit Value Optimization for Language Models
Abstract
Fine-tuning Large Language Models (LLMs) with reinforcement learning (RL) has become a key technique for enhancing performance on a wide range of tasks, from user alignment to complex reasoning. However, this approach is often hindered by the difficulty of fine-grained credit assignment, as it typically relies on sparse rewards given only at the end of a fully generated sequence. Conventional solutions often require training an auxiliary value network, known as a critic, which introduces significant computational overhead and training instability. We present Group-Normalized Implicit Value Optimization (GN-IVO), a novel, critic-free algorithm that directly addresses this challenge. GN-IVO learns step-level values implicitly from the policy through a group-normalized distributional matching objective. This approach removes the need for an explicit critic and avoids computing the intractable partition function by normalizing values across a group of sampled model responses. Theoretically, we prove that our objective recovers the true value function up to a constant, guaranteeing that the optimal policy is preserved. We demonstrate the practical effectiveness of GN-IVO on a diverse set of text generation and reasoning tasks, showing that it consistently outperforms strong RL baselines for LLMs.
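The abstract does not state the objective itself, so the following is only an illustrative sketch of why normalizing over a group of sampled responses can sidestep a partition function: a softmax over the group is unchanged when the same constant is added to every score, so any per-prompt constant cancels. The function names, the cross-entropy form of the loss, the temperature `beta`, and all numbers are assumptions made for illustration and are not taken from the paper.

```python
import math


def group_softmax(scores):
    """Normalize scores into a distribution over one group of sampled responses."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_matching_loss(implicit_values, rewards, beta=1.0):
    """Cross-entropy between a reward-induced and a value-induced group distribution.

    Hypothetical objective for illustration only; not the loss from the paper.
    """
    target = group_softmax([r / beta for r in rewards])
    pred = group_softmax(implicit_values)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


# Four sampled responses to one prompt (made-up rewards and implicit values).
rewards = [1.0, 0.0, 0.0, 1.0]
values = [0.7, -0.2, 0.1, 0.4]

loss = group_matching_loss(values, rewards)

# Adding the same per-prompt constant to every value (for example an unknown
# log-partition term) leaves the group-normalized loss unchanged.
shifted = [v + 3.5 for v in values]
loss_shifted = group_matching_loss(shifted, rewards)
```

Under these assumptions, `loss` and `loss_shifted` agree up to floating-point error, which mirrors the abstract's point that values need only be identified up to a constant.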
Cite
Text
Choi et al. "Group-Normalized Implicit Value Optimization for Language Models." International Conference on Learning Representations, 2026.
Markdown
[Choi et al. "Group-Normalized Implicit Value Optimization for Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/choi2026iclr-groupnormalized/)
BibTeX
@inproceedings{choi2026iclr-groupnormalized,
title = {{Group-Normalized Implicit Value Optimization for Language Models}},
author = {Choi, Yunseon and Jang, Junyoung and Oh, Chaeyoung and Jeong, Minchan and Hwang, Doo Hwan and Kim, Kee-Eung},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/choi2026iclr-groupnormalized/}
}