Group-Normalized Implicit Value Optimization for Language Models

Choi, Yunseon; Jang, Junyoung; Oh, Chaeyoung; Jeong, Minchan; Hwang, Doo Hwan; Kim, Kee-Eung

ICLR 2026

Abstract

Fine-tuning Large Language Models (LLMs) with reinforcement learning (RL) has become a key technique for improving performance on a wide range of tasks, from user alignment to complex reasoning. However, this approach is often hindered by the difficulty of fine-grained credit assignment, because it typically relies on sparse rewards given only at the end of a fully generated sequence. Conventional solutions often require training an auxiliary value network, known as a critic, which introduces significant computational overhead and training instability. We present Group-Normalized Implicit Value Optimization (GN-IVO), a novel, critic-free algorithm that directly addresses this challenge. GN-IVO learns step-level values implicitly from the policy through a group-normalized distributional matching objective. By normalizing values across a group of sampled model responses, this approach removes the need for an explicit critic and avoids computing the intractable partition function. Theoretically, we prove that our objective recovers the true value function up to a constant, guaranteeing that the optimal policy is preserved. We demonstrate the practical effectiveness of GN-IVO on a diverse set of text generation and reasoning tasks, showing that it consistently outperforms strong RL baselines for LLMs.
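
The abstract does not spell out the GN-IVO objective, so the following is only a minimal sketch of the general idea that normalizing over a group of sampled responses can cancel an unknown partition function. It assumes the standard KL-regularized setting, where the optimal policy satisfies log π*(y|x) = log π_ref(y|x) + r(x, y)/β − log Z(x), and it works at the sequence level rather than the step level described in the paper. All names (`log_softmax`, `group_matching_loss`, `beta`) are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one group of scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def group_softmax(scores):
    """Softmax over one group of scores."""
    return [math.exp(v) for v in log_softmax(scores)]


def group_matching_loss(logp_policy, logp_ref, rewards, beta=1.0):
    """Cross-entropy between two distributions over a sampled group.

    Illustrative assumption, not the paper's stated objective:
      target = softmax(reward / beta) over the group
      model  = softmax(log pi - log pi_ref) over the group
    Any per-prompt constant such as log Z(x) shifts every log-ratio
    equally and therefore cancels inside the softmax.
    """
    target = group_softmax([r / beta for r in rewards])
    log_ratios = [lp - lr for lp, lr in zip(logp_policy, logp_ref)]
    model_log_probs = log_softmax(log_ratios)
    return -sum(t * lm for t, lm in zip(target, model_log_probs))


# Toy group of three sampled responses for a single prompt.
rewards = [1.0, 0.0, 2.0]
logp_ref = [-3.0, -2.5, -4.0]
beta = 0.5


def optimal_logp(log_z):
    """Log-probs of the KL-regularized optimum for an assumed log Z."""
    return [lr + r / beta - log_z for lr, r in zip(logp_ref, rewards)]


loss_a = group_matching_loss(optimal_logp(0.0), logp_ref, rewards, beta)
loss_b = group_matching_loss(optimal_logp(5.0), logp_ref, rewards, beta)
loss_ref = group_matching_loss(logp_ref, logp_ref, rewards, beta)
```

In this toy setup `loss_a` and `loss_b` agree up to floating-point error, which shows that the value assumed for log Z never has to be computed, while `loss_ref` (the policy left at the reference) is larger because its group distribution is uniform and does not match the reward-induced target.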

Cite

Text

Choi et al. "Group-Normalized Implicit Value Optimization for Language Models." International Conference on Learning Representations, 2026.

Markdown

[Choi et al. "Group-Normalized Implicit Value Optimization for Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/choi2026iclr-groupnormalized/)

BibTeX

@inproceedings{choi2026iclr-groupnormalized,
  title     = {{Group-Normalized Implicit Value Optimization for Language Models}},
  author    = {Choi, Yunseon and Jang, Junyoung and Oh, Chaeyoung and Jeong, Minchan and Hwang, Doo Hwan and Kim, Kee-Eung},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/choi2026iclr-groupnormalized/}
}