Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Shrivastava, Vaishnavi; Awadallah, Ahmed Hassan; Balachandran, Vidhisha; Garg, Shivam; Behl, Harkirat; Papailiopoulos, Dimitris

ICLR 2026

Abstract

Large language models trained with reinforcement learning on verifiable rewards often inflate response length, trading brevity for accuracy. While longer reasoning can help on hard problems, many extra tokens are filler: verbose text making little progress. We introduce GFPO (Group Filtered Policy Optimization), which curbs this length explosion by sampling larger groups per problem and only training on responses filtered by (1) length and (2) token efficiency (reward per token). By sampling more during training time, GFPO teaches models to think less at inference time. On Phi-4-reasoning, GFPO cuts GRPO’s length inflation by up to 85% across STEM and coding benchmarks (AIME 24/25, GPQA, Omni-MATH, LiveCodeBench) while preserving accuracy. We find that GFPO also outperforms Dr. GRPO in both accuracy and length reduction and generalizes across model sizes and families. We further propose Adaptive Difficulty GFPO, which allocates more training exploration to harder problems, yielding better efficiency-accuracy trade-offs on challenging questions. With only a 7% increase in training time, GFPO reduces end-to-end latency by ~30%, cutting response time on hard queries by 90 seconds. GFPO trades modest training-time increases for lasting gains in inference: an effective recipe for efficient reasoning.
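The filtering step described in the abstract can be illustrated with a small sketch. The abstract states only that responses are filtered by length and by token efficiency (reward per token). The top-k retention rule, the GRPO-style normalization over the retained subset, the zero advantage for rejected responses, and the names `gfpo_filter` and `filtered_advantages` are assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def gfpo_filter(responses, k, metric="length"):
    """Return the sorted indices of the k retained responses.

    responses: list of (num_tokens, reward) pairs for one problem,
               with num_tokens > 0 (assumed).
    metric:    "length" keeps the shortest responses;
               "token_efficiency" keeps the highest reward per token.
    """
    if metric == "length":
        def key(i):
            return responses[i][0]  # fewer tokens ranks first
    elif metric == "token_efficiency":
        def key(i):
            return -responses[i][1] / responses[i][0]  # higher reward/token ranks first
    else:
        raise ValueError(f"unknown metric: {metric}")
    order = sorted(range(len(responses)), key=key)
    return sorted(order[:k])


def filtered_advantages(responses, k, metric="length", eps=1e-6):
    """Assumed advantage rule: normalize rewards over the retained subset
    only, and give every rejected response an advantage of zero."""
    kept = gfpo_filter(responses, k, metric)
    rewards = [responses[i][1] for i in kept]
    mu, sigma = mean(rewards), pstdev(rewards)
    advantages = [0.0] * len(responses)
    for i in kept:
        advantages[i] = (responses[i][1] - mu) / (sigma + eps)
    return advantages


# Hypothetical group of four sampled responses: (num_tokens, reward).
group = [(100, 1.0), (400, 1.0), (50, 0.0), (800, 0.0)]
kept_by_length = gfpo_filter(group, k=2, metric="length")
kept_by_efficiency = gfpo_filter(group, k=2, metric="token_efficiency")
advantages = filtered_advantages(group, k=2, metric="length")
```

In this toy group, length filtering keeps the two shortest responses regardless of reward, while token-efficiency filtering favors short responses that are also correct. Only retained responses carry a nonzero learning signal under the assumed rule.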

Cite

Text

Shrivastava et al. "Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning." International Conference on Learning Representations, 2026.

Markdown

[Shrivastava et al. "Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/shrivastava2026iclr-sample/)

BibTeX

@inproceedings{shrivastava2026iclr-sample,
  title     = {{Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning}},
  author    = {Shrivastava, Vaishnavi and Awadallah, Ahmed Hassan and Balachandran, Vidhisha and Garg, Shivam and Behl, Harkirat and Papailiopoulos, Dimitris},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/shrivastava2026iclr-sample/}
}