MARGE: Improving Math Reasoning with Guided Exploration

Abstract

Large Language Models (LLMs) exhibit strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation necessitates scaling up learnable responses through self-generated data, yet current methods struggle due to spuriously correlated data caused by ineffective exploration across all reasoning stages. To address this challenge, we introduce MARGE: Improving Math Reasoning with Guided Exploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a common trade-off in alignment methods. These results demonstrate MARGE's effectiveness in enhancing mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data.

Cite

Text

Gao et al. "MARGE: Improving Math Reasoning with Guided Exploration." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Gao et al. "MARGE: Improving Math Reasoning with Guided Exploration." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/gao2025icml-marge/)

BibTeX

@inproceedings{gao2025icml-marge,
  title     = {{MARGE: Improving Math Reasoning with Guided Exploration}},
  author    = {Gao, Jingyue and Lin, Runji and Lu, Keming and Yu, Bowen and Lin, Junyang and Chen, Jianyu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {18430--18452},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/gao2025icml-marge/}
}