Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
Abstract
Unlocking advanced reasoning in large language model agents is hindered by a scarcity of training data situated at the very frontier of their capabilities. We address this with a novel data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD), which conceptualizes this frontier as tasks an LLM cannot solve independently but can master with guidance. We operationalize this principle through the AgentFrontier Data Engine, an automated pipeline that synthesizes high-quality, multidisciplinary data situated precisely within an LLM's ZPD. The engine yields two synergistic outputs: knowledge-intensive data for continued pre-training and frontier-level reasoning trajectories for post-training. Concurrently, it produces the ZPD Exam, a self-evolving benchmark for evaluating agent capabilities by compelling them to reason beyond their parameterized knowledge. By training our AgentFrontier-30B-A3B model on the synthesized data, we achieve state-of-the-art results on demanding benchmarks like Humanity's Last Exam, outperforming several leading proprietary agents. This work establishes ZPD-guided data synthesis as a scalable and effective paradigm for cultivating increasingly capable LLM agents.
Cite
Text
Chen et al. "Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis." International Conference on Learning Representations, 2026.
Markdown
[Chen et al. "Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-expanding/)
BibTeX
@inproceedings{chen2026iclr-expanding,
  title = {{Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis}},
  author = {Chen, Xuanzhong and Qiao, Zile and Chen, Guoxin and Su, Liangcai and Zhang, Zhen and Wang, Xinyu and Xie, Pengjun and Huang, Fei and Zhou, Jingren and Jiang, Yong and Chen, Ting},
  booktitle = {International Conference on Learning Representations},
  year = {2026},
  url = {https://mlanthology.org/iclr/2026/chen2026iclr-expanding/}
}