A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

Chen, Qianben; Cao, Jingyi; Zhang, Jiayu; Qin, Tianrui; Li, Xiaowan; Zhu, King; Shi, Dingfeng; Zhu, He; Liu, Minghao; Liang, Xiaobo; Zhang, Ge; Yang, Jian; Jiang, Yuchen Eleanor; Zhou, Wangchunshu

ICLR 2026

Abstract

Large language models split into two families: reasoning-centric LLMs, which strengthen internal chain-of-thought reasoning but cannot invoke external tools, and agentic LLMs, which learn to interact with environments and leverage tools but often lag in deep reasoning. This divide arises from fundamentally different training objectives, leading to mismatched strengths and inefficiency on simple queries, where both families tend to overthink or over-call tools. In this work, we present Adaptive Agent Foundation Model (A$^2$FM), a unified framework that follows a route-then-align principle: the model first learns task-aware routing and then aligns mode-specific trajectories under a shared backbone. To address the inefficiency gap, we introduce a third instant mode that handles simple queries directly, preventing unnecessary reasoning or tool calls while complementing the agentic and reasoning modes. To jointly enhance accuracy and efficiency, we propose Adaptive Policy Optimization (APO), which enforces adaptive sampling across modes and applies a cost-regularized reward. At the 32B scale, A$^2$FM achieves 13.4\% on BrowseComp, 70.4\% on AIME25, and 16.7\% on HLE, setting new SOTA among comparable models and performing competitively with frontier LLMs across agentic, reasoning, and general benchmarks. Notably, the adaptive execution achieves a cost of pass of only \$0.00487 per correct answer, cutting cost by 45.2\% relative to the reasoning mode and 33.5\% relative to the agentic mode, thus delivering substantially higher cost efficiency while maintaining comparable accuracy.
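The cost figures in the abstract can be unpacked with a little arithmetic. The sketch below is illustrative only: it assumes that "cost of pass" means expected spend per correct answer (cost per attempt divided by accuracy), which the abstract does not define, and the function names `cost_of_pass` and `implied_baseline` are hypothetical. The baseline costs it prints are back-computed from the stated percentages, not values reported by the authors.

```python
def cost_of_pass(cost_per_attempt: float, accuracy: float) -> float:
    """Expected spend per correct answer (assumed definition, not from the paper)."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_attempt / accuracy


def implied_baseline(adaptive_cost: float, relative_saving: float) -> float:
    """Baseline cost implied by 'adaptive is relative_saving cheaper than baseline'."""
    if not 0.0 <= relative_saving < 1.0:
        raise ValueError("relative_saving must be in [0, 1)")
    # adaptive = baseline * (1 - saving)  =>  baseline = adaptive / (1 - saving)
    return adaptive_cost / (1.0 - relative_saving)


ADAPTIVE_COST = 0.00487  # dollars per correct answer, as stated in the abstract

# Back-computed from the abstract's 45.2% and 33.5% savings; not reported figures.
reasoning_only = implied_baseline(ADAPTIVE_COST, 0.452)
agentic_only = implied_baseline(ADAPTIVE_COST, 0.335)

print(f"implied reasoning-mode cost of pass: ${reasoning_only:.5f}")
print(f"implied agentic-mode cost of pass:   ${agentic_only:.5f}")
```

Under these assumptions the implied single-mode costs come out near \$0.0089 (reasoning) and \$0.0073 (agentic) per correct answer.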

Cite

Text

Chen et al. "A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning." International Conference on Learning Representations, 2026.

Markdown

[Chen et al. "A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-2fm/)

BibTeX

@inproceedings{chen2026iclr-2fm,
  title     = {{A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning}},
  author    = {Chen, Qianben and Cao, Jingyi and Zhang, Jiayu and Qin, Tianrui and Li, Xiaowan and Zhu, King and Shi, Dingfeng and Zhu, He and Liu, Minghao and Liang, Xiaobo and Zhang, Ge and Yang, Jian and Jiang, Yuchen Eleanor and Zhou, Wangchunshu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/chen2026iclr-2fm/}
}