An Architecture Search Framework for Inference-Time Techniques

Abstract

Inference-time techniques, such as repeated sampling or iterative revisions, are emerging as powerful ways to enhance large language models (LLMs) at test time. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of each technique across models and tasks, the interactions between them, and the massive search space for combining them. To address these challenges, we introduce Archon, a modular and automated framework for optimizing the process of selecting and combining inference-time techniques and LLMs. Given a compute budget and a set of available LLMs, Archon explores a large design space to discover optimized configurations tailored to target benchmarks. It can design custom or general-purpose architectures that advance the Pareto frontier of accuracy vs. maximum token budget compared to top-performing baselines. Across instruction-following, reasoning, and coding tasks, we show that Archon can leverage additional inference compute budget to design systems that outperform frontier models such as OpenAI’s o1, GPT-4o, and Claude 3.5 Sonnet by an average of 15.1%.
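To make "advance the Pareto frontier of accuracy vs. maximum token budget" concrete, here is a minimal Python sketch. It is not Archon's search procedure. It assumes each candidate configuration (a combination of LLMs and inference-time techniques) has already been evaluated and reduced to a hypothetical `Candidate` record holding a maximum token budget and a benchmark accuracy. It then keeps only the configurations that no other configuration beats on both axes, and picks the most accurate configuration that fits a given budget. The configuration names and numbers below are illustrative placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record for one evaluated configuration of LLMs and
    # inference-time techniques (an assumption, not Archon's data model).
    name: str
    max_tokens: int   # maximum token budget the configuration may consume
    accuracy: float   # score on the target benchmark


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated on (lower max_tokens, higher accuracy).

    After sorting by budget ascending (ties: higher accuracy first), a
    candidate is kept only if it is strictly more accurate than every
    candidate with an equal or smaller budget that came before it.
    """
    ordered = sorted(candidates, key=lambda c: (c.max_tokens, -c.accuracy))
    frontier: List[Candidate] = []
    best_accuracy = float("-inf")
    for cand in ordered:
        if cand.accuracy > best_accuracy:
            frontier.append(cand)
            best_accuracy = cand.accuracy
    return frontier


def best_under_budget(candidates: List[Candidate], budget: int) -> Optional[Candidate]:
    """Return the most accurate candidate whose max_tokens fits the budget."""
    feasible = [c for c in candidates if c.max_tokens <= budget]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Illustrative, made-up configurations (hypothetical names and scores).
configs = [
    Candidate("single-model", 1_000, 0.60),
    Candidate("sample-then-rank", 4_000, 0.68),
    Candidate("sample-fuse-verify", 8_000, 0.66),
    Candidate("multi-layer-fusion", 16_000, 0.74),
]

print([c.name for c in pareto_frontier(configs)])
# ['single-model', 'sample-then-rank', 'multi-layer-fusion']
print(best_under_budget(configs, 10_000).name)
# sample-then-rank
```

In this toy example, "sample-fuse-verify" is dropped from the frontier because "sample-then-rank" is both cheaper and more accurate; a search that advances the frontier would need to find a configuration that is more accurate at some budget than every point currently on it.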

Cite

Text

Saad-Falcon et al. "An Architecture Search Framework for Inference-Time Techniques." ICLR 2025 Workshops: SSI-FM, 2025.

BibTeX

@inproceedings{saadfalcon2025iclrw-architecture,
  title     = {{An Architecture Search Framework for Inference-Time Techniques}},
  author    = {Saad-Falcon, Jon and Lafuente, Adrian Gamarra and Natarajan, Shlok and Maru, Nahum and Todorov, Hristo and Guha, Etash Kumar and Buchanan, E. Kelly and Chen, Mayee F. and Guha, Neel and R{\'e}, Christopher and Mirhoseini, Azalia},
  booktitle = {ICLR 2025 Workshops: SSI-FM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/saadfalcon2025iclrw-architecture/}
}