How to Train Your LLM Web Agent: A Statistical Diagnosis

Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Thibault Le Sellier de Chezelles, Megh Thakkar, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia

NeurIPS 2025

Abstract

Large language model (LLM) agents for web interfaces have advanced rapidly, yet open-source systems still lag behind proprietary agents. Bridging this gap is key to enabling customizable, efficient, and privacy-preserving agents. Two challenges hinder progress: the reproducibility issues in RL and LLM agent training, where results often depend on sensitive factors like seeds and decoding parameters, and the focus of prior work on single-step tasks, overlooking the complexities of web-based, multi-step decision-making. We address these gaps by providing a statistically driven study of training LLM agents for web tasks. Our two-stage pipeline combines imitation learning from a Llama 3.3 70B teacher with on-policy fine-tuning via Group Relative Policy Optimization (GRPO) on a Llama 3.1 8B student. Through 240 configuration sweeps and rigorous bootstrapping, we chart the first compute allocation curve for open-source LLM web agents. Our findings show that dedicating one-third of compute to teacher traces and the rest to RL improves MiniWoB++ success by 6 points and closes 60% of the gap to GPT-4o on WorkArena, while cutting GPU costs by 45%. We introduce a principled hyperparameter sensitivity analysis, offering actionable guidelines for robust and cost-effective agent training.
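
The abstract mentions bootstrapping as the basis for its statistical claims but gives no procedural detail. As a rough illustration only, the sketch below computes a percentile bootstrap confidence interval for a task success rate. It is not the authors' procedure: the function name `bootstrap_ci`, the resample count, the confidence level, and the episode outcomes are all hypothetical placeholders chosen for the example.

```python
import random
from statistics import mean


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-episode outcomes.

    `outcomes` is a list of 0/1 success flags (hypothetical data here).
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    # Resample episodes with replacement and record each resample's mean.
    resampled_means = sorted(
        mean(rng.choices(outcomes, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(outcomes), resampled_means[lower_idx], resampled_means[upper_idx]


# Hypothetical evaluation: 62 successes out of 100 episodes (made-up numbers).
outcomes = [1] * 62 + [0] * 38
point, lower, upper = bootstrap_ci(outcomes)
print(f"success rate = {point:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Reporting an interval like this alongside the point estimate helps show whether a difference between two training configurations exceeds the variation expected from resampling alone.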

Cite

Text

Vattikonda et al. "How to Train Your LLM Web Agent: A Statistical Diagnosis." Advances in Neural Information Processing Systems, 2025.

Markdown

[Vattikonda et al. "How to Train Your LLM Web Agent: A Statistical Diagnosis." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/vattikonda2025neurips-train/)

BibTeX

@inproceedings{vattikonda2025neurips-train,
  title     = {{How to Train Your LLM Web Agent: A Statistical Diagnosis}},
  author    = {Vattikonda, Dheeraj and Ravichandran, Santhoshi and Penaloza, Emiliano and Nekoei, Hadi and de Chezelles, Thibault Le Sellier and Thakkar, Megh and Gontier, Nicolas and Muñoz-Mármol, Miguel and Shayegan, Sahar Omidi and Raimondo, Stefania and Liu, Xue and Drouin, Alexandre and Piché, Alexandre and Lacoste, Alexandre and Caccia, Massimo},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/vattikonda2025neurips-train/}
}