WorkflowAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

Abstract

LLM agents are becoming increasingly capable at web-based tasks. However, most LLM web agents rely on prompting general-purpose, proprietary models like GPT-4, which are not specifically trained to process web languages (e.g., HTML) or perform long-horizon planning. We explore an alternative paradigm for developing specialized web agents: supervised fine-tuning of open-source LLMs on production-scale workflow data. This strategy not only reduces serving costs but also substantially improves empirical results: our agent achieves state-of-the-art action generation performance on the Mind2Web benchmark and improves the task success rate by 7.3% over existing prompting-based agents on WebArena. We further perform detailed ablation studies on various design choices and provide insights into LLM selection, training recipes, context window optimization, and the effect of dataset size.

Cite

Text

Shen et al. "WorkflowAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data." ICLR 2025 Workshops: FM-Wild, 2025.

Markdown

[Shen et al. "WorkflowAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data." ICLR 2025 Workshops: FM-Wild, 2025.](https://mlanthology.org/iclrw/2025/shen2025iclrw-workflowagent/)

BibTeX

@inproceedings{shen2025iclrw-workflowagent,
  title     = {{WorkflowAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data}},
  author    = {Shen, Junhong and Jain, Atishay and Xiao, Zedian and Amlekar, Ishan and Hadji, Mouad and Podolny, Aaron and Talwalkar, Ameet},
  booktitle = {ICLR 2025 Workshops: FM-Wild},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/shen2025iclrw-workflowagent/}
}