Agent S: An Open Agentic Framework That Uses Computers like a Human
Abstract
We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks. Agent S aims to address three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces. To this end, Agent S introduces experience-augmented hierarchical planning, which learns from external knowledge search and internal experience retrieval at multiple levels, facilitating efficient task planning and subtask execution. In addition, it employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on Multimodal Large Language Models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S outperforms the baseline by 9.37 percentage points in success rate (an 83.6% relative improvement) and achieves a new state of the art. Comprehensive analysis highlights the effectiveness of individual components and provides insights for future improvements. Furthermore, Agent S demonstrates broad generalizability to different operating systems on the newly released WindowsAgentArena benchmark.
Cite
Text
Agashe et al. "Agent S: An Open Agentic Framework That Uses Computers like a Human." NeurIPS 2024 Workshops: OWA, 2024.
Markdown
[Agashe et al. "Agent S: An Open Agentic Framework That Uses Computers like a Human." NeurIPS 2024 Workshops: OWA, 2024.](https://mlanthology.org/neuripsw/2024/agashe2024neuripsw-agent/)
BibTeX
@inproceedings{agashe2024neuripsw-agent,
  title     = {{Agent S: An Open Agentic Framework That Uses Computers like a Human}},
  author    = {Agashe, Saaket and Han, Jiuzhou and Gan, Shuyu and Yang, Jiachen and Li, Ang and Wang, Xin Eric},
  booktitle = {NeurIPS 2024 Workshops: OWA},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/agashe2024neuripsw-agent/}
}