ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Abstract
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research.
Cite
Text
Liu et al. "ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data." International Conference on Learning Representations, 2026.
Markdown
[Liu et al. "ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-scalecua/)
BibTeX
@inproceedings{liu2026iclr-scalecua,
title = {{ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data}},
author = {Liu, Zhaoyang and Xie, JingJing and Ding, Zichen and Li, Zehao and Yang, Bowen and Wu, Zhenyu and Wang, Xuehui and Sun, Qiushi and Liu, Shi and Wang, Weiyun and Ye, Shenglong and Li, Qingyun and Tian, Zeyue and Luo, Gen and Yue, Xiangyu and Qi, Biqing and Chen, Kai and Zhou, Bowen and Qiao, Yu and Chen, Qifeng and Wang, Wenhai},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/liu2026iclr-scalecua/}
}