LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent

Kang, Bin; Wen, Shaoguo; Bi, Yifei; Wu, Shunlong; Yuan, Xinbin; Shao, Rui; Wang, Junle; Tian, Zhuotao

ICLR 2026

Abstract

Although agents based on multimodal large language models (MLLMs) are proficient at general short-horizon graphical user interface (GUI) tasks, their robustness on complex long-horizon tasks in dynamic environments remains a significant challenge. In response, we propose the LongHorizonUI framework to improve the sustained reliability of agents in long-horizon GUI tasks. To support rigorous evaluation of long-horizon reasoning, we establish a comprehensive benchmark, LongGUIBench, covering multiple categories of games and complex general applications, with long-horizon tasks defined as those requiring more than 15 steps. Building on this benchmark, we design a Multimodal Enhanced Perceiver that incorporates element detection and text recognition models and assigns unique indices to interface elements, thereby reinforcing state representation. We further introduce a Deep Reflection Decider, which uses a structured multi-level feedback validation mechanism to enable progressive reasoning and ensure accurate action execution with predictable trajectories. Finally, we introduce a Compensatory Action Executor that combines multiple degradation compensation operations with a process rollback strategy based on execution progress monitoring to ensure operational effectiveness in long-horizon task logic. Experimental results demonstrate that LongHorizonUI achieves substantial long-horizon modeling improvements on LongGUIBench while retaining competitive performance on diverse public benchmarks. The code and models will be publicly available.
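
To make two of the ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows one plausible way to merge the outputs of an element detector and a text recognizer and give each interface element a unique index, plus the "more than 15 steps" criterion for a long-horizon task. The `Element` type, the reading-order sort, and the function names `index_elements` and `is_long_horizon` are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A hypothetical UI element record (not from the paper)."""
    kind: str                        # "icon" (detector) or "text" (OCR)
    box: tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels
    label: str = ""


def index_elements(detected, recognized):
    """Merge both element sources and assign unique 1-based indices.

    Assumption: elements are ordered top-to-bottom, then left-to-right.
    The abstract only states that indices are unique, not how they are ordered.
    """
    merged = list(detected) + list(recognized)
    merged.sort(key=lambda e: (e.box[1], e.box[0]))
    return {i: e for i, e in enumerate(merged, start=1)}


# The abstract defines long-horizon tasks as requiring more than 15 steps.
LONG_HORIZON_MIN_STEPS = 15


def is_long_horizon(num_steps):
    """Return True when a task exceeds the 15-step threshold."""
    return num_steps > LONG_HORIZON_MIN_STEPS


icons = [Element("icon", (200, 10, 240, 50), "settings")]
texts = [
    Element("text", (10, 10, 100, 40), "Home"),
    Element("text", (10, 300, 120, 340), "Start"),
]
indexed = index_elements(icons, texts)
for idx, element in indexed.items():
    print(idx, element.kind, element.label)
```

An agent prompt could then refer to targets by index (for example, "tap element 3") rather than by raw coordinates, which is one reading of how indexed elements might reinforce state representation.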

Cite

Text

Kang et al. "LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent." International Conference on Learning Representations, 2026.

Markdown

[Kang et al. "LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/kang2026iclr-longhorizonui/)

BibTeX

@inproceedings{kang2026iclr-longhorizonui,
  title     = {{LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent}},
  author    = {Kang, Bin and Wen, Shaoguo and Bi, Yifei and Wu, Shunlong and Yuan, Xinbin and Shao, Rui and Wang, Junle and Tian, Zhuotao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/kang2026iclr-longhorizonui/}
}