MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning

Huang, Kun; Xu, Weikai; Liu, Yuxuan; Wang, Quandong; Gao, Pengzhi; Liu, Wei; Luan, Jian; Wang, Bin; An, Bo

ICLR 2026

Abstract

The Chain of Action-Planning Thoughts (CoaT) paradigm has been shown to improve the reasoning performance of VLM-based mobile agents in GUI tasks. However, the scarcity of diverse CoaT trajectories limits the expressiveness and generalization ability of such agents. While self-training is commonly employed to address data scarcity, existing approaches either overlook the correctness of intermediate reasoning steps or depend on expensive process-level annotations to construct process reward models (PRMs). To address these problems, we propose Iterative Preference Learning (IPL), which constructs a CoaT-tree through iterative sampling, scores leaf nodes using a rule-based reward, and backpropagates feedback to derive Thinking-level Direct Preference Optimization (T-DPO) pairs. To prevent overfitting during warm-up supervised fine-tuning, we further introduce a three-stage instruction evolution, which leverages GPT-4o to generate diverse Q&A pairs based on real mobile UI screenshots, enhancing both generality and layout understanding. Experiments on three standard mobile GUI-agent benchmarks demonstrate that our agent, MobileIPL, outperforms strong baselines, including continual pretraining models such as OS-ATLAS and UI-TARS, achieving state-of-the-art performance and showing strong generalization to out-of-domain scenarios.
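
The tree-based pair construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the reward rule, how feedback is aggregated on the way up the tree, or how pairs are selected. The sketch assumes an exact-match reward on the final action, mean aggregation over children, and a best-versus-worst sibling pairing with a hypothetical `margin` threshold. All names (`Node`, `rule_reward`, `backpropagate`, `preference_pairs`, `demo_tree`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One thinking step in a hypothetical CoaT-tree."""
    text: str
    action: Optional[str] = None  # only set on leaf nodes
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0


def rule_reward(predicted: Optional[str], gold: str) -> float:
    # Assumed rule: 1.0 if the leaf's action string matches the gold action.
    return 1.0 if predicted == gold else 0.0


def backpropagate(node: Node, gold: str) -> float:
    # Leaves are scored by the rule; an internal node takes the mean of its
    # children (mean aggregation is an assumption of this sketch).
    if not node.children:
        node.value = rule_reward(node.action, gold)
    else:
        total = sum(backpropagate(child, gold) for child in node.children)
        node.value = total / len(node.children)
    return node.value


def preference_pairs(
    node: Node, prefix: str = "", margin: float = 0.5
) -> List[Tuple[str, str, str]]:
    # Returns (context, chosen, rejected) triples: at each internal node the
    # highest-valued child is preferred over the lowest-valued sibling when
    # their value gap is at least `margin`.
    pairs: List[Tuple[str, str, str]] = []
    context = prefix + node.text
    if len(node.children) >= 2:
        best = max(node.children, key=lambda c: c.value)
        worst = min(node.children, key=lambda c: c.value)
        if best.value - worst.value >= margin:
            pairs.append((context, best.text, worst.text))
    for child in node.children:
        pairs.extend(preference_pairs(child, context + " ", margin))
    return pairs


def demo_tree() -> Node:
    # Toy tree with two sampled thoughts, each expanded into two actions.
    return Node("Task: open settings.", children=[
        Node("Thought A: the gear icon opens settings.", children=[
            Node("Act: tap gear", action="CLICK(gear)"),
            Node("Act: tap gear again", action="CLICK(gear)"),
        ]),
        Node("Thought B: scroll down first.", children=[
            Node("Act: scroll", action="SCROLL(down)"),
            Node("Act: tap gear", action="CLICK(gear)"),
        ]),
    ])


if __name__ == "__main__":
    root = demo_tree()
    backpropagate(root, gold="CLICK(gear)")
    for context, chosen, rejected in preference_pairs(root):
        print(f"context={context!r}\n  chosen={chosen!r}\n  rejected={rejected!r}")
```

In this toy tree, the pairing yields one thought-level pair at the root and one action-level pair under the second thought. Triples of this shape (context, chosen, rejected) are the usual input to DPO-style training, though the exact format used for T-DPO is not stated in the abstract.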

Cite

Text

Huang et al. "MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning." International Conference on Learning Representations, 2026.

Markdown

[Huang et al. "MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/huang2026iclr-mobileipl/)

BibTeX

@inproceedings{huang2026iclr-mobileipl,
  title     = {{MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning}},
  author    = {Huang, Kun and Xu, Weikai and Liu, Yuxuan and Wang, Quandong and Gao, Pengzhi and Liu, Wei and Luan, Jian and Wang, Bin and An, Bo},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/huang2026iclr-mobileipl/}
}