R-WoM: Retrieval-Augmented World Model for Computer-Use Agents
Abstract
Large Language Models (LLMs) can serve as world models that enhance agent decision-making in digital environments by simulating future states and predicting action outcomes, potentially eliminating costly trial-and-error exploration. However, this capability is fundamentally limited by LLMs' tendency to hallucinate and their reliance on static training knowledge, which can lead to compounding errors that inhibit long-horizon simulations. To systematically investigate whether LLMs are appropriate for world modeling, we probe two core capabilities of world models, *future state prediction* and *reward estimation*, through three tasks: next-state identification, full-procedure planning alignment, and milestone transition recognition. Our analysis shows that while LLMs effectively capture immediate next states and identify meaningful state transitions, their performance degrades rapidly in full-procedure planning. This highlights LLMs' limitations in reliably modeling environment dynamics over long horizons. To address these limitations, we propose the Retrieval-augmented World Model (R-WoM), which grounds LLM simulations by incorporating factual, up-to-date knowledge retrieved from external tutorials. Experiments show that R-WoM achieves relative improvements of up to 23.4% and 16.3% over baselines on subsets of OSWorld and WebArena, respectively, with a particular advantage in longer-horizon simulations.
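The abstract describes R-WoM only at a high level: an LLM simulates future states, and its simulation prompts are grounded with tutorial text retrieved from an external corpus. The Python sketch below illustrates that general retrieve-then-simulate pattern under loose assumptions; it is not the authors' implementation. Every name in it (`retrieve`, `build_prompt`, `simulate_rollout`, the toy tutorial corpus, and the stub `fake_llm`) is hypothetical, and simple word-overlap scoring is swapped in for whatever retriever R-WoM actually uses.

```python
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercased word set; a deliberately crude stand-in for a real retriever.
    words = (w.strip(".,:;!?()\"'").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return up to k tutorial snippets sharing the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), i, doc) for i, doc in enumerate(corpus)]
    # Sort by overlap (descending), breaking ties by original corpus order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    # Drop snippets with zero overlap so unrelated text never grounds the model.
    return [doc for score, _, doc in scored[:k] if score > 0]


def build_prompt(state: str, action: str, snippets: List[str]) -> str:
    # Hypothetical prompt layout: retrieved tutorial steps, then state and action.
    tutorial_block = "\n".join(f"- {s}" for s in snippets) or "- (no relevant tutorial found)"
    return (
        "You are simulating a computer environment.\n"
        f"Relevant tutorial steps:\n{tutorial_block}\n"
        f"Current state: {state}\n"
        f"Action: {action}\n"
        "Predict the next state:"
    )


def simulate_rollout(
    llm: Callable[[str], str], state: str, actions: List[str], corpus: List[str], k: int = 2
) -> List[str]:
    """Roll the world model forward one action at a time, re-retrieving at every step."""
    states = [state]
    for action in actions:
        snippets = retrieve(f"{action} {state}", corpus, k)
        state = llm(build_prompt(state, action, snippets))
        states.append(state)
    return states


# --- Usage with a toy corpus and a stub LLM (both hypothetical) ---
corpus = [
    "To save a file in LibreOffice Writer, open the File menu and choose Save As.",
    "In the Save As dialog, type the file name and press Enter.",
    "To change font size, select text and use the Format menu.",
]


def fake_llm(prompt: str) -> str:
    # Stub LLM: reports the action it was given, so the loop is easy to trace.
    action_line = [line for line in prompt.splitlines() if line.startswith("Action: ")][-1]
    return "after " + action_line[len("Action: "):]


trajectory = simulate_rollout(fake_llm, "Writer document open", ["open File menu", "click Save As"], corpus)
print(trajectory)  # ['Writer document open', 'after open File menu', 'after click Save As']
```

In this sketch retrieval is repeated at every simulated step rather than once per task, which is one plausible way to keep each predicted transition tied to relevant instructions as the rollout grows longer; the abstract does not state R-WoM's actual retrieval schedule.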
Cite
Mei et al. "R-WoM: Retrieval-Augmented World Model for Computer-Use Agents." International Conference on Learning Representations, 2026.
BibTeX
@inproceedings{mei2026iclr-rwom,
title = {{R-WoM: Retrieval-Augmented World Model for Computer-Use Agents}},
author = {Mei, Kai and Guo, Jiang and Chang, Shuaichen and Dong, Mingwen and Lee, Dongkyu and Niu, Xing and Jiang, Jiarong},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/mei2026iclr-rwom/}
}