R-WoM: Retrieval-Augmented World Model for Computer-Use Agents

Abstract

Large Language Models (LLMs) can serve as world models to enhance agent decision-making in digital environments by simulating future states and predicting action outcomes, potentially eliminating costly trial-and-error exploration. However, this capability is fundamentally limited by LLMs' tendency to hallucinate and their reliance on static training knowledge, which can lead to compounding errors that inhibit long-horizon simulations. To systematically investigate whether LLMs are appropriate for world modeling, we probe two core capabilities of world models, *future state prediction* and *reward estimation*, through three tasks: next-state identification, full-procedure planning alignment, and milestone transition recognition. Our analysis shows that while LLMs effectively capture immediate next states and identify meaningful state transitions, their performance rapidly degrades in full-procedure planning. This highlights LLMs’ limitations in reliably modeling environment dynamics over long horizons. To address these limitations, we propose the Retrieval-augmented World Model (R-WoM), which grounds LLM simulations by incorporating factual, up-to-date knowledge retrieved from external tutorials. Experiments show that R-WoM achieves relative improvements of up to 23.4% and 16.3% on subsets of OSWorld and WebArena compared to baselines, with a particular advantage in longer-horizon simulations.
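
The grounding idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Tutorial` class, the word-overlap `retrieve` function, and the prompt layout in `build_world_model_prompt` are all hypothetical stand-ins, chosen only to show how retrieved tutorial text might be placed in front of a world-model query before an LLM is asked to predict the next state.

```python
from dataclasses import dataclass


@dataclass
class Tutorial:
    """Hypothetical container for one external tutorial snippet."""
    title: str
    text: str


def tokenize(s: str) -> set[str]:
    # Lowercased whitespace tokens; a real retriever would likely use embeddings.
    return set(s.lower().split())


def retrieve(query: str, tutorials: list[Tutorial], k: int = 1) -> list[Tutorial]:
    """Rank tutorials by word overlap with the query (toy stand-in for a retriever)."""
    q = tokenize(query)
    ranked = sorted(tutorials, key=lambda t: len(q & tokenize(t.text)), reverse=True)
    return ranked[:k]


def build_world_model_prompt(state: str, action: str, tutorials: list[Tutorial]) -> str:
    """Assemble a next-state prediction prompt grounded in the retrieved snippets."""
    context = "\n".join(f"- {t.title}: {t.text}" for t in tutorials)
    return (
        "Relevant tutorial excerpts:\n" + context + "\n\n"
        f"Current state: {state}\n"
        f"Proposed action: {action}\n"
        "Predict the next state."
    )


if __name__ == "__main__":
    corpus = [
        Tutorial("Export PDF", "open file menu and click export as pdf"),
        Tutorial("Change font", "select text then choose font size"),
    ]
    hits = retrieve("click export pdf in file menu", corpus)
    # The resulting string would be sent to an LLM acting as the world model.
    print(build_world_model_prompt("Writer document open", "click File", hits))
```

In this sketch the retrieval step is deliberately simple; the point is only that the prediction prompt carries external, task-specific text instead of relying solely on what the model memorized during training.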

Cite

Text

Mei et al. "R-WoM: Retrieval-Augmented World Model for Computer-Use Agents." International Conference on Learning Representations, 2026.

Markdown

[Mei et al. "R-WoM: Retrieval-Augmented World Model for Computer-Use Agents." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/mei2026iclr-rwom/)

BibTeX

@inproceedings{mei2026iclr-rwom,
  title     = {{R-WoM: Retrieval-Augmented World Model for Computer-Use Agents}},
  author    = {Mei, Kai and Guo, Jiang and Chang, Shuaichen and Dong, Mingwen and Lee, Dongkyu and Niu, Xing and Jiang, Jiarong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/mei2026iclr-rwom/}
}