MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning
Abstract
In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges for sample efficiency, primarily due to the complexity of extracting informative state representations from high-dimensional data. Previous methods, such as contrastive-based approaches, have made strides in improving sample efficiency but fall short in modeling the nuanced evolution of states. To address this, we introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking to explicitly model state evolution in visual RL. Specifically, we propose a self-supervised dual-component strategy that integrates (1) a graph construction of pixel-based observations for spatial-temporal masking, coupled with (2) a multi-level contrastive learning mechanism that enriches state representations by emphasizing temporal continuity and change of states. MOOSS advances the understanding of state dynamics by disrupting and learning from spatial-temporal correlations, which facilitates policy learning. Our comprehensive evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency, demonstrating the effectiveness of our method.
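To make the idea of a temporal contrastive objective over masked observations more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the loss, the encoder, or the graph construction, so this sketch assumes a generic InfoNCE-style loss, replaces the graph-based spatial-temporal mask with a simple random mask, and treats each frame as an already-encoded feature vector. The names `temporal_info_nce`, `random_mask`, and `temperature` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def random_mask(frames, ratio, rng):
    """Zero out a random fraction of entries in each frame.

    Assumption: a plain random mask stands in for the paper's
    graph-based spatial-temporal masking, which is not detailed here.
    """
    return [[0.0 if rng.random() < ratio else x for x in frame]
            for frame in frames]


def temporal_info_nce(queries, keys, temperature=0.1):
    """Generic InfoNCE loss over time steps (an assumed stand-in loss).

    queries[t] is pulled toward keys[t] (same time step) and pushed
    away from keys[s] for every other time step s in the sequence.
    """
    total = 0.0
    for t, query in enumerate(queries):
        logits = [cosine(query, key) / temperature for key in keys]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits))
        total += log_denominator - logits[t]
    return total / len(queries)


# Toy sequence of three "frame features" (hypothetical data).
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
masked = random_mask(frames, ratio=0.3, rng=random.Random(0))
loss = temporal_info_nce(masked, frames)
```

In this toy setup, the masked frames act as queries and the unmasked frames as keys, so the loss is small when each masked frame can still be matched to its own time step and large when it cannot.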
Cite
Text
Sun et al. "MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning." Winter Conference on Applications of Computer Vision, 2025.
Markdown
[Sun et al. "MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/sun2025wacv-mooss/)
BibTeX
@inproceedings{sun2025wacv-mooss,
title = {{MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning}},
author = {Sun, Jiarui and Akcal, M. Ugur and Chowdhary, Girish and Zhang, Wei},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2025},
pages = {6719-6729},
url = {https://mlanthology.org/wacv/2025/sun2025wacv-mooss/}
}