Marginal Benefit Driven RL Teacher for Unsupervised Environment Design
Abstract
Training generally capable agents in complex environments is a challenging task that involves identifying the "right" environments at the training stage. Recent research has highlighted the potential of the Unsupervised Environment Design framework, which uses regret measures to adaptively generate environment instances/levels at the frontier of the agent’s capabilities. While regret-based approaches have shown great promise in generating feasible environments, they can produce environments that are too difficult for an RL agent to learn from. This is because regret represents the best-case (upper bound) learning potential of an environment, not its actual learning potential. To address this limitation, we propose an alternative mechanism that employs marginal benefit, focusing on the improvement (in terms of generalized performance) that the agent policy gains from a given environment. The advantage of this new mechanism is that it is agent-focused (rather than environment-focused) and generates the "right" environments depending on the agent's policy. Additionally, to improve the generalizability of the agent, we introduce a representative state diversity metric that aims to generate varied experiences for the agent. Finally, we provide detailed experimental results and an ablation analysis to showcase the effectiveness of our new methods. We obtain state-of-the-art results among RL-based environment generation methods.
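The contrast between regret and marginal benefit can be illustrated with a toy sketch. This is not the paper's algorithm: the function names, the two example levels, and all numbers are hypothetical, and marginal benefit is assumed here to be the change in held-out (generalized) performance after training on a level.

```python
def regret(optimal_return: float, agent_return: float) -> float:
    """Gap between the best achievable return and the agent's return on a level."""
    return optimal_return - agent_return


def marginal_benefit(perf_before: float, perf_after: float) -> float:
    """Assumed definition: change in held-out performance after training on a level."""
    return perf_after - perf_before


# Hypothetical numbers for two candidate levels (illustration only).
levels = {
    "too_hard": {"optimal": 1.0, "agent": 0.0, "before": 0.40, "after": 0.41},
    "frontier": {"optimal": 1.0, "agent": 0.5, "before": 0.40, "after": 0.55},
}

scores = {
    name: {
        "regret": regret(v["optimal"], v["agent"]),
        "marginal_benefit": marginal_benefit(v["before"], v["after"]),
    }
    for name, v in levels.items()
}

# Pick the level each criterion would prefer.
best_by_regret = max(scores, key=lambda n: scores[n]["regret"])
best_by_benefit = max(scores, key=lambda n: scores[n]["marginal_benefit"])
print(best_by_regret, best_by_benefit)
```

With these made-up numbers, the regret criterion prefers the level the agent cannot yet solve at all, while the marginal-benefit criterion prefers the level that actually improved held-out performance.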
Cite
Text
Li et al. "Marginal Benefit Driven RL Teacher for Unsupervised Environment Design." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I17.34008
Markdown
[Li et al. "Marginal Benefit Driven RL Teacher for Unsupervised Environment Design." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/li2025aaai-marginal/) doi:10.1609/AAAI.V39I17.34008
BibTeX
@inproceedings{li2025aaai-marginal,
title = {{Marginal Benefit Driven RL Teacher for Unsupervised Environment Design}},
author = {Li, Dexun and Li, Wenjun and Varakantham, Pradeep},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {18253-18261},
doi = {10.1609/AAAI.V39I17.34008},
url = {https://mlanthology.org/aaai/2025/li2025aaai-marginal/}
}