Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization
Abstract
Aligning the world model with the environment for the agent’s specific task is crucial in model-based reinforcement learning. While value-equivalent models may achieve better task awareness than maximum-likelihood models, they sacrifice a large amount of semantic information and face implementation issues. To combine the benefits of both types of models, we propose Task-aware Environment Modeling Pipeline with bi-level Optimization (TEMPO), a bi-level model learning framework that introduces an additional level of optimization on top of a maximum-likelihood model by incorporating a meta weighter network that weights each training sample. The meta weighter in the upper level learns to generate novel sample weights by minimizing a proposed task-aware model loss. The model in the lower level focuses on important samples while maintaining rich semantic information in state representations. We evaluate TEMPO on a variety of continuous and discrete control tasks from the DeepMind Control Suite and Atari video games. Our results demonstrate that TEMPO achieves state-of-the-art performance regarding asymptotic performance, training stability, and convergence speed.
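The bi-level scheme the abstract describes, where an upper-level meta weighter assigns per-sample weights and is trained by differentiating a task-aware loss through the lower-level model update, can be illustrated on a toy problem. The sketch below is not the paper's implementation: the scalar linear model, the clean "task" set standing in for the task-aware model loss, the sigmoid weighter, and all step sizes are illustrative assumptions. It uses the standard one-step-lookahead trick for bi-level gradients (as in learning-to-reweight methods).

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): a meta weighter
# reweights noisy training samples by differentiating a "task" loss
# through a one-step lookahead update of the lower-level model.
rng = np.random.default_rng(0)

true_theta = 2.0
x_tr = rng.uniform(-1.0, 1.0, 20)
y_tr = true_theta * x_tr
y_tr[:10] += rng.normal(0.0, 3.0, 10)   # first half: corrupted targets
x_task = rng.uniform(-1.0, 1.0, 8)
y_task = true_theta * x_task            # clean set playing the task-aware loss role

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = 0.0                 # lower-level model parameter (y = theta * x)
logits = np.zeros(20)       # upper-level meta-weighter parameters
alpha, beta = 0.5, 2.0      # lower / upper step sizes (illustrative)

for _ in range(500):
    w = sigmoid(logits)
    resid = theta * x_tr - y_tr
    # Lower level: one gradient step on the weighted model loss.
    theta_look = theta - alpha * np.mean(w * 2.0 * x_tr * resid)

    # Upper level: gradient of the task loss at the lookahead parameter,
    # chained through d(theta_look)/dw and the sigmoid.
    dL_dtheta = np.mean(2.0 * x_task * (theta_look * x_task - y_task))
    dtheta_dw = -alpha * 2.0 * x_tr * resid / len(x_tr)
    logits -= beta * dL_dtheta * dtheta_dw * w * (1.0 - w)

    theta = theta_look      # commit the lower-level step

w_final = sigmoid(logits)
w_noisy_mean = w_final[:10].mean()
w_clean_mean = w_final[10:].mean()
print(f"theta={theta:.3f} clean_w={w_clean_mean:.2f} noisy_w={w_noisy_mean:.2f}")
```

Samples whose gradient contribution conflicts with the task-loss gradient are pushed toward weight zero, while clean samples, whose gradients stay aligned, drift toward weight one; this is the intuition behind letting the upper level shape what the lower-level model attends to. TEMPO's actual networks, losses, and optimization details are in the paper itself.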
Cite
Text
Yuan et al. "Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization." Neural Information Processing Systems, 2023.

Markdown

[Yuan et al. "Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/yuan2023neurips-taskaware/)

BibTeX
@inproceedings{yuan2023neurips-taskaware,
title = {{Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization}},
author = {Yuan, Huining and Dou, Hongkun and Jiang, Xingyu and Deng, Yue},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/yuan2023neurips-taskaware/}
}