Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization
Abstract
Aligning the world model with the environment for the agent’s specific task is crucial in model-based reinforcement learning. While value-equivalent models may achieve better task awareness than maximum-likelihood models, they sacrifice a large amount of semantic information and face implementation issues. To combine the benefits of both types of models, we propose Task-aware Environment Modeling Pipeline with bi-level Optimization (TEMPO), a bi-level model learning framework that introduces an additional level of optimization on top of a maximum-likelihood model by incorporating a meta weighter network that weights each training sample. The meta weighter in the upper level learns to generate novel sample weights by minimizing a proposed task-aware model loss. The model in the lower level focuses on important samples while maintaining rich semantic information in state representations. We evaluate TEMPO on a variety of continuous and discrete control tasks from the DeepMind Control Suite and Atari video games. Our results demonstrate that TEMPO achieves state-of-the-art performance regarding asymptotic performance, training stability, and convergence speed.
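The bi-level scheme the abstract describes, where an upper-level meta weighter assigns per-sample weights and is trained by differentiating a task-aware loss through the lower-level model update, can be illustrated on a toy problem. The sketch below is not the paper's implementation: the scalar linear model, the clean "task" set standing in for the task-aware model loss, the sigmoid weighter, and all step sizes are illustrative assumptions. It uses the standard one-step-lookahead trick for bi-level gradients (as in learning-to-reweight methods).

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): a meta weighter
# reweights noisy training samples by differentiating a "task" loss
# through a one-step lookahead update of the lower-level model.
rng = np.random.default_rng(0)

true_theta = 2.0
x_tr = rng.uniform(-1.0, 1.0, 20)
y_tr = true_theta * x_tr
y_tr[:10] += rng.normal(0.0, 3.0, 10)   # first half: corrupted targets
x_task = rng.uniform(-1.0, 1.0, 8)
y_task = true_theta * x_task            # clean set playing the task-aware loss role

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = 0.0                 # lower-level model parameter (y = theta * x)
logits = np.zeros(20)       # upper-level meta-weighter parameters
alpha, beta = 0.5, 2.0      # lower / upper step sizes (illustrative)

for _ in range(500):
    w = sigmoid(logits)
    resid = theta * x_tr - y_tr
    # Lower level: one gradient step on the weighted model loss.
    theta_look = theta - alpha * np.mean(w * 2.0 * x_tr * resid)

    # Upper level: gradient of the task loss at the lookahead parameter,
    # chained through d(theta_look)/dw and the sigmoid.
    dL_dtheta = np.mean(2.0 * x_task * (theta_look * x_task - y_task))
    dtheta_dw = -alpha * 2.0 * x_tr * resid / len(x_tr)
    logits -= beta * dL_dtheta * dtheta_dw * w * (1.0 - w)

    theta = theta_look      # commit the lower-level step

w_final = sigmoid(logits)
w_noisy_mean = w_final[:10].mean()
w_clean_mean = w_final[10:].mean()
print(f"theta={theta:.3f} clean_w={w_clean_mean:.2f} noisy_w={w_noisy_mean:.2f}")
```

Samples whose gradient contribution conflicts with the task-loss gradient are pushed toward weight zero, while clean samples, whose gradients stay aligned, drift toward weight one; this is the intuition behind letting the upper level shape what the lower-level model attends to. TEMPO's actual networks, losses, and optimization details are in the paper itself.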
Cite
Text
Yuan et al. "Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization." Neural Information Processing Systems, 2023.

Markdown

[Yuan et al. "Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/yuan2023neurips-taskaware/)

BibTeX
@inproceedings{yuan2023neurips-taskaware,
title = {{Task-Aware World Model Learning with Meta Weighting via Bi-Level Optimization}},
author = {Yuan, Huining and Dou, Hongkun and Jiang, Xingyu and Deng, Yue},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/yuan2023neurips-taskaware/}
}