InfODist: Online Distillation with Informative Rewards Improves Generalization in Curriculum Learning
Abstract
Curriculum learning (CL) is an essential part of human learning, just as reinforcement learning (RL) is. However, CL agents trained using RL with neural networks show limited generalization to later tasks in the curriculum. We show that online distillation using learned informative rewards tackles this problem. Here, we consider a reward to be informative if it is positive when the agent makes progress towards the goal and negative otherwise. Thus, an informative reward allows an agent to learn immediately to avoid states that are irrelevant to the task. As a result, the value and policy networks do not spend their limited capacity fitting targets for these irrelevant states, which improves generalization to later tasks. Our contributions are as follows. First, we propose InfODist, an online distillation method that makes use of informative rewards to significantly improve generalization in CL. Second, we show that training with informative rewards ameliorates the capacity loss phenomenon that was previously attributed to non-stationarities during the training process. Third, we show that learning from task-irrelevant states explains the capacity loss and the subsequent impaired generalization. In conclusion, our work is a crucial step toward scaling curriculum learning to complex real-world tasks.
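The abstract's definition of an informative reward (positive on progress towards the goal, negative otherwise) can be illustrated with a minimal sketch. This is not the InfODist method: the abstract describes learned rewards, whereas the example below hand-codes the reward and assumes that progress can be measured by a distance to the goal. The function name `informative_reward`, the 1-D toy environment, and the ±1 reward magnitudes are all hypothetical choices made for illustration.

```python
def informative_reward(prev_distance: float, new_distance: float) -> float:
    """Return +1.0 if the step reduced the distance to the goal, else -1.0.

    Assumption: distance to the goal is a valid proxy for progress.
    A step that leaves the distance unchanged counts as no progress.
    """
    return 1.0 if new_distance < prev_distance else -1.0


# Toy 1-D example (hypothetical): goal at position 5, agent starts at 2.
goal = 5
positions = [2, 3, 4, 3, 4, 5]

# One reward per transition between consecutive positions.
rewards = [
    informative_reward(abs(goal - prev), abs(goal - new))
    for prev, new in zip(positions, positions[1:])
]
print(rewards)
```

In this toy trajectory only the step back from position 4 to position 3 receives a negative reward, so the agent gets immediate feedback that the step was unhelpful instead of waiting for a sparse reward at the end of the episode.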
Cite
Text
Siripurapu et al. "InfODist: Online Distillation with Informative Rewards Improves Generalization in Curriculum Learning." NeurIPS 2022 Workshops: DeepRL, 2022.
Markdown
[Siripurapu et al. "InfODist: Online Distillation with Informative Rewards Improves Generalization in Curriculum Learning." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/siripurapu2022neuripsw-infodist/)
BibTeX
@inproceedings{siripurapu2022neuripsw-infodist,
title = {{InfODist: Online Distillation with Informative Rewards Improves Generalization in Curriculum Learning}},
author = {Siripurapu, Rahul and Patil, Vihang Prakash and Schweighofer, Kajetan and Dinu, Marius-Constantin and Schmied, Thomas and Diez, Luis Eduardo Ferro and Holzleitner, Markus and Eghbal-zadeh, Hamid and Kopp, Michael K. and Hochreiter, Sepp},
booktitle = {NeurIPS 2022 Workshops: DeepRL},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/siripurapu2022neuripsw-infodist/}
}