Continual Auxiliary Task Learning
Abstract
Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there is little work on how to adapt the behavior to gather useful data for those off-policy predictions. In this work, we investigate a reinforcement learning system designed to learn a collection of auxiliary tasks, with a behavior policy learning to take actions to improve those auxiliary predictions. We highlight the inherent non-stationarity in this continual auxiliary task learning problem, for both prediction learners and the behavior learner. We develop an algorithm based on successor features that facilitates tracking under non-stationary rewards, and prove the separation into learning successor features and rewards provides convergence rate improvements. We conduct an in-depth study into the resulting multi-prediction learning system.
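The abstract's point about separating successor features from rewards can be illustrated with a minimal tabular sketch. This is a generic on-policy successor-feature TD update, not the algorithm from the paper. The function names (`sf_td_update`, `run_cycle`), the one-hot features, the deterministic cycle environment, and the step sizes are all assumptions chosen for illustration. The sketch only shows that the successor-feature target is reward-free, so when rewards change only the reward weights `w` need to be relearned.

```python
import numpy as np


def sf_td_update(psi, w, s, s_next, phi, r, gamma=0.9, alpha_psi=0.5, alpha_w=0.5):
    """One update of tabular successor features and reward weights (illustrative).

    psi: (n_states, d) float array; psi[s] estimates the discounted sum of
         future feature vectors starting from state s.
    w:   (d,) float array of reward weights, so that r is approximated by phi @ w.
    phi: (d,) feature vector observed on the transition s -> s_next.
    """
    # TD update for the successor features; the target never uses the reward.
    psi[s] += alpha_psi * (phi + gamma * psi[s_next] - psi[s])
    # Separate least-mean-squares update for the reward weights.
    w += alpha_w * (r - phi @ w) * phi
    return psi, w


def run_cycle(psi, w, rewards, sweeps, gamma=0.9):
    """Hypothetical environment: a deterministic cycle 0 -> 1 -> ... -> n-1 -> 0.

    Features are one-hot in the current state, and rewards[s] is received
    when leaving state s.
    """
    n = len(rewards)
    features = np.eye(n)
    for _ in range(sweeps):
        for s in range(n):
            s_next = (s + 1) % n
            sf_td_update(psi, w, s, s_next, features[s], rewards[s], gamma=gamma)
    return psi, w
```

Under these assumptions, the value estimate for state `s` is `psi[s] @ w`. After the rewards change, the learned `psi` stays valid and the values track the new rewards as soon as `w` has adapted, which is the intuition behind tracking under non-stationary rewards.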
Cite
Text
McLeod et al. "Continual Auxiliary Task Learning." Neural Information Processing Systems, 2021.
Markdown
[McLeod et al. "Continual Auxiliary Task Learning." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/mcleod2021neurips-continual/)
BibTeX
@inproceedings{mcleod2021neurips-continual,
title = {{Continual Auxiliary Task Learning}},
author = {McLeod, Matthew and Lo, Chunlok and Schlegel, Matthew and Jacobsen, Andrew and Kumaraswamy, Raksha and White, Martha and White, Adam},
booktitle = {Neural Information Processing Systems},
year = {2021},
url = {https://mlanthology.org/neurips/2021/mcleod2021neurips-continual/}
}