Combining Behaviors with the Successor Features Keyboard

Abstract

The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings, which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.
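As background for the SF&GPI combination mentioned above, here is a minimal sketch of the standard GPI action-selection rule over successor features. It is not SFK or CSFA. It assumes the SFs of each known behavior at the current state are already available as plain vectors and that a new task is described by a weight vector `w`, so each behavior's action value is the dot product of its SF with `w`. The names `dot` and `gpi_action` and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def gpi_action(sfs: List[List[List[float]]], w: Sequence[float]) -> int:
    """Pick an action by Generalized Policy Improvement.

    sfs[i][a] is the successor-feature vector of known behavior i for
    action a at the current state (assumed given). The value of behavior i
    under task w is Q_i(a) = sfs[i][a] . w, and GPI acts greedily with
    respect to max_i Q_i(a).
    """
    num_actions = len(sfs[0])
    # For each action, take the best value any known behavior promises.
    best_per_action = [
        max(dot(behavior[a], w) for behavior in sfs)
        for a in range(num_actions)
    ]
    # Act greedily over those per-action maxima.
    return max(range(num_actions), key=lambda a: best_per_action[a])


# Toy example: two behaviors, two actions, two state-features (made-up values).
toy_sfs = [
    [[1.0, 0.0], [0.2, 0.1]],  # behavior 0 mostly accumulates feature 0
    [[0.0, 0.3], [0.0, 1.0]],  # behavior 1 mostly accumulates feature 1
]

# A new task that penalizes feature 0 and rewards feature 1.
chosen = gpi_action(toy_sfs, [-1.0, 1.0])
```

In this toy setting the new task weights differ from anything either behavior was trained for, yet the agent can still act by reusing whichever behavior's SFs score highest for each action. The abstract's point is that the state-features and task encodings feeding such a rule are usually hand-designed, which SFK and CSFA aim to avoid.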

Cite

Text

Carvalho et al. "Combining Behaviors with the Successor Features Keyboard." Neural Information Processing Systems, 2023.

Markdown

[Carvalho et al. "Combining Behaviors with the Successor Features Keyboard." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/carvalho2023neurips-combining/)

BibTeX

@inproceedings{carvalho2023neurips-combining,
  title     = {{Combining Behaviors with the Successor Features Keyboard}},
  author    = {Carvalho, Wilka and Saraiva, Andre and Filos, Angelos and Lampinen, Andrew and Matthey, Loic and Lewis, Richard L. and Lee, Honglak and Singh, Satinder P. and Rezende, Danilo Jimenez and Zoran, Daniel},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/carvalho2023neurips-combining/}
}