Self-Predictive Universal AI
Abstract
Reinforcement Learning (RL) algorithms typically use learning and/or planning techniques to derive effective policies. The integration of both approaches has proven highly successful in addressing complex sequential decision-making challenges, as evidenced by algorithms such as AlphaZero and MuZero, which consolidate the planning process into a parametric search-policy. AIXI, the most powerful theoretical universal agent, relies on planning through comprehensive search as its primary means of finding an optimal policy. Here we define an alternative universal agent, which we call Self-AIXI, that, in contrast to AIXI, maximally exploits learning to obtain good policies. It does so by self-predicting its own stream of action data, which is generated, as in other TD(0) agents, by taking an action-maximization step over the current on-policy (universal mixture-policy) Q-value estimates. We prove that Self-AIXI converges to AIXI and inherits a series of properties, such as maximal Legg-Hutter intelligence and the self-optimizing property.
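The action-generation mechanism the abstract describes can be illustrated with a generic sketch. The Python snippet below is an illustrative assumption, not the paper's algorithm: Self-AIXI's Q-values are defined on-policy under a universal Bayesian mixture over environments, whereas this sketch uses a plain lookup table. It shows only the TD(0)-style pattern of acting greedily with respect to current Q-value estimates, so that the resulting action stream is itself predictable data.

```python
from collections import defaultdict

# Illustrative sketch only (assumed, not from the paper): a tabular
# TD(0)-style agent whose actions are produced by maximizing current
# Q-value estimates. Self-AIXI replaces the table with on-policy
# Q-values under a universal mixture policy over histories.

class GreedyTDAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.99):
        self.q = defaultdict(float)   # Q(history, action) estimates
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def act(self, history):
        # Action-maximization step over the current Q-value estimates;
        # this greedy action stream is the data a self-predictive agent
        # learns to model. Histories must be hashable (e.g., tuples).
        return max(self.actions, key=lambda a: self.q[(history, a)])

    def update(self, history, action, reward, next_history):
        # TD(0) update toward the one-step bootstrapped target. With a
        # greedy policy, the on-policy next action is the argmax action.
        next_action = self.act(next_history)
        target = reward + self.gamma * self.q[(next_history, next_action)]
        td_error = target - self.q[(history, action)]
        self.q[(history, action)] += self.alpha * td_error
```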
Cite
Text

Catt et al. "Self-Predictive Universal AI." Neural Information Processing Systems, 2023.

Markdown

[Catt et al. "Self-Predictive Universal AI." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/catt2023neurips-selfpredictive/)

BibTeX
@inproceedings{catt2023neurips-selfpredictive,
  title = {{Self-Predictive Universal AI}},
  author = {Catt, Elliot and Grau-Moya, Jordi and Hutter, Marcus and Aitchison, Matthew and Genewein, Tim and Delétang, Grégoire and Li, Kevin and Veness, Joel},
  booktitle = {Neural Information Processing Systems},
  year = {2023},
  url = {https://mlanthology.org/neurips/2023/catt2023neurips-selfpredictive/}
}