Using Free Energies to Represent Q-Values in a Multiagent Reinforcement Learning Task

Abstract

The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned on-line using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions are chosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a co-operative multi-agent task. The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.
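The following NumPy sketch illustrates the ideas summarized in the abstract, under the assumption that the product of experts is a restricted Boltzmann machine over binary state and action units. The function names (q_value, sarsa_update, sample_action), network sizes, and learning-rate settings are illustrative choices, not details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_ACTION, N_HIDDEN = 12, 8, 16   # sizes chosen arbitrarily for illustration
W = 0.01 * rng.standard_normal((N_STATE + N_ACTION, N_HIDDEN))  # visible-to-hidden weights
b_hid = np.zeros(N_HIDDEN)                # hidden-unit biases
b_vis = np.zeros(N_STATE + N_ACTION)      # visible-unit biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_value(s, a):
    """Q(s, a) approximated by the negative free energy of the network."""
    v = np.concatenate([s, a])
    pre = v @ W + b_hid
    # Free energy F(v) = -sum_k softplus(pre_k) - b_vis . v, so Q = -F.
    return np.sum(np.logaddexp(0.0, pre)) + b_vis @ v

def sarsa_update(s, a, r, s_next, a_next, lr=0.05, gamma=0.95):
    """Reduce the inconsistency (TD error) between the Q-values of
    consecutive state-action pairs, in the spirit of the modified SARSA rule."""
    global W, b_hid, b_vis
    td_err = r + gamma * q_value(s_next, a_next) - q_value(s, a)
    v = np.concatenate([s, a])
    p_hid = sigmoid(v @ W + b_hid)        # expected hidden activations given v
    # Gradient of Q (negative free energy) with respect to the parameters.
    W += lr * td_err * np.outer(v, p_hid)
    b_hid += lr * td_err * p_hid
    b_vis += lr * td_err * v

def sample_action(s, n_sweeps=20):
    """Clamp the state units and Gibbs-sample the action units."""
    a = rng.integers(0, 2, N_ACTION).astype(float)
    for _ in range(n_sweeps):
        v = np.concatenate([s, a])
        h = (rng.random(N_HIDDEN) < sigmoid(v @ W + b_hid)).astype(float)
        p_act = sigmoid(h @ W[N_STATE:].T + b_vis[N_STATE:])
        a = (rng.random(N_ACTION) < p_act).astype(float)
    return a

Because only the action units are resampled while the state is held fixed, the Gibbs chain draws actions with probability roughly proportional to the exponentiated negative free energy, so higher-valued actions are sampled more often.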

Cite

Text

Sallans and Hinton. "Using Free Energies to Represent Q-Values in a Multiagent Reinforcement Learning Task." Neural Information Processing Systems, 2000.

Markdown

[Sallans and Hinton. "Using Free Energies to Represent Q-Values in a Multiagent Reinforcement Learning Task." Neural Information Processing Systems, 2000.](https://mlanthology.org/neurips/2000/sallans2000neurips-using/)

BibTeX

@inproceedings{sallans2000neurips-using,
  title     = {{Using Free Energies to Represent Q-Values in a Multiagent Reinforcement Learning Task}},
  author    = {Sallans, Brian and Hinton, Geoffrey E.},
  booktitle = {Neural Information Processing Systems},
  year      = {2000},
  pages     = {1075--1081},
  url       = {https://mlanthology.org/neurips/2000/sallans2000neurips-using/}
}