Localizing Search in Reinforcement Learning
Abstract
Reinforcement learning (RL) can be impractical for many high dimensional problems because of the computational cost of doing stochastic search in large state spaces. We propose a new RL method, Boundary Localized Reinforcement Learning (BLRL), which maps RL into a mode switching problem where an agent deterministically chooses an action based on its state, and limits stochastic search to small areas around mode boundaries, drastically reducing computational cost. BLRL starts with an initial set of parameterized boundaries that partition the state space into distinct control modes. Reinforcement reward is used to update the boundary parameters using the policy gradient formulation of Sutton et al. (2000). We demonstrate that stochastic search can be limited to regions near mode boundaries, thus greatly reducing search, while still guaranteeing convergence to a locally optimal deterministic mode switching policy. Further, we give conditions under which the policy gradient can be arbitrarily well approximated without the use of any stochastic search. These theoretical results are supported experimentally via simulation.
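The core idea of the abstract can be sketched in a toy 1-D setting: one parameter partitions the state space into two control modes, the agent acts deterministically everywhere except in a narrow band around the boundary, and reward inside that band nudges the boundary parameter. All names here (`THETA`, `EPS`, `ALPHA`, `GOAL`, the reward function, and the simple update rule) are invented for illustration; the paper's actual update uses the policy gradient formulation of Sutton et al. (2000).

```python
import random

# Toy 1-D illustration of boundary-localized search (not the paper's code).
# A single boundary parameter theta splits [0, 1] into two control modes;
# stochastic exploration is confined to a band of half-width EPS around it.

EPS = 0.1      # half-width of the stochastic search band
ALPHA = 0.05   # boundary learning rate
GOAL = 0.6     # hidden optimal boundary location in this toy problem

def action(state, theta):
    """Deterministic mode choice outside the band, random search inside it."""
    if abs(state - theta) >= EPS:
        return 0 if state < theta else 1
    return random.randint(0, 1)

def reward(state, a):
    """+1 when the chosen mode matches the hidden optimal policy."""
    return 1.0 if a == (0 if state < GOAL else 1) else 0.0

def train(theta=0.2, episodes=5000, seed=0):
    """Move the boundary toward GOAL using reward gathered only in the band."""
    random.seed(seed)
    for _ in range(episodes):
        s = random.random()
        a = action(s, theta)
        if abs(s - theta) >= EPS:
            continue                 # outside the band: no search, no update
        # The reward reveals which mode was correct at s; nudge the boundary
        # so the deterministic policy agrees with it (a stand-in for the
        # policy-gradient update used in the paper).
        d = a if reward(s, a) == 1.0 else 1 - a
        if d == 1 and theta > s:
            theta -= ALPHA * (theta - s)
        elif d == 0 and theta <= s:
            theta += ALPHA * (s - theta + 1e-3)
    return theta
```

Because updates only occur inside the band, the vast majority of states are handled with zero exploration, which is the computational saving the abstract claims; in this toy run the boundary drifts from its initial value toward the optimal switching point.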
Cite
Text
Grudic and Ungar. "Localizing Search in Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2000.
Markdown
[Grudic and Ungar. "Localizing Search in Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2000.](https://mlanthology.org/aaai/2000/grudic2000aaai-localizing/)
BibTeX
@inproceedings{grudic2000aaai-localizing,
title = {{Localizing Search in Reinforcement Learning}},
author = {Grudic, Gregory Z. and Ungar, Lyle H.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2000},
pages = {590--595},
url = {https://mlanthology.org/aaai/2000/grudic2000aaai-localizing/}
}