Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
Abstract
A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.
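The core idea described above, storing values and locally optimal actions at states along optimized trajectories and answering queries nonparametrically, can be illustrated with a minimal sketch. The `TrajectoryValueFunction` class and its nearest-neighbor lookup below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch (assumed, not from the paper): a value function and policy
# stored nonparametrically as (state, value, action) samples along stored
# trajectories, queried by nearest neighbor.

class TrajectoryValueFunction:
    def __init__(self):
        self.states = []   # states visited along stored trajectories
        self.values = []   # value estimates at those states
        self.actions = []  # locally optimal actions (the nonparametric policy)

    def add_trajectory(self, states, values, actions):
        """Store one optimized trajectory's states, values, and actions."""
        self.states.extend(states)
        self.values.extend(values)
        self.actions.extend(actions)

    def query(self, x):
        """Return the value and action stored at the nearest trajectory point."""
        X = np.asarray(self.states)
        i = int(np.argmin(np.linalg.norm(X - np.asarray(x), axis=1)))
        return self.values[i], self.actions[i]
```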
Cite
Text
Atkeson and Morimoto. "Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach." Neural Information Processing Systems, 2002.
Markdown
[Atkeson and Morimoto. "Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/atkeson2002neurips-nonparametric/)
BibTeX
@inproceedings{atkeson2002neurips-nonparametric,
title = {{Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach}},
author = {Atkeson, Christopher G. and Morimoto, Jun},
booktitle = {Neural Information Processing Systems},
year = {2002},
pages = {1643--1650},
url = {https://mlanthology.org/neurips/2002/atkeson2002neurips-nonparametric/}
}