Multi-Objective Model-Based Policy Search for Data-Efficient Learning with Sparse Rewards

Abstract

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy for sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
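The core idea of the abstract (score candidate policies on a learned model against the three objectives, then keep the Pareto-optimal set) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the toy dynamics model, the linear policy, and the surrogate definitions of novelty, return, and model confidence stand in for the choices described in the full paper and are not the authors' implementation.

```python
# Minimal sketch of one Multi-DEX-style iteration (illustrative, not the paper's code):
# candidate policies are rolled out on a learned dynamics model, scored on three
# objectives (novelty, expected return, model confidence), and filtered with
# Pareto non-dominated sorting.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON = 2, 1, 25


def learned_model(state, action):
    """Stand-in for a learned dynamical model of the robot (toy transition)."""
    return state + 0.1 * np.tanh(action)


def rollout(params):
    """Simulate a linear policy on the learned model; return the state trajectory."""
    state, traj = np.zeros(STATE_DIM), []
    W = params.reshape(ACTION_DIM, STATE_DIM)
    for _ in range(HORIZON):
        state = learned_model(state, W @ state + 0.1)
        traj.append(state.copy())
    return np.array(traj)


def objectives(traj, archive, goal):
    """Three objectives, all to be maximized (surrogate definitions)."""
    novelty = min((np.linalg.norm(traj[-1] - a) for a in archive), default=1.0)
    expected_return = -np.linalg.norm(traj[-1] - goal)       # sparse-reward surrogate
    model_confidence = -np.linalg.norm(traj, axis=1).max()   # stay near known states
    return np.array([novelty, expected_return, model_confidence])


def pareto_front(scores):
    """Indices of non-dominated candidates (all objectives maximized)."""
    keep = []
    for i, s in enumerate(scores):
        dominated = any(np.all(t >= s) and np.any(t > s) for t in scores)
        if not dominated:
            keep.append(i)
    return keep


# One iteration: propose candidate policies, score them on the model, keep the Pareto set.
archive, goal = [np.zeros(STATE_DIM)], np.array([1.0, 1.0])
candidates = [rng.normal(size=STATE_DIM * ACTION_DIM) for _ in range(32)]
scores = [objectives(rollout(p), archive, goal) for p in candidates]
front = pareto_front(scores)
print(f"{len(front)} Pareto-optimal candidates out of {len(candidates)}")
```

In the full algorithm, the retained policies would be executed on the real system, the new transitions added to the dataset, and the dynamical model refit before the next iteration; the specific model class and multi-objective optimizer used by the authors are described in the paper itself.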

Cite

Text

Kaushik et al. "Multi-Objective Model-Based Policy Search for Data-Efficient Learning with Sparse Rewards." Conference on Robot Learning, 2018.

Markdown

[Kaushik et al. "Multi-Objective Model-Based Policy Search for Data-Efficient Learning with Sparse Rewards." Conference on Robot Learning, 2018.](https://mlanthology.org/corl/2018/kaushik2018corl-multi/)

BibTeX

@inproceedings{kaushik2018corl-multi,
  title     = {{Multi-Objective Model-Based Policy Search for Data-Efficient Learning with Sparse Rewards}},
  author    = {Kaushik, Rituraj and Chatzilygeroudis, Konstantinos I. and Mouret, Jean-Baptiste},
  booktitle = {Conference on Robot Learning},
  year      = {2018},
  pages     = {839--855},
  url       = {https://mlanthology.org/corl/2018/kaushik2018corl-multi/}
}