ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control
Abstract
We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, \emph{$\epsilon t$-greedy}, which generates exploratory options that steer the agent toward less-visited states. We prove that search using $\epsilon t$-greedy has polynomial sample complexity under mild MDP assumptions. To use the information provided by rewarded transitions more efficiently, we develop a new dual experience replay buffer framework, \emph{GDRB}, and implement \emph{longest $n$-step returns}. The resulting algorithm, \emph{ETGL-DDPG}, integrates all three techniques, $\bm{\epsilon t}$-greedy, \textbf{G}DRB, and \textbf{L}ongest $n$-step returns, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the performance of DDPG in this setting.
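The abstract mentions GDRB, a dual experience replay buffer aimed at making better use of rewarded transitions in sparse-reward tasks. The paper's exact construction is not reproduced here; the snippet below is only a minimal Python sketch of the general dual-buffer idea, assuming one pool for transitions from goal-reaching episodes and one for all other transitions, with a hypothetical `success_fraction` mixing parameter. The class name and method signatures are illustrative, not the authors' API.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Illustrative dual buffer (not the paper's exact GDRB): transitions from
    episodes that reached the sparse reward are kept separately so they are not
    drowned out by the much larger pool of unrewarded experience."""

    def __init__(self, capacity=100_000, success_fraction=0.5):
        self.success = deque(maxlen=capacity)  # transitions from rewarded episodes
        self.regular = deque(maxlen=capacity)  # all other transitions
        self.success_fraction = success_fraction  # assumed mixing ratio

    def add_episode(self, transitions, reached_goal):
        # Route the whole episode into one pool based on whether it was rewarded.
        (self.success if reached_goal else self.regular).extend(transitions)

    def sample(self, batch_size):
        # Draw a fixed fraction from the rewarded pool, fill the rest from the regular pool.
        n_success = min(int(batch_size * self.success_fraction), len(self.success))
        n_regular = min(batch_size - n_success, len(self.regular))
        batch = random.sample(list(self.success), n_success) + \
                random.sample(list(self.regular), n_regular)
        random.shuffle(batch)
        return batch
```

In use, a training loop would call `add_episode` once per rollout and `sample` once per gradient step; the fixed mixing ratio is the simplest choice here and stands in for whatever sampling scheme the paper actually uses.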
Cite
Text
Futuhi et al. "ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control." Transactions on Machine Learning Research, 2025.
Markdown
[Futuhi et al. "ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/futuhi2025tmlr-etglddpg/)
BibTeX
@article{futuhi2025tmlr-etglddpg,
title = {{ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control}},
author = {Futuhi, Ehsan and Karimi, Shayan and Gao, Chao and Müller, Martin},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/futuhi2025tmlr-etglddpg/}
}