Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-Off
Abstract
A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle. Yet, many applications involve continuous-time systems where the time discretization, in principle, can be managed. The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data efficiency. We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation. Importantly, these two errors respond differently to the time discretization, leading to an optimal choice of temporal resolution for a given data budget. These findings show that managing the temporal resolution can provably improve policy evaluation efficiency in LQR systems with finite data. Empirically, we demonstrate the trade-off in numerical simulations of LQR instances and standard RL benchmarks for non-linear continuous control.
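To make the trade-off concrete, below is a minimal sketch (not the authors' code) of Monte-Carlo policy evaluation on a hypothetical scalar LQR instance. All constants, the horizon, and the transition budget are illustrative assumptions. With a fixed budget of observed transitions, a coarse step size h distorts the dynamics and the integral reward (approximation error), while a fine h leaves fewer trajectories and hence a noisier average (statistical error).

```python
import numpy as np

# Illustrative sketch only: scalar LQR with dynamics
# dx = (a*x + b*u) dt + sigma dW, linear policy u = -k*x,
# running cost q*x^2 + r*u^2 over a finite horizon T.
# All constants below are hypothetical choices for demonstration.
rng = np.random.default_rng(0)
a, b, sigma = -0.5, 1.0, 0.3
q, r, k = 1.0, 0.1, 0.8
x0, T = 1.0, 5.0
budget = 20_000  # total observed transitions shared across all rollouts

def mc_value_estimate(h):
    """Monte-Carlo estimate of the policy's cost from x0 at step size h.

    A finer h spends the budget on fewer trajectories (larger statistical
    error); a coarser h approximates the continuous-time return more
    crudely (larger approximation error).
    """
    steps = int(T / h)
    n_traj = max(budget // steps, 1)
    returns = np.empty(n_traj)
    for i in range(n_traj):
        x, ret = x0, 0.0
        for _ in range(steps):
            u = -k * x
            ret += (q * x**2 + r * u**2) * h  # Riemann-sum reward
            x += (a * x + b * u) * h + sigma * np.sqrt(h) * rng.standard_normal()
        returns[i] = ret
    return returns.mean(), returns.std(ddof=1) / np.sqrt(n_traj)

for h in (1.0, 0.5, 0.1, 0.05, 0.01):
    est, se = mc_value_estimate(h)
    print(f"h={h:5.2f}  estimate={est:7.3f}  std.err={se:6.3f}")
```

Under these assumptions, the printed estimates typically show the bias shrinking as h decreases while the standard error grows, so some intermediate h gives the best estimate for the fixed budget, which is the qualitative behavior the paper analyzes.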
Cite
Text
Zhang et al. "Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-Off." Neural Information Processing Systems, 2023.
Markdown
[Zhang et al. "Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-Off." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/zhang2023neurips-managing/)
BibTeX
@inproceedings{zhang2023neurips-managing,
  title = {{Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-Off}},
  author = {Zhang, Zichen and Kirschner, Johannes and Zhang, Junxi and Zanini, Francesco and Ayoub, Alex and Dehghan, Masood and Schuurmans, Dale},
  booktitle = {Neural Information Processing Systems},
  year = {2023},
  url = {https://mlanthology.org/neurips/2023/zhang2023neurips-managing/}
}