Timing as an Action: Learning When to Observe and Act
Abstract
In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
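The abstract's key observation, that delay actions add no parameters to the underlying model, can be illustrated with a minimal tabular sketch (not the authors' code; all names and sizes here are illustrative): if the agent commits to a primitive action for tau steps, the tau-step transition is just a power of the one-step kernel, so only the one-step model needs to be estimated.

```python
import numpy as np

# Illustrative sketch: in a tabular MDP with timing-as-an-action, the
# agent picks a composite action (a, tau): take primitive action a and
# commit to it for tau steps before the next observation. Naively each
# (a, tau) pair is a distinct action to estimate, but the tau-step
# transition is derived from the one-step kernel, adding no parameters.

rng = np.random.default_rng(0)
n_states, n_actions, max_delay = 4, 2, 3

# One-step transition kernels P[a]: the only parameters to estimate.
# Each row is a probability distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

def composite_transition(a, tau):
    """tau-step transition under action a, derived from the one-step model."""
    return np.linalg.matrix_power(P[a], tau)

# Every composite (a, tau) transition is a valid stochastic matrix
# computed from P, so the model class does not grow with max_delay.
for a in range(n_actions):
    for tau in range(1, max_delay + 1):
        Q = composite_transition(a, tau)
        assert np.allclose(Q.sum(axis=1), 1.0)
```

This is the model-based intuition behind the claimed sample-efficiency gains: data from any delay length informs the same shared one-step kernel, rather than a separate parameter block per (action, delay) pair.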
Cite
Text
Zhou et al. "Timing as an Action: Learning When to Observe and Act." Artificial Intelligence and Statistics, 2024.
Markdown
[Zhou et al. "Timing as an Action: Learning When to Observe and Act." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/zhou2024aistats-timing/)
BibTeX
@inproceedings{zhou2024aistats-timing,
title = {{Timing as an Action: Learning When to Observe and Act}},
author = {Zhou, Helen and Huang, Audrey and Azizzadenesheli, Kamyar and Childers, David and Lipton, Zachary},
booktitle = {Artificial Intelligence and Statistics},
year = {2024},
pages = {3979--3987},
volume = {238},
url = {https://mlanthology.org/aistats/2024/zhou2024aistats-timing/}
}