Timing as an Action: Learning When to Observe and Act

Abstract

In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the timing-as-an-action setup. Through theoretical analysis in the tabular setting, we show that while the choice of delay interval could be naively folded into a composite action, these actions have a special structure, and handling them intelligently yields statistical advantages. From a model-based perspective, these gains stem from the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
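To make the parameter-sharing intuition concrete, here is a minimal sketch (not the paper's implementation; the environment representation and names are illustrative) of the tabular case: if an agent commits to action `a` for `k` steps, the resulting k-step transition distribution is just the k-th matrix power of the one-step transition matrix for `a`. So a learner that estimates only the one-step model already determines the dynamics of every delay action, with no additional parameters.

```python
import numpy as np

def estimate_one_step_model(transitions, n_states, n_actions):
    """Tabular maximum-likelihood estimate of P[a][s, s'] from (s, a, s') tuples."""
    counts = np.zeros((n_actions, n_states, n_states))
    for s, a, s_next in transitions:
        counts[a, s, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Fall back to a uniform distribution for unvisited (s, a) pairs.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

def delayed_transition(P, a, k):
    """Distribution over next states after committing to action `a` for `k` steps.

    This is simply the k-th power of the one-step matrix: the composite
    (action, delay) pair reuses the same parameters as the one-step model.
    """
    return np.linalg.matrix_power(P[a], k)
```

Under this view, the naive alternative of treating each (action, delay) pair as a distinct composite action would estimate a separate transition matrix per pair, multiplying the number of parameters by the number of allowed delays.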

Cite

Text

Zhou et al. "Timing as an Action: Learning When to Observe and Act." Artificial Intelligence and Statistics, 2024.

Markdown

[Zhou et al. "Timing as an Action: Learning When to Observe and Act." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/zhou2024aistats-timing/)

BibTeX

@inproceedings{zhou2024aistats-timing,
  title     = {{Timing as an Action: Learning When to Observe and Act}},
  author    = {Zhou, Helen and Huang, Audrey and Azizzadenesheli, Kamyar and Childers, David and Lipton, Zachary},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2024},
  pages     = {3979--3987},
  volume    = {238},
  url       = {https://mlanthology.org/aistats/2024/zhou2024aistats-timing/}
}