Reward-Relevance-Filtered Linear Offline Reinforcement Learning

Abstract

This paper studies offline reinforcement learning with linear function approximation in a setting with decision-theoretic, but not estimation, sparsity. The structural restrictions of the data-generating process presume that the transitions factor into a sparse component that affects the reward and may also affect additional exogenous dynamics that do not affect the reward. Although the minimally sufficient adjustment set for estimating full-state transition properties depends on the whole state, the optimal policy, and therefore the state-action value function, depends only on the sparse component: we call this causal/decision-theoretic sparsity. We develop a method for reward-filtering the estimation of the state-action value function to the sparse component via a modification of thresholded lasso in least-squares policy evaluation. We provide theoretical guarantees for our reward-filtered linear fitted-Q-iteration, with sample complexity depending only on the size of the sparse component.
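
To make the two-stage idea in the abstract concrete, here is a minimal sketch, not the paper's exact algorithm or a reproduction of its guarantees: a thresholded lasso of rewards on state-action features recovers a reward-relevant support, and linear fitted-Q-iteration is then run using only the retained coordinates. The feature layout, threshold rule, and regularization parameters (`alpha`, `tau`, `ridge`) are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso


def recover_reward_support(Phi, r, alpha=0.1, tau=0.05):
    """Thresholded lasso: keep coordinates whose lasso coefficient exceeds tau."""
    lasso = Lasso(alpha=alpha, fit_intercept=False).fit(Phi, r)
    return np.flatnonzero(np.abs(lasso.coef_) > tau)


def reward_filtered_fqi(Phi, r, Phi_next_per_action, gamma=0.99, n_iters=50,
                        alpha=0.1, tau=0.05, ridge=1e-3):
    """Linear fitted-Q-iteration restricted to reward-relevant coordinates.

    Phi                 : (n, d)    state-action features of observed transitions
    r                   : (n,)      observed rewards
    Phi_next_per_action : (n, A, d) next-state features paired with each action
    """
    S = recover_reward_support(Phi, r, alpha, tau)   # sparse, reward-relevant support
    X = Phi[:, S]                                    # filtered current features
    Xn = Phi_next_per_action[:, :, S]                # filtered next-state features
    w = np.zeros(len(S))
    # Precompute the regularized Gram inverse once; the features are fixed across iterations.
    XtX_inv = np.linalg.inv(X.T @ X + ridge * np.eye(len(S)))
    for _ in range(n_iters):
        # Bellman targets: r + gamma * max_a Q_hat(s', a) under the current weights.
        targets = r + gamma * np.max(Xn @ w, axis=1)
        w = XtX_inv @ (X.T @ targets)                # ridge least-squares regression
    return S, w
```

Under these assumptions, a greedy policy is read off by maximizing the learned linear Q-function over actions using only the `S`-filtered features; the point of the construction is that the lasso step screens out exogenous state coordinates before the Q-iteration stage, so the regression dimension scales with the sparse component rather than the full state.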

Cite

Text

Zhou. "Reward-Relevance-Filtered Linear Offline Reinforcement Learning." Artificial Intelligence and Statistics, 2024.

Markdown

[Zhou. "Reward-Relevance-Filtered Linear Offline Reinforcement Learning." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/zhou2024aistats-rewardrelevancefiltered/)

BibTeX

@inproceedings{zhou2024aistats-rewardrelevancefiltered,
  title     = {{Reward-Relevance-Filtered Linear Offline Reinforcement Learning}},
  author    = {Zhou, Angela},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2024},
  pages     = {3025--3033},
  volume    = {238},
  url       = {https://mlanthology.org/aistats/2024/zhou2024aistats-rewardrelevancefiltered/}
}