Primal-Dual Spectral Representation for Off-Policy Evaluation
Abstract
Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using \emph{only} experience collected by another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the \emph{curse of horizon}. However, the major bottleneck in applying DICE estimators is the difficulty of solving the saddle-point optimization they involve, especially with neural network implementations. In this paper, we tackle this challenge by establishing a \emph{linear representation} of the value function and the stationary distribution correction ratio, \emph{i.e.}, the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. This primal-dual representation not only bypasses the non-convex non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient utilization of historical data. We highlight that our algorithm, \textbf{SpectralDICE}, is the first to leverage a linear representation of the primal-dual variables that is both computationally and sample efficient; its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.
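The following is a minimal tabular sketch of why a spectral decomposition of the transition operator makes both DICE variables linear. It rests on assumptions of its own, not on the paper's implementation: a small random MDP whose transition matrix P(s'|s,a) has low rank, features obtained from an exact truncated SVD P = φ μᵀ of a known P, and a normalized value ρ(π) = (1-γ) E[Σ_t γ^t r_t]. Under such a factorization, the primal variable satisfies Q^π(s,a) = r(s,a) + γ φ(s,a)ᵀθ with θ = Σ_s' μ(s') V^π(s'), and the normalized occupancy satisfies d^π(s',a') = (1-γ) μ0(s') π(a'|s') + γ π(a'|s') μ(s')ᵀω with ω = Σ_{s,a} φ(s,a) d^π(s,a), so the correction ratio w = d^π / d^D is linear in those features divided by the data distribution d^D. All function and variable names are hypothetical, and this is an illustration of the representation identity, not the SpectralDICE estimator.

```python
import numpy as np


def make_low_rank_mdp(n_states, n_actions, rank, seed=0):
    """Random MDP whose transition matrix P[(s,a), s'] has rank <= `rank` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Each (s, a) mixes `rank` latent next-state distributions, so P = weights @ bases.
    weights = rng.dirichlet(np.ones(rank), size=n_states * n_actions)  # (SA, k)
    bases = rng.dirichlet(np.ones(n_states), size=rank)                # (k, S)
    P = weights @ bases                                                 # rows sum to 1
    r = rng.uniform(size=n_states * n_actions)                          # r(s, a), index s*A + a
    mu0 = rng.dirichlet(np.ones(n_states))                              # initial state distribution
    return P, r, mu0


def spectral_features(P, rank):
    """Truncated SVD of the transition matrix: P ~= phi @ mu.T."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    phi = U[:, :rank] * s[:rank]  # (SA, k) state-action features
    mu = Vt[:rank].T              # (S, k) next-state features
    return phi, mu


def state_action_kernel(P, pi):
    """P_pi[(s,a), (s',a')] = P(s'|s,a) * pi(a'|s')."""
    SA, S = P.shape
    A = pi.shape[1]
    return (P[:, :, None] * pi[None, :, :]).reshape(SA, S * A)


S, A, k, gamma = 6, 2, 3, 0.9
P, r, mu0 = make_low_rank_mdp(S, A, k)
rng = np.random.default_rng(1)
pi = rng.dirichlet(np.ones(A), size=S)   # target policy pi(a|s), shape (S, A)
P_pi = state_action_kernel(P, pi)
nu0 = (mu0[:, None] * pi).reshape(-1)    # nu0(s,a) = mu0(s) pi(a|s)

# Ground truth by direct linear solves.
Q = np.linalg.solve(np.eye(S * A) - gamma * P_pi, r)                        # Q = r + gamma P_pi Q
d_pi = np.linalg.solve(np.eye(S * A) - gamma * P_pi.T, (1 - gamma) * nu0)   # normalized occupancy

phi, mu = spectral_features(P, k)
V = (Q.reshape(S, A) * pi).sum(axis=1)   # V(s) = sum_a pi(a|s) Q(s,a)

# Primal: Q is linear in [r, phi] with coefficient theta = mu^T V.
theta = mu.T @ V
Q_lin = r + gamma * phi @ theta

# Dual: d^pi is linear in [nu0, psi] with psi(s',a') = pi(a'|s') mu(s') and omega = phi^T d^pi.
omega = phi.T @ d_pi
psi = (mu[:, None, :] * pi[:, :, None]).reshape(S * A, k)
d_lin = (1 - gamma) * nu0 + gamma * psi @ omega

# Off-policy estimate with a (hypothetical) data distribution d^D and ratio w = d^pi / d^D.
d_D = rng.dirichlet(np.ones(S * A))
w = d_lin / d_D
value_dual = np.sum(d_D * w * r)         # E_{d^D}[w r]
value_primal = (1 - gamma) * nu0 @ Q     # (1 - gamma) E_{nu0}[Q]

print("max |Q - Q_lin|:", np.max(np.abs(Q - Q_lin)))
print("max |d_pi - d_lin|:", np.max(np.abs(d_pi - d_lin)))
print("primal value:", value_primal, "dual value:", value_dual)
```

Here P is known exactly, so the SVD recovers the features with no error and both representations match the direct solves up to floating-point precision. In an actual OPE setting the transition operator is unknown, and any such features would have to be estimated from the behavior data before the linear coefficients θ and ω are fitted.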
Cite
Hu et al. "Primal-Dual Spectral Representation for Off-Policy Evaluation." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.
@inproceedings{hu2025aistats-primaldual,
title = {{Primal-Dual Spectral Representation for Off-Policy Evaluation}},
author = {Hu, Yang and Chen, Tianyi and Li, Na and Wang, Kai and Dai, Bo},
booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
year = {2025},
pages = {3808--3816},
volume = {258},
url = {https://mlanthology.org/aistats/2025/hu2025aistats-primaldual/}
}