ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making
Abstract
In this paper, we propose a novel policy optimization framework that maximizes the Return on Investment (ROI) of a policy using a fixed dataset within a Markov Decision Process (MDP) equipped with a cost function. ROI, defined as the ratio between the return and the accumulated cost of a policy, serves as a measure of the policy's efficiency. Despite the importance of maximizing ROI in various applications, it remains a challenging problem because it is a ratio of two long-term values: return and accumulated cost. To address this, we formulate the ROI-maximizing reinforcement learning problem as a linear fractional program. We then incorporate the stationary DIstribution Correction Estimation (DICE) framework to develop a practical offline ROI maximization algorithm. Our proposed algorithm, ROIDICE, yields an efficient policy that offers a superior trade-off between return and accumulated cost compared to policies trained with existing frameworks.
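The abstract leaves the objective implicit, so the following display is a sketch rather than the paper's exact formulation. Writing d(s,a) for a stationary state-action distribution, with initial state distribution \mu_0, transition kernel P, reward r, cost c, and discount \gamma (all standard MDP quantities assumed here), ROI maximization over valid stationary distributions takes the form of a linear fractional program:

\[
\max_{d \geq 0} \quad \frac{\sum_{s,a} d(s,a)\, r(s,a)}{\sum_{s,a} d(s,a)\, c(s,a)}
\qquad \text{s.t.} \quad
\sum_{a} d(s,a) = (1-\gamma)\, \mu_0(s) + \gamma \sum_{\bar{s},\bar{a}} P(s \mid \bar{s}, \bar{a})\, d(\bar{s}, \bar{a}) \quad \forall s.
\]

The numerator and denominator are the (normalized) return and accumulated cost of the policy induced by d. Since both are linear in d and the Bellman flow constraint is also linear, a standard Charnes-Cooper change of variables can reduce such a problem to a linear program, which suggests why a DICE-style offline algorithm is tractable here; whether ROIDICE uses exactly this reduction is not stated in the abstract.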
Cite
Text
Kim et al. "ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making." Neural Information Processing Systems, 2024. doi:10.52202/079017-0407
Markdown
[Kim et al. "ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/kim2024neurips-roidice/) doi:10.52202/079017-0407
BibTeX
@inproceedings{kim2024neurips-roidice,
title = {{ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making}},
author = {Kim, Woosung and Lee, Hayeong and Lee, Jongmin and Lee, Byung-Jun},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-0407},
url = {https://mlanthology.org/neurips/2024/kim2024neurips-roidice/}
}