NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching
Abstract
One of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Order dispatching has recently improved significantly thanks to advances in reinforcement learning, which has proven effective for sequential decision-making problems such as order dispatching. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with the environment online, which is challenging or impractical for real-world deployment because of high costs and safety concerns. For example, because supply and demand are spatiotemporally unbalanced, online reinforcement learning-based order dispatching may significantly hurt the ride-hailing platform's revenue and the passenger experience during the policy learning period. Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns a policy solely from accumulated logged data to avoid costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce the algorithm's extrapolation error, and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is used to improve model generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets demonstrate the superiority of our design.
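The abstract names its two building blocks without giving their update rules. The following is a minimal sketch of the standard, published forms those names refer to, assuming discrete actions: the discrete batch-constrained action filter (only actions whose behavior-policy probability ratio G(a|s) / max G(a'|s) exceeds a threshold `tau` are eligible for the greedy argmax, which limits extrapolation to actions absent from the logged data) and the REM mixture (K Q-heads combined with random convex weights, reused in the TD target). It does not implement the paper's nondeterministic NondBCQ variant, whose details are not given here; all function names, the threshold `tau`, and the toy numbers are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical helpers illustrating standard discrete BCQ and REM.
# This is NOT the NondBCQ/NondBREM implementation from the paper.


def bcq_filtered_action(q_values: Sequence[float],
                        behavior_probs: Sequence[float],
                        tau: float) -> int:
    """Greedy action restricted to actions the logged (behavior) policy
    plausibly takes: G(a|s) / max_a' G(a'|s) > tau."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    max_prob = max(behavior_probs)
    allowed = [a for a, p in enumerate(behavior_probs) if p / max_prob > tau]
    # The most likely logged action has ratio 1 > tau, so `allowed` is non-empty.
    return max(allowed, key=lambda a: q_values[a])


def rem_weights(num_heads: int, rng: random.Random) -> List[float]:
    """Random convex weights: alpha_k = u_k / sum_j u_j with u_k ~ U(0, 1)."""
    raw = [rng.random() for _ in range(num_heads)]
    total = sum(raw)
    return [u / total for u in raw]


def rem_q(head_q_values: Sequence[Sequence[float]],
          alphas: Sequence[float]) -> List[float]:
    """Mix K Q-heads (each a list of per-action values) with weights alphas."""
    num_actions = len(head_q_values[0])
    return [sum(alpha * head[a] for alpha, head in zip(alphas, head_q_values))
            for a in range(num_actions)]


def rem_td_target(reward: float, gamma: float, done: bool,
                  next_head_q_values: Sequence[Sequence[float]],
                  alphas: Sequence[float]) -> float:
    """TD target r + gamma * max_a sum_k alpha_k Q_k(s', a), using the same
    alphas as the online mixture for this minibatch."""
    if done:
        return reward
    return reward + gamma * max(rem_q(next_head_q_values, alphas))


if __name__ == "__main__":
    rng = random.Random(0)
    # Action 1 has the highest Q-value but is rare in the logged data,
    # so the batch-constrained filter falls back to action 2.
    print(bcq_filtered_action([1.0, 5.0, 3.0], [0.6, 0.05, 0.35], tau=0.3))  # 2
    alphas = rem_weights(3, rng)
    print(rem_q([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], alphas))
```

In this sketch a fresh set of REM weights would be drawn per minibatch and shared between the online mixture and the target, so each update trains a different random convex combination of heads; how NondBREM couples this ensemble with its nondeterministic BCQ module is not specified in the abstract.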
Cite
Zhang et al. "NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I1.27794
@inproceedings{zhang2024aaai-nondbrem,
title = {{NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching}},
author = {Zhang, Hongbo and Wang, Guang and Wang, Xu and Zhou, Zhengyang and Zhang, Chen and Dong, Zheng and Wang, Yang},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {401-409},
doi = {10.1609/AAAI.V38I1.27794},
url = {https://mlanthology.org/aaai/2024/zhang2024aaai-nondbrem/}
}