NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching

Abstract

One of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Order dispatching has improved significantly in recent years thanks to advances in reinforcement learning, which has proven effective for sequential decision-making problems such as order dispatching. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with the environment online, which is challenging or impractical in real-world deployments because of high costs or safety concerns. For example, because supply and demand are spatiotemporally unbalanced, online reinforcement learning-based order dispatching may significantly harm the ride-hailing platform's revenue and the passenger experience during the policy learning period. Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns a policy from only the accumulated logged data to avoid costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce extrapolation error, and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is used to improve generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.
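The two ideas named in the abstract can be illustrated with a small, generic sketch. It is not the authors' NondBCQ or REM implementation: it only shows the commonly described form of each idea, namely mixing several Q-value heads with random convex weights (ensemble mixture) and restricting the greedy action to those well supported by the logged data (batch constraint). The function names, the toy Q-values, the behavior probabilities, and the threshold `tau` are all assumptions made for illustration.

```python
import random


def rem_mixture(q_heads, rng):
    """Mix K Q-value heads with random convex weights (ensemble-mixture idea).

    q_heads: list of K lists, each holding one Q-value per action.
    Returns the mixed Q-values and the weights used.
    """
    # Draw positive random weights and normalise them so they sum to 1.
    raw = [rng.random() + 1e-12 for _ in q_heads]
    total = sum(raw)
    alpha = [w / total for w in raw]
    num_actions = len(q_heads[0])
    q_mix = [
        sum(a * head[j] for a, head in zip(alpha, q_heads))
        for j in range(num_actions)
    ]
    return q_mix, alpha


def batch_constrained_argmax(q_values, behavior_probs, tau=0.3):
    """Pick the best action among those the logged data supports.

    An action is allowed only if its estimated behavior probability,
    relative to the most likely action, is at least tau (hypothetical value).
    """
    p_max = max(behavior_probs)
    allowed = [j for j, p in enumerate(behavior_probs) if p / p_max >= tau]
    return max(allowed, key=lambda j: q_values[j])


# Toy data (made up): 3 heads, 3 candidate actions.
rng = random.Random(0)
q_heads = [
    [1.0, 5.0, 2.0],
    [1.5, 4.0, 2.5],
    [0.5, 6.0, 3.0],
]
# Action 1 looks best to every head but is almost never seen in the logs.
behavior_probs = [0.50, 0.01, 0.49]

q_mix, alpha = rem_mixture(q_heads, rng)
action = batch_constrained_argmax(q_mix, behavior_probs)
print("mixture weights:", alpha)
print("chosen action:", action)
```

In this toy setup the rarely logged action is excluded even though every head rates it highest, which is the kind of extrapolation error a batch constraint is meant to avoid; the random mixture changes the Q-values on every draw but not the ranking among the supported actions here.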

Cite

Text

Zhang et al. "NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I1.27794

Markdown

[Zhang et al. "NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/zhang2024aaai-nondbrem/) doi:10.1609/AAAI.V38I1.27794

BibTeX

@inproceedings{zhang2024aaai-nondbrem,
  title     = {{NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching}},
  author    = {Zhang, Hongbo and Wang, Guang and Wang, Xu and Zhou, Zhengyang and Zhang, Chen and Dong, Zheng and Wang, Yang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {401--409},
  doi       = {10.1609/AAAI.V38I1.27794},
  url       = {https://mlanthology.org/aaai/2024/zhang2024aaai-nondbrem/}
}