Learning De-Biased Environment Models for Delivery Incentive Policy Optimization on Food Delivery Platforms

Liu, Yu-Ren; Chen, Xiong-Hui; Xiao, Siyuan; Yang, Xinyu; Qi, Xintong; Zhou, Linjun; Yu, Yang; Huang, Fangsheng

doi:10.1007/S10994-025-06846-6

MLJ 2025 pp. 262

Abstract

Accurate environment modeling is essential for model-based offline policy optimization. Traditional empirical risk minimization can yield biased models when data collection is subject to selection bias, which can misguide policy optimization. This issue is especially pertinent in real-world decision-making, where data are often collected under optimized, non-random behavior policies. This paper addresses this practical challenge of offline policy optimization under selection bias, focusing on delivery incentive policies for food delivery platforms. We propose a novel framework for offline optimization of these policies based on a de-biased environment model. First, the framework learns de-biased order acceptance rate and delivery time prediction models from historical data through adversarial weighted empirical risk minimization; together these constitute the environment model. Next, it employs operations research solvers to derive historic best actions under the learned de-biased environment model, determining the optimal bonus amount and a reasonable incentive time limit for each order under budget constraints. Finally, a policy neural network is trained to map environment states to these optimized actions, yielding efficient, executable policies for real-time decision-making. To verify the effectiveness and efficiency of the framework, we conduct both offline experiments on a real-world dataset and online A/B tests on the Meituan food delivery platform. Results show that the framework outperforms baseline methods in both model accuracy and policy optimization performance in offline experiments, and achieves a 9% reduction in the customer complaint rate in real-world deployment.
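The second stage described above (choosing a bonus for each order under a budget, given the learned model's predictions) can be illustrated as a multiple-choice knapsack problem. The sketch below is a hypothetical illustration, not the authors' solver: it assumes integer bonus levels that include 0, takes total expected order acceptances as the objective, ignores the incentive time limit, and substitutes an exact dynamic program for a general-purpose operations research solver. The function `allocate_bonuses` and the probability table are invented for illustration.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def allocate_bonuses(
    accept_prob: Sequence[Sequence[float]],
    bonus_levels: Sequence[int],
    budget: int,
) -> Tuple[float, List[int]]:
    """Exact multiple-choice knapsack via dynamic programming (illustrative sketch).

    accept_prob[i][k]: hypothetical model-predicted acceptance probability of
    order i when it receives bonus_levels[k]. Bonuses and budget are integers.
    Returns (max total expected acceptances, chosen level index per order).
    """
    if 0 not in bonus_levels:
        raise ValueError("bonus_levels must include 0 so every order is feasible")
    # best[b]: max total expected acceptances of the orders processed so far
    # when exactly b budget units have been spent (-inf if unreachable).
    best = [0.0] + [NEG_INF] * budget
    # choice[i][b]: level index order i used to reach spend state b.
    choice: List[List[int]] = []
    for probs in accept_prob:
        new_best = [NEG_INF] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b, value in enumerate(best):
            if value == NEG_INF:
                continue
            for k, cost in enumerate(bonus_levels):
                nb = b + cost
                if nb <= budget and value + probs[k] > new_best[nb]:
                    new_best[nb] = value + probs[k]
                    pick[nb] = k
        best = new_best
        choice.append(pick)
    # Pick the best final spend level (<= budget), then walk the choices back.
    b = max(range(budget + 1), key=lambda s: best[s])
    total = best[b]
    levels = [0] * len(accept_prob)
    for i in range(len(accept_prob) - 1, -1, -1):
        k = choice[i][b]
        levels[i] = k
        b -= bonus_levels[k]
    return total, levels


if __name__ == "__main__":
    # Toy numbers, not data from the paper: 3 orders, bonuses of 0, 2 or 4 units.
    probs = [
        [0.50, 0.70, 0.80],
        [0.20, 0.60, 0.65],
        [0.90, 0.92, 0.95],
    ]
    total, levels = allocate_bonuses(probs, bonus_levels=[0, 2, 4], budget=4)
    print(round(total, 2), levels)
```

In this toy instance the budget is best split between the first two orders, whose predicted acceptance rises most per bonus unit. A dynamic program is exact but scales with the budget size; at platform scale, a Lagrangian relaxation or a commercial integer programming solver would be the more typical choice.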

Cite

Text

Liu et al. "Learning De-Biased Environment Models for Delivery Incentive Policy Optimization on Food Delivery Platforms." Machine Learning, 2025. doi:10.1007/S10994-025-06846-6

Markdown

[Liu et al. "Learning De-Biased Environment Models for Delivery Incentive Policy Optimization on Food Delivery Platforms." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/liu2025mlj-learning/) doi:10.1007/S10994-025-06846-6

BibTeX

@article{liu2025mlj-learning,
  title     = {{Learning De-Biased Environment Models for Delivery Incentive Policy Optimization on Food Delivery Platforms}},
  author    = {Liu, Yu-Ren and Chen, Xiong-Hui and Xiao, Siyuan and Yang, Xinyu and Qi, Xintong and Zhou, Linjun and Yu, Yang and Huang, Fangsheng},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {262},
  doi       = {10.1007/S10994-025-06846-6},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/liu2025mlj-learning/}
}