No Regret Reinforcement Learning Algorithms for Online Scheduling with Multi-Stage Tasks

Abstract

We study online task scheduling problems where tasks arrive sequentially and are processed by the platform or server. The service processes for tasks are multi-stage and are modeled as episodic Markov Decision Processes (MDPs). While processing a task, the system acquires rewards by consuming resources. The goal of the platform is to maximize the reward-to-cost ratio over a sequence of K tasks. Online scheduling with multi-stage tasks faces two major challenges: intra-dependence among the different stages within a task and inter-dependence among different tasks. These challenges are further exacerbated by the unknown rewards, costs, and task arrival distribution. To address these challenges, we propose the Robbins-Monro-based Value Iteration for Ratio Maximization (RM^2VI) algorithm. Specifically,RM^2VI addresses ``intra-dependence'' through optimistic value iteration and handles ``inter-dependence'' using the Robbins-Monro method. The algorithm has a greedy structure and achieves a sub-linear regret of O(K^(3/4)), establishing the no-regret property (per-task). We test RM^2VI in two synthetic experiments of sale promotion in E-commerce and machine learning job training in cloud computing. The results show RM^2VI achieves the best reward-to-cost ratio compared with the baselines.

Cite

Text

Xu et al. "No Regret Reinforcement Learning Algorithms for Online Scheduling with Multi-Stage Tasks." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/753

Markdown

[Xu et al. "No Regret Reinforcement Learning Algorithms for Online Scheduling with Multi-Stage Tasks." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/xu2025ijcai-regret/) doi:10.24963/IJCAI.2025/753

BibTeX

@inproceedings{xu2025ijcai-regret,
  title     = {{No Regret Reinforcement Learning Algorithms for Online Scheduling with Multi-Stage Tasks}},
  author    = {Xu, Yongxin and Guo, Hengquan and Shao, Ziyu and Liu, Xin},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {6767-6775},
  doi       = {10.24963/IJCAI.2025/753},
  url       = {https://mlanthology.org/ijcai/2025/xu2025ijcai-regret/}
}