A Reinforcement Learning Approach to Job-Shop Scheduling

Abstract

We apply reinforcement learning methods to learn domain-specific heuristics for job shop scheduling. A repair-based scheduler starts with a critical-path schedule and incrementally repairs constraint violations with the goal of finding a short conflict-free schedule. The temporal difference algorithm TD() is applied to train a neural network to learn a heuristic evaluation function over states. This evaluation function is used by a one-step lookahead search procedure to find good solutions to new scheduling problems. We evaluate this approach on synthetic problems and on problems from a NASA space shuttle payload processing task. The evaluation function is trained on problems involving a small number of jobs and then tested on larger problems. The TD scheduler performs better than the best known existing algorithm for this task---Zweben's iterative repair method based on simulated annealing. The results suggest that reinforcement learning can provide a new method for constructing high-...

Cite

Text

Zhang and Dietterich. "A Reinforcement Learning Approach to Job-Shop Scheduling." International Joint Conference on Artificial Intelligence, 1995.

Markdown

[Zhang and Dietterich. "A Reinforcement Learning Approach to Job-Shop Scheduling." International Joint Conference on Artificial Intelligence, 1995.](https://mlanthology.org/ijcai/1995/zhang1995ijcai-reinforcement/)

BibTeX

@inproceedings{zhang1995ijcai-reinforcement,
  title     = {{A Reinforcement Learning Approach to Job-Shop Scheduling}},
  author    = {Zhang, Wei and Dietterich, Thomas G.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {1995},
  pages     = {1114-1120},
  url       = {https://mlanthology.org/ijcai/1995/zhang1995ijcai-reinforcement/}
}