Rating-Based Reinforcement Learning

White, Devin; Wu, Mingkang; Novoseller, Ellen R.; Lawhern, Vernon J.; Waytowich, Nicholas R.; Cao, Yongcan

doi:10.1609/AAAI.V38I9.28886

AAAI 2024 pp. 10207-10215

Abstract

This paper develops a novel rating-based reinforcement learning approach that uses human ratings as the source of human guidance in reinforcement learning. Unlike existing preference-based and ranking-based reinforcement learning paradigms, which rely on relative human preferences over sample pairs, the proposed approach relies on human evaluations of individual trajectories, without relative comparisons between sample pairs. It builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies with both synthetic and real human ratings to evaluate the effectiveness and benefits of the approach.
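
The abstract names a rating prediction model and a multi-class loss but gives no formulas for either. As a rough illustration only, and not the authors' formulation, the sketch below assumes each trajectory has a predicted score normalized to [0, 1], that rating classes correspond to intervals between hypothetical boundaries, and that the loss is ordinary multi-class cross-entropy. The names `rating_probabilities`, `rating_loss`, `boundaries`, and `sharpness` are all invented for this example.

```python
import math

def rating_probabilities(score, boundaries, sharpness=10.0):
    """Map a normalized trajectory score in [0, 1] to a distribution over rating classes.

    Assumption: `boundaries` is an increasing list [b_0, ..., b_n] and
    class i covers the interval [b_i, b_{i+1}].
    """
    # Each logit is positive only when the score lies inside the class interval.
    logits = [
        -sharpness * (score - lower) * (score - upper)
        for lower, upper in zip(boundaries[:-1], boundaries[1:])
    ]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]

def rating_loss(scores, ratings, boundaries, sharpness=10.0):
    """Mean multi-class cross-entropy between predicted and human rating classes."""
    losses = []
    for score, rating in zip(scores, ratings):
        probs = rating_probabilities(score, boundaries, sharpness)
        losses.append(-math.log(probs[rating]))
    return sum(losses) / len(losses)

# Hypothetical data: three trajectories, three rating classes (0 = bad, 2 = good).
boundaries = [0.0, 0.33, 0.66, 1.0]
scores = [0.1, 0.5, 0.9]
matching = rating_loss(scores, [0, 1, 2], boundaries)
mismatched = rating_loss(scores, [2, 1, 0], boundaries)
print(matching < mismatched)
```

In this toy setup the loss is lower when the human rating class agrees with the interval containing the predicted score, which is the property a reward model trained on individual ratings (rather than pairwise comparisons) would need to exploit.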

Cite

Text

White et al. "Rating-Based Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I9.28886

Markdown

[White et al. "Rating-Based Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/white2024aaai-rating/) doi:10.1609/AAAI.V38I9.28886

BibTeX

@inproceedings{white2024aaai-rating,
  title     = {{Rating-Based Reinforcement Learning}},
  author    = {White, Devin and Wu, Mingkang and Novoseller, Ellen R. and Lawhern, Vernon J. and Waytowich, Nicholas R. and Cao, Yongcan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {10207--10215},
  doi       = {10.1609/AAAI.V38I9.28886},
  url       = {https://mlanthology.org/aaai/2024/white2024aaai-rating/}
}