Multi-Agent Learning from Learners

Abstract

A large body of the "Inverse Reinforcement Learning" (IRL) literature focuses on recovering the reward function from a set of demonstrations of an expert agent who acts optimally or noisily optimally. Nevertheless, some recent works move away from the optimality assumption to study the "Learning from a Learner" (LfL) problem, where the challenge is inferring the reward function of a learning agent from a sequence of demonstrations produced by progressively improving policies. In this work, we take one of the initial steps in addressing the multi-agent version of this problem and propose a new algorithm, MA-LfL (Multiagent Learning from a Learner). Unlike the state-of-the-art literature, which recovers the reward functions from trajectories produced by agents in some equilibrium, we study the problem of inferring the reward functions of interacting agents in a general-sum stochastic game without assuming any equilibrium state. The MA-LfL algorithm is rigorously built on a theoretical result that ensures its validity when the agents learn according to a multi-agent soft policy iteration scheme. We empirically test MA-LfL and observe a high positive correlation between the recovered reward functions and the ground truth.
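
The abstract assumes the observed agents learn via a multi-agent soft policy iteration scheme. As a rough illustrative sketch only (not the paper's MA-LfL algorithm), the snippet below shows one entropy-regularized evaluation and improvement step for a single agent in a tabular two-agent general-sum stochastic game, holding the other agent's policy fixed; all variable names, sizes, and the randomly generated dynamics and rewards are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 3
gamma, tau = 0.9, 0.5          # discount factor and entropy temperature (assumed values)

# Assumed joint dynamics and agent-1 reward, indexed [s, a1, a2].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, n_actions))  # -> dist over s'
R1 = rng.uniform(size=(n_states, n_actions, n_actions))

# Current policies: pi1[s, a1] for the learning agent, pi2[s, a2] for the fixed opponent.
pi1 = np.full((n_states, n_actions), 1.0 / n_actions)
pi2 = np.full((n_states, n_actions), 1.0 / n_actions)

def soft_policy_evaluation(pi1, pi2, n_iters=500):
    """Entropy-regularized evaluation of agent 1's Q-function with pi2 held fixed."""
    Q1 = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Soft state value under the current policy pi1.
        V1 = np.sum(pi1 * (Q1 - tau * np.log(pi1 + 1e-12)), axis=1)   # shape [s]
        # Expected reward and next-state value, marginalizing over a2 ~ pi2.
        ER = np.einsum('sab,sb->sa', R1, pi2)                          # shape [s, a1]
        EV = np.einsum('sabn,sb,n->sa', P, pi2, V1)                    # shape [s, a1]
        Q1 = ER + gamma * EV
    return Q1

def soft_policy_improvement(Q1):
    """Soft greedy step: pi1(a1|s) proportional to exp(Q1(s, a1) / tau)."""
    logits = Q1 / tau
    logits -= logits.max(axis=1, keepdims=True)                        # numerical stability
    new_pi1 = np.exp(logits)
    return new_pi1 / new_pi1.sum(axis=1, keepdims=True)

Q1 = soft_policy_evaluation(pi1, pi2)
pi1 = soft_policy_improvement(Q1)
print(pi1)

Repeating this evaluation-improvement cycle produces the sequence of progressively improving policies whose demonstrations the LfL setting takes as input.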

Cite

Text

Caliskan et al. "Multi-Agent Learning from Learners." International Conference on Machine Learning, 2023.

Markdown

[Caliskan et al. "Multi-Agent Learning from Learners." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/caliskan2023icml-multiagent/)

BibTeX

@inproceedings{caliskan2023icml-multiagent,
  title     = {{Multi-Agent Learning from Learners}},
  author    = {Caliskan, Mine Melodi and Chini, Francesco and Maghsudi, Setareh},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {3525--3540},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/caliskan2023icml-multiagent/}
}