Learning to Collaborate with Unknown Agents in the Absence of Reward
Abstract
With the advancement of artificial intelligence (AI), scenarios involving close collaboration between AI and other unknown agents are becoming increasingly common. This sometimes requires training AI agents to collaborate with unknown agents in the absence of a reward function -- which may be unavailable to the AI agents or even undefined by the unknown agents themselves -- thus posing new challenges to existing learning algorithms that often require knowing the shared reward. In this paper, we show that effective teaming with unknown agents can be achieved in the absence of a reward function by actively modeling the unknown agents and reasoning about their latent rewards from the available interaction/observation history. In particular, we propose a novel framework that leverages a kernel density Bayesian inverse learning method for active reward/goal inference, and we prove that multi-agent reinforcement learning guided by the inferred reward signals can converge to an optimal policy for teaming with unknown agents. This result enables us to develop an adaptive policy update strategy that uses a family of pre-trained, goal-conditioned policies, further eliminating the need for online retraining. The proposed solution is evaluated using a wide range of diverse unknown agents with latent and even non-stationary rewards. Our solution significantly increases the teaming performance between AI and unknown agents in the absence of reward.
Cite
Text
Zhang et al. "Learning to Collaborate with Unknown Agents in the Absence of Reward." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I13.33589
Markdown
[Zhang et al. "Learning to Collaborate with Unknown Agents in the Absence of Reward." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhang2025aaai-learning-c/) doi:10.1609/AAAI.V39I13.33589
BibTeX
@inproceedings{zhang2025aaai-learning-c,
title = {{Learning to Collaborate with Unknown Agents in the Absence of Reward}},
author = {Zhang, Zuyuan and Zhou, Hanhan and Imani, Mahdi and Lee, Taeyoung and Lan, Tian},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {14502-14511},
doi = {10.1609/AAAI.V39I13.33589},
url = {https://mlanthology.org/aaai/2025/zhang2025aaai-learning-c/}
}