Differentially Private No-Regret Exploration in Adversarial Markov Decision Processes

Abstract

We study learning adversarial Markov decision processes (MDPs) in the episodic setting under the constraint of differential privacy (DP). This is motivated by the widespread application of reinforcement learning (RL) in non-stationary and even adversarial scenarios, where protecting users' sensitive information is vital. We first propose two efficient frameworks for adversarial MDPs, covering the full-information and bandit settings. Within each framework, we consider both Joint DP (JDP), where a central agent is trusted to protect the sensitive data, and Local DP (LDP), where the information is protected directly on the user side. We then design novel privacy mechanisms to privatize the stochastic transitions and adversarial losses. By instantiating these privacy mechanisms to satisfy the JDP and LDP requirements, we obtain near-optimal regret guarantees for both frameworks. To the best of our knowledge, these are the first algorithms to tackle the challenge of private learning in adversarial MDPs.
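The paper's own privacy mechanisms are not reproduced here; as a rough illustration of the LDP side of the trust model described above, the sketch below perturbs a single bounded episode loss with the standard Laplace mechanism before it leaves the user. The function name `privatize_loss_ldp` and the sensitivity-1, losses-in-[0, 1] assumptions are illustrative, not taken from the paper.

```python
import numpy as np

def privatize_loss_ldp(loss: float, epsilon: float, rng=None) -> float:
    """Perturb a bounded loss in [0, 1] with Laplace noise calibrated to
    sensitivity 1 and privacy budget epsilon, so the reported scalar
    satisfies epsilon-LDP. (Illustrative sketch, not the paper's mechanism.)"""
    rng = np.random.default_rng() if rng is None else rng
    # Laplace scale b = sensitivity / epsilon; sensitivity is 1 for losses in [0, 1].
    return loss + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Usage: each user privatizes their own loss locally before reporting it,
# so the central agent never observes the raw value.
rng = np.random.default_rng(0)
true_loss = 0.4
reported = privatize_loss_ldp(true_loss, epsilon=1.0, rng=rng)
print(f"true loss: {true_loss:.2f}, privatized report: {reported:.2f}")
```

Under JDP, by contrast, the agent may observe raw trajectories and only the outputs it releases must be insensitive to any single user's data, which generally permits smaller noise and tighter regret than the local model sketched here.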

Cite

Text

Bai et al. "Differentially Private No-Regret Exploration in Adversarial Markov Decision Processes." Uncertainty in Artificial Intelligence, 2024.

Markdown

[Bai et al. "Differentially Private No-Regret Exploration in Adversarial Markov Decision Processes." Uncertainty in Artificial Intelligence, 2024.](https://mlanthology.org/uai/2024/bai2024uai-differentially/)

BibTeX

@inproceedings{bai2024uai-differentially,
  title     = {{Differentially Private No-Regret Exploration in Adversarial Markov Decision Processes}},
  author    = {Bai, Shaojie and Zeng, Lanting and Zhao, Chengcheng and Duan, Xiaoming and Sadegh Talebi, Mohammad and Cheng, Peng and Chen, Jiming},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2024},
  pages     = {235--272},
  volume    = {244},
  url       = {https://mlanthology.org/uai/2024/bai2024uai-differentially/}
}