HypRL: Reinforcement Learning of Control Policies for Hyperproperties

Abstract

Reward shaping in multi-agent reinforcement learning (MARL) for complex tasks remains a significant challenge. Existing approaches often fail to find optimal solutions or handle such tasks inefficiently. We propose HypRL, a specification-guided reinforcement learning framework that learns control policies with respect to hyperproperties expressed in HyperLTL. Hyperproperties constitute a powerful formalism for specifying objectives and constraints over sets of execution traces across agents. To learn policies that maximize the satisfaction of a HyperLTL formula $\varphi$, we apply Skolemization to manage quantifier alternations and define quantitative robustness functions to shape rewards over execution traces of a Markov decision process with unknown transitions. A suitable RL algorithm is then used to learn policies that collectively maximize the expected reward and, consequently, increase the probability of satisfying $\varphi$. We evaluate HypRL on a diverse set of benchmarks, including safety-aware planning, Deep Sea Treasure, and the Post Correspondence Problem. We also compare with specification-driven baselines to demonstrate the effectiveness and efficiency of HypRL.
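To give a feel for what a quantitative robustness function over several traces might look like, the following is a minimal Python sketch. It is not the paper's definition: it assumes a simple two-trace "always stay apart" property and uses the common min-based semantics for "always" known from signal temporal logic. The function name `robustness_always_apart`, the one-dimensional positions, and the `min_gap` parameter are all hypothetical choices made for illustration. It does not cover Skolemization.

```python
from typing import Sequence


def robustness_always_apart(
    trace_a: Sequence[float],
    trace_b: Sequence[float],
    min_gap: float,
) -> float:
    """Robustness of 'at every step, |a_t - b_t| >= min_gap' over two finite traces.

    Hypothetical illustration only: a positive value means the property holds
    with that much margin, a negative value means it is violated by that much.
    """
    if len(trace_a) != len(trace_b) or len(trace_a) == 0:
        raise ValueError("traces must be non-empty and of equal length")
    # 'Always' is scored by the worst-case (minimum) margin over all time steps.
    return min(abs(a - b) - min_gap for a, b in zip(trace_a, trace_b))


# Two agents' positions along a line, one trace per agent (made-up data).
safe_score = robustness_always_apart([0.0, 1.0, 2.0], [3.0, 3.0, 3.5], min_gap=1.0)
unsafe_score = robustness_always_apart([0.0, 1.0, 2.0], [3.0, 1.5, 3.5], min_gap=1.0)

print(safe_score)    # 0.5: the agents always stay at least 1.0 apart, with margin 0.5
print(unsafe_score)  # -0.5: at step 1 the agents are only 0.5 apart
```

A score of this kind could serve as a shaped reward signal at the end of an episode, since it distinguishes "barely satisfied" from "comfortably satisfied" rather than returning only true or false.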

Cite

Text

Hsu et al. "HypRL: Reinforcement Learning of Control Policies for Hyperproperties." Advances in Neural Information Processing Systems, 2025.

Markdown

[Hsu et al. "HypRL: Reinforcement Learning of Control Policies for Hyperproperties." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/hsu2025neurips-hyprl/)

BibTeX

@inproceedings{hsu2025neurips-hyprl,
  title     = {{HypRL: Reinforcement Learning of Control Policies for Hyperproperties}},
  author    = {Hsu, Tzu-Han and Rafieioskouei, Arshia and Bonakdarpour, Borzoo},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/hsu2025neurips-hyprl/}
}