HSVI-Based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

Yan, Rui; Santos, Gabriel; Norman, Gethin; Parker, David; Kwiatkowska, Marta

L4DC 2024 pp. 80-91

Abstract

We consider a variant of continuous-state partially observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an $\varepsilon$-minimax strategy profile, which requires only one linear program to be solved for each agent at every stage, instead of a complex estimation of opponent counterfactual values. For the partially informed agent, we propose a continual resolving approach that uses lower bounds, pre-computed offline with heuristic search value iteration (HSVI), instead of opponent counterfactual values. This approach inherits the soundness of continual resolving at the cost of pre-computing the bound. For the fully informed agent, we propose an inferred-belief strategy, where the agent maintains an inferred belief about the belief of the partially informed agent based on (offline) upper bounds from HSVI, guaranteeing $\varepsilon$-distance to the value of the game at the initial belief known to both agents.
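The claim that each agent needs only one linear program per stage relies on a standard fact: minimax strategies of a finite zero-sum game are solutions of a linear program. The sketch below illustrates only that generic fact. It is not the authors' stage LP, which is built over beliefs, neural perception regions and HSVI bounds. It solves a finite payoff-matrix game exactly, using a small tableau simplex with Bland's rule. The function name `solve_matrix_game` and the payoff-shift trick are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def solve_matrix_game(payoff):
    """Illustrative sketch (hypothetical helper, not the paper's LP).

    Return (value, row_strategy, col_strategy) for the zero-sum game in which
    the row player receives payoff[i][j] and maximizes. Exact Fractions are used.
    """
    m, n = len(payoff), len(payoff[0])
    # Shift payoffs so the smallest entry becomes 1; the game value then stays
    # positive and differs from the original value by exactly `shift`.
    shift = 1 - min(min(row) for row in payoff)
    a = [[Fraction(x) + shift for x in row] for row in payoff]
    # Column player's LP: maximize sum(w) s.t. a w <= 1, w >= 0.
    # Constraint rows are [a | I | rhs]; the last row is the objective [-1 | 0 | 0].
    tab = [a[i] + [Fraction(int(i == k)) for k in range(m)] + [Fraction(1)]
           for i in range(m)]
    tab.append([Fraction(-1)] * n + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]  # slack variables start in the basis
    while True:
        obj = tab[-1]
        # Bland's rule: entering column = smallest index with negative reduced cost.
        col = next((j for j in range(n + m) if obj[j] < 0), None)
        if col is None:
            break  # optimal tableau reached
        # Ratio test; ties broken by the smallest basic variable index (Bland).
        # Every shifted entry is positive, so this list is never empty.
        rows = [i for i in range(m) if tab[i][col] > 0]
        row = min(rows, key=lambda i: (tab[i][-1] / tab[i][col], basis[i]))
        piv = tab[row][col]
        tab[row] = [x / piv for x in tab[row]]
        for i in range(m + 1):  # eliminate `col` from all other rows, incl. objective
            if i != row and tab[i][col] != 0:
                f = tab[i][col]
                tab[i] = [x - f * y for x, y in zip(tab[i], tab[row])]
        basis[row] = col
    total = tab[-1][-1]  # optimal sum(w) equals 1 / (shifted game value)
    value = 1 / total
    w = [Fraction(0)] * n
    for i, b in enumerate(basis):
        if b < n:
            w[b] = tab[i][-1]
    col_strategy = [x * value for x in w]
    # Reduced costs of the slack columns are the dual prices, i.e. the
    # (unnormalized) row player's minimax strategy.
    row_strategy = [tab[-1][n + i] * value for i in range(m)]
    return value - shift, row_strategy, col_strategy
```

For matching pennies, `solve_matrix_game([[1, -1], [-1, 1]])` gives value 0, with both players mixing 1/2 and 1/2. Exact `Fraction` arithmetic keeps the example free of floating-point tolerances. A practical solver would instead call an LP library.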

Cite

Text

Yan et al. "HSVI-Based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms." Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.

Markdown

[Yan et al. "HSVI-Based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms." Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.](https://mlanthology.org/l4dc/2024/yan2024l4dc-hsvibased/)

BibTeX

@inproceedings{yan2024l4dc-hsvibased,
  title     = {{HSVI-Based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms}},
  author    = {Yan, Rui and Santos, Gabriel and Norman, Gethin and Parker, David and Kwiatkowska, Marta},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  year      = {2024},
  pages     = {80--91},
  volume    = {242},
  url       = {https://mlanthology.org/l4dc/2024/yan2024l4dc-hsvibased/}
}