Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
Abstract
Theory-of-mind (ToM) enables humans to infer mental states—such as beliefs, desires, and intentions—forming the foundation of social cognition. Existing computational ToM methods rely on structured workflows with ToM-specific priors or deep model fine-tuning but struggle with scalability in multimodal environments. They remain trapped within the gravitational pull of multi-step planning complexity, failing to generalize as task demands increase. To overcome these limitations, we propose a scalable Bayesian ToM planner that breaks down ToM complexity into stepwise Bayesian updates. Meanwhile, weak-to-strong control specializes smaller LMs to refine ToM-specific likelihood estimation, transferring their ToM reasoning behavior to larger LMs (7B to 405B) for social and world knowledge integration. This synergistic approach enables scalability, aligning large-model inference with human mental states via Bayesian principles. Extensive experiments demonstrate a 4.6% improvement in accuracy over state-of-the-art methods on multimodal ToM benchmarks, including unseen scenarios, establishing a new standard for modeling human mental states in complex environments.
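The stepwise Bayesian update the abstract describes can be sketched generically: a belief over an observed agent's hypothesized mental states (here, goals) is refined one observation at a time via posterior ∝ likelihood × prior. This is an illustrative toy, not the paper's implementation; the hypothesis names and likelihood values below are invented placeholders, whereas the paper obtains ToM-specific likelihoods from specialized smaller LMs.

```python
# Toy stepwise Bayesian belief update over an agent's hypothesized goals.
# Hypotheses and likelihood values are illustrative placeholders only.

def bayes_step(prior, likelihoods):
    """One Bayesian update: posterior ∝ likelihood × prior, then normalize."""
    unnorm = {h: likelihoods[h] * p for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Uniform prior over two hypothesized goals of the observed agent.
belief = {"get_coffee": 0.5, "get_tea": 0.5}

# Per-step likelihoods P(observed action | goal); in the paper's setting
# such scores would come from an LM-based likelihood estimator.
observations = [
    {"get_coffee": 0.8, "get_tea": 0.3},  # agent walks toward the kitchen
    {"get_coffee": 0.9, "get_tea": 0.2},  # agent picks up a mug
]

for lik in observations:
    belief = bayes_step(belief, lik)

print(belief)  # belief concentrates on "get_coffee"
```

Decomposing a multi-step inference into these single-step updates is what keeps the complexity bounded per step, rather than requiring one monolithic multi-step plan.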
Cite
Text
Zhang et al. "Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Zhang et al. "Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-overcoming/)
BibTeX
@inproceedings{zhang2025icml-overcoming,
title = {{Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner}},
author = {Zhang, Chunhui and Ouyang, Zhongyu and Lee, Kwonjoon and Agarwal, Nakul and Houlihan, Sean Dae and Vosoughi, Soroush and Lo, Shao-Yuan},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {75878--75900},
volume = {267},
url = {https://mlanthology.org/icml/2025/zhang2025icml-overcoming/}
}