Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
Abstract
Theory-of-mind (ToM) enables humans to infer mental states—such as beliefs, desires, and intentions—forming the foundation of social cognition. Existing computational ToM methods rely on structured workflows with ToM-specific priors or deep model fine-tuning but struggle with scalability in multimodal environments. They remain trapped within the gravitational pull of multi-step planning complexity, failing to generalize as task demands increase. To overcome these limitations, we propose a scalable Bayesian ToM planner that breaks down ToM complexity into stepwise Bayesian updates. Meanwhile, weak-to-strong control specializes smaller LMs to refine ToM-specific likelihood estimation, transferring their ToM reasoning behavior to larger LMs (7B to 405B) for social and world knowledge integration. This synergistic approach enables scalability, aligning large-model inference with human mental states via Bayesian principles. Extensive experiments demonstrate a 4.6% improvement in accuracy over state-of-the-art methods on multimodal ToM benchmarks, including unseen scenarios, establishing a new standard for modeling human mental states in complex environments.
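The stepwise Bayesian update the abstract describes can be sketched generically: a belief over an observed agent's hypothesized mental states (here, goals) is refined one observation at a time via posterior ∝ likelihood × prior. This is an illustrative toy, not the paper's implementation; the hypothesis names and likelihood values below are invented placeholders, whereas the paper obtains ToM-specific likelihoods from specialized smaller LMs.

```python
# Toy stepwise Bayesian belief update over an agent's hypothesized goals.
# Hypotheses and likelihood values are illustrative placeholders only.

def bayes_step(prior, likelihoods):
    """One Bayesian update: posterior ∝ likelihood × prior, then normalize."""
    unnorm = {h: likelihoods[h] * p for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Uniform prior over two hypothesized goals of the observed agent.
belief = {"get_coffee": 0.5, "get_tea": 0.5}

# Per-step likelihoods P(observed action | goal); in the paper's setting
# such scores would come from an LM-based likelihood estimator.
observations = [
    {"get_coffee": 0.8, "get_tea": 0.3},  # agent walks toward the kitchen
    {"get_coffee": 0.9, "get_tea": 0.2},  # agent picks up a mug
]

for lik in observations:
    belief = bayes_step(belief, lik)

print(belief)  # belief concentrates on "get_coffee"
```

Decomposing a multi-step inference into these single-step updates is what keeps the complexity bounded per step, rather than requiring one monolithic multi-step plan.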
Cite
Text
Zhang et al. "Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Zhang et al. "Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-overcoming/)
BibTeX
@inproceedings{zhang2025icml-overcoming,
title = {{Overcoming Multi-Step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner}},
author = {Zhang, Chunhui and Ouyang, Zhongyu and Lee, Kwonjoon and Agarwal, Nakul and Houlihan, Sean Dae and Vosoughi, Soroush and Lo, Shao-Yuan},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {75878--75900},
volume = {267},
url = {https://mlanthology.org/icml/2025/zhang2025icml-overcoming/}
}