DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making
Abstract
Integrating natural language instructions and visual perception with decision-making is a critical challenge for embodied agents. Existing methods often struggle to balance the conciseness of language commands with the richness of video content. To bridge the gap between modalities, we propose extracting key spatiotemporal patterns from video that capture visual saliency and temporal evolution, which we refer to as dynamic representations. Building on this, we introduce DynaMind, a framework that enhances decision-making through dynamic reasoning. Specifically, we design an adaptive FrameScorer to evaluate video frames based on semantic consistency and visual saliency, assigning each frame an importance score. These scores are used to filter redundant video content and synthesize compact dynamic representations. Leveraging these representations, we predict critical future dynamics and apply a dynamic-guided policy to generate coherent and context-aware actions. Extensive experiments demonstrate that DynaMind significantly outperforms the baselines across several simulation benchmarks and real-world scenarios.
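As a rough illustration of the frame-scoring idea described in the abstract, the sketch below assigns each frame an importance score and keeps only the top-scoring frames. It is not the authors' FrameScorer: the function names (`cosine`, `score_frames`, `select_key_frames`), the weight `alpha`, the use of cosine similarity to an instruction embedding as a stand-in for semantic consistency, and the use of frame-to-frame feature change as a stand-in for visual saliency are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def score_frames(frame_feats, text_feat, alpha=0.5):
    """Hypothetical importance score per frame.

    frame_feats: list of per-frame feature vectors (assumed precomputed).
    text_feat:   feature vector of the language instruction (assumed precomputed).
    alpha:       assumed weight between the two terms.
    """
    # Assumed proxy for semantic consistency: similarity to the instruction.
    semantic = [cosine(f, text_feat) for f in frame_feats]
    # Assumed proxy for visual saliency: feature change from the previous frame.
    change = [0.0] + [math.dist(a, b) for a, b in zip(frame_feats, frame_feats[1:])]
    peak = max(change)
    saliency = [c / peak if peak > 0 else 0.0 for c in change]
    return [alpha * s + (1 - alpha) * v for s, v in zip(semantic, saliency)]


def select_key_frames(scores, k):
    """Return indices of the k highest-scoring frames, in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy usage: three 2-D frame features and an instruction feature.
frames = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
instruction = [1.0, 0.0]
scores = score_frames(frames, instruction)
kept = select_key_frames(scores, k=1)
```

In this toy input the last frame both matches the instruction and differs from its predecessor, so it is the one retained; a learned, adaptive scorer as described in the abstract would replace both hand-written proxies.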
Cite
Text
Wang et al. "DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Wang et al. "DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wang2025icml-dynamind/)
BibTeX
@inproceedings{wang2025icml-dynamind,
title = {{DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making}},
author = {Wang, Ziru and Wang, Mengmeng and Dai, Jade and Ma, Teli and Qi, Guo-Jun and Liu, Yong and Dai, Guang and Wang, Jingdong},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {64586-64603},
volume = {267},
url = {https://mlanthology.org/icml/2025/wang2025icml-dynamind/}
}