Merlin: Empowering Multimodal LLMs with Foresight Minds

Abstract

Humans can foresee the future based on present observations, a skill we term as foresight minds. However, this capability remains under-explored within existing MLLMs, hindering their capacity to understand intentions behind subjects. To address this, we integrate the future modeling into MLLMs. By utilizing the trajectory, a highly structured representation, as a learning objective, we aim to equip the model to understand spatiotemporal dynamics. Inspired by the learning paradigm of LLMs, we first propose Foresight Pre-Training (FPT) that jointly learns various tasks centered on trajectories, enabling MLLMs to predict entire trajectories from a given initial observation. Then, we propose Foresight Instruction-Tuning (FIT) that requires MLLMs to reason about potential future events based on predicted trajectories. Aided by FPT and FIT, we build an unified MLLM named Merlin that supports complex future reasoning. Experiments show Merlin’s foresight minds with impressive performance on both future reasoning and visual comprehension tasks. Project page: https: //ahnsun.github.io/merlin.

Cite

Text

Yu et al. "Merlin: Empowering Multimodal LLMs with Foresight Minds." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73235-5_24

Markdown

[Yu et al. "Merlin: Empowering Multimodal LLMs with Foresight Minds." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/yu2024eccv-merlin/) doi:10.1007/978-3-031-73235-5_24

BibTeX

@inproceedings{yu2024eccv-merlin,
  title     = {{Merlin: Empowering Multimodal LLMs with Foresight Minds}},
  author    = {Yu, En and Zhao, Liang and Wei, Yana and Yang, Jinrong and Wu, Dongming and Kong, Lingyu and Wei, Haoran and Wang, Tiancai and Ge, Zheng and Zhang, Xiangyu and Tao, Wenbing},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73235-5_24},
  url       = {https://mlanthology.org/eccv/2024/yu2024eccv-merlin/}
}