Merlin: Empowering Multimodal LLMs with Foresight Minds
Abstract
Humans can foresee the future based on present observations, a skill we term foresight minds. However, this capability remains under-explored in existing MLLMs, which limits their ability to understand the intentions behind subjects' actions. To address this, we integrate future modeling into MLLMs. By using the trajectory, a highly structured representation, as a learning objective, we aim to equip the model to understand spatiotemporal dynamics. Inspired by the learning paradigm of LLMs, we first propose Foresight Pre-Training (FPT), which jointly learns various tasks centered on trajectories and enables MLLMs to predict entire trajectories from a given initial observation. We then propose Foresight Instruction-Tuning (FIT), which requires MLLMs to reason about potential future events based on the predicted trajectories. With FPT and FIT, we build a unified MLLM named Merlin that supports complex future reasoning. Experiments show that Merlin exhibits foresight minds, achieving impressive performance on both future reasoning and visual comprehension tasks. Project page: https://ahnsun.github.io/merlin.
Cite
Yu et al. "Merlin: Empowering Multimodal LLMs with Foresight Minds." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73235-5_24
@inproceedings{yu2024eccv-merlin,
title = {{Merlin: Empowering Multimodal LLMs with Foresight Minds}},
author = {Yu, En and Zhao, Liang and Wei, Yana and Yang, Jinrong and Wu, Dongming and Kong, Lingyu and Wei, Haoran and Wang, Tiancai and Ge, Zheng and Zhang, Xiangyu and Tao, Wenbing},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73235-5_24},
url = {https://mlanthology.org/eccv/2024/yu2024eccv-merlin/}
}