LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Abstract
This paper presents LLaVA-Plus, a general-purpose multimodal assistant trained using an end-to-end approach that systematically expands the capabilities of large multimodal models (LMMs). LLaVA-Plus maintains a skill repository that contains a wide range of vision and vision-language pre-trained models (tools). Given users' multimodal inputs, it can activate the relevant tools and compose their execution results on the fly to fulfill many real-world tasks. To acquire the ability to use tools, LLaVA-Plus is trained on multimodal instruction-following data that we have curated. The training data covers many tool-use examples of visual understanding, generation, external knowledge retrieval, and their compositions. Empirical results show that LLaVA-Plus outperforms LLaVA in existing capabilities and exhibits many new capabilities. Compared with tool-augmented LLMs, LLaVA-Plus is distinct in that the image query is directly grounded and actively engaged throughout the entire human-AI interaction session, significantly improving tool-use performance and enabling new scenarios.
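The skill-repository idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `SkillRepository` class, the `compose` helper, the tool names, and the plan format are all hypothetical, and the tools are stand-in functions rather than real vision models. The sketch only shows the general pattern of looking up tools by name, running them against the same image, and merging their results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool signature: (image reference, text argument) -> text result.
Tool = Callable[[str, str], str]


@dataclass
class SkillRepository:
    """Illustrative registry of named tools (an assumption, not the paper's API)."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, tool: Tool) -> None:
        self.tools[name] = tool

    def run(self, calls: List[dict], image: str) -> List[str]:
        # The same image is passed to every tool call, so it stays
        # part of the interaction for the whole plan.
        results = []
        for call in calls:
            tool = self.tools[call["name"]]
            results.append(tool(image, call.get("arg", "")))
        return results


def compose(results: List[str]) -> str:
    # Hypothetical composition step: join tool outputs into one response.
    return " | ".join(results)


repo = SkillRepository()
# Stand-in tools; a real system would wrap pre-trained models here.
repo.register("detect", lambda image, arg: f"detected '{arg}' in {image}")
repo.register("caption", lambda image, arg: f"caption for {image}")

# In the paper the model itself decides which tools to call; here the plan is hard-coded.
plan = [{"name": "detect", "arg": "dog"}, {"name": "caption"}]
answer = compose(repo.run(plan, image="photo.jpg"))
print(answer)
```

In this sketch the plan is fixed by hand, whereas the abstract describes a model trained on instruction-following data to select the tools itself.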
Cite
Text
Liu et al. "LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72970-6_8
Markdown
[Liu et al. "LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/liu2024eccv-llavaplus/) doi:10.1007/978-3-031-72970-6_8
BibTeX
@inproceedings{liu2024eccv-llavaplus,
title = {{LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents}},
author = {Liu, Shilong and Cheng, Hao and Liu, Haotian and Zhang, Hao and Li, Feng and Ren, Tianhe and Zou, Xueyan and Yang, Jianwei and Su, Hang and Zhu, Jun and Zhang, Lei and Gao, Jianfeng and Li, Chunyuan},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72970-6_8},
url = {https://mlanthology.org/eccv/2024/liu2024eccv-llavaplus/}
}