Hu et al. "BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I3.27999
Markdown
[Hu et al. "BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/hu2024aaai-bliva/) doi:10.1609/AAAI.V38I3.27999
BibTeX
@inproceedings{hu2024aaai-bliva,
title = {{BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions}},
author = {Hu, Wenbo and Xu, Yifan and Li, Yi and Li, Weiyue and Chen, Zeyuan and Tu, Zhuowen},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {2256--2264},
doi = {10.1609/AAAI.V38I3.27999},
url = {https://mlanthology.org/aaai/2024/hu2024aaai-bliva/}
}