BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Cite

Text

Hu et al. "BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I3.27999

Markdown

[Hu et al. "BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/hu2024aaai-bliva/) doi:10.1609/AAAI.V38I3.27999

BibTeX

@inproceedings{hu2024aaai-bliva,
  title     = {{BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions}},
  author    = {Hu, Wenbo and Xu, Yifan and Li, Yi and Li, Weiyue and Chen, Zeyuan and Tu, Zhuowen},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {2256--2264},
  doi       = {10.1609/AAAI.V38I3.27999},
  url       = {https://mlanthology.org/aaai/2024/hu2024aaai-bliva/}
}