TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs

Cite

Text

Zhang et al. "TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Zhang et al. "TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/zhang2025cvprw-tokenfocusvqa/)

BibTeX

@inproceedings{zhang2025cvprw-tokenfocusvqa,
  title     = {{TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs}},
  author    = {Zhang, Zijian and Zheng, Xuhui and Wu, Xuecheng and Peng, Chong and Cao, Xuezhi},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {1279--1288},
  url       = {https://mlanthology.org/cvprw/2025/zhang2025cvprw-tokenfocusvqa/}
}