UniToken: Harmonizing Multimodal Understanding and Generation Through Unified Visual Encoding

Cite

Text

Jiao et al. "UniToken: Harmonizing Multimodal Understanding and Generation Through Unified Visual Encoding." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Jiao et al. "UniToken: Harmonizing Multimodal Understanding and Generation Through Unified Visual Encoding." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/jiao2025cvprw-unitoken/)

BibTeX

@inproceedings{jiao2025cvprw-unitoken,
  title     = {{UniToken: Harmonizing Multimodal Understanding and Generation Through Unified Visual Encoding}},
  author    = {Jiao, Yang and Qiu, Haibo and Jie, Zequn and Chen, Shaoxiang and Chen, Jingjing and Ma, Lin and Jiang, Yu-Gang},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {3600--3610},
  url       = {https://mlanthology.org/cvprw/2025/jiao2025cvprw-unitoken/}
}