Parallel Vertex Diffusion for Unified Visual Grounding

Cheng, Zesen; Li, Kehan; Jin, Peng; Li, Siheng; Ji, Xiangyang; Yuan, Li; Liu, Chang; Chen, Jie

doi:10.1609/AAAI.V38I2.27896

AAAI 2024 pp. 1326-1334

Abstract

Unified visual grounding (UVG) capitalizes on the wealth of task-related knowledge shared across various grounding tasks through one-shot training, which reduces retraining costs and the effort of designing task-specific architectures. Vertex generation-based UVG methods achieve this versatility by modeling object box and contour prediction in a unified way, and they provide a text-powered interface to a wide range of related multi-modal tasks, e.g., visual question answering and captioning. However, these methods typically generate vertices sequentially through autoregression, which is prone to error accumulation and heavy computation, especially for high-dimensional sequence generation in complex scenarios. In this paper, we develop Parallel Vertex Diffusion (PVD), which exploits the parallelizability of diffusion models to generate vertices accurately and efficiently in a parallel and scalable manner. Because the coordinates fluctuate greatly, diffusion models trained without geometric constraints typically converge slowly. We therefore complete PVD with two critical components: a center anchor mechanism, which normalizes coordinates, and an angle summation loss, which adopts a differentiable geometric descriptor from the point-in-polygon problem in computational geometry to constrain the overall difference between predicted and ground-truth vertices. With these designs, PVD achieves state-of-the-art performance across various grounding tasks.
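The abstract names the point-in-polygon problem as the source of the angle summation descriptor. The classic angle-summation (winding-number) test it refers to can be sketched as below: summing the signed angles that each polygon edge subtends at a query point gives roughly ±2π for a point inside a simple polygon and roughly 0 for a point outside. The function names and the way the hypothetical loss compares predicted and ground-truth polygons at a set of query points are assumptions for illustration only, not the paper's exact formulation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def angle_sum(point: Point, polygon: Sequence[Point]) -> float:
    """Sum of signed angles subtended at `point` by consecutive polygon edges.

    For a simple polygon this is about +2*pi (counter-clockwise vertex order)
    or -2*pi (clockwise) when `point` is inside, and about 0 when it is outside.
    """
    px, py = point
    n = len(polygon)
    total = 0.0
    for i in range(n):
        # Vectors from the query point to the two endpoints of edge i.
        ax, ay = polygon[i][0] - px, polygon[i][1] - py
        bx, by = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # atan2(cross, dot) is the signed angle from vector a to vector b, in (-pi, pi].
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total


def angle_summation_loss(
    pred: Sequence[Point], target: Sequence[Point], queries: Sequence[Point]
) -> float:
    """Hypothetical loss: mean absolute difference of angle sums at query points,
    normalized by 2*pi so that an inside/outside disagreement costs about 1."""
    diffs = [
        abs(angle_sum(q, pred) - angle_sum(q, target)) / (2 * math.pi)
        for q in queries
    ]
    return sum(diffs) / len(diffs)
```

For the counter-clockwise unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]`, `angle_sum((0.5, 0.5), square)` is 2π and `angle_sum((2, 2), square)` is approximately 0. Because the descriptor is built from `atan2` of differentiable cross and dot products, the same computation written with tensor operations in an autodiff framework yields gradients with respect to the vertex coordinates almost everywhere (away from configurations where a query point lies exactly on an edge).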

Cite

Text

Cheng et al. "Parallel Vertex Diffusion for Unified Visual Grounding." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I2.27896

Markdown

[Cheng et al. "Parallel Vertex Diffusion for Unified Visual Grounding." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/cheng2024aaai-parallel/) doi:10.1609/AAAI.V38I2.27896

BibTeX

@inproceedings{cheng2024aaai-parallel,
  title     = {{Parallel Vertex Diffusion for Unified Visual Grounding}},
  author    = {Cheng, Zesen and Li, Kehan and Jin, Peng and Li, Siheng and Ji, Xiangyang and Yuan, Li and Liu, Chang and Chen, Jie},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {1326--1334},
  doi       = {10.1609/AAAI.V38I2.27896},
  url       = {https://mlanthology.org/aaai/2024/cheng2024aaai-parallel/}
}