YOLO-World: Real-Time Open-Vocabulary Object Detection

Abstract

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation. Code and models are available at https://github.com/AILab-CVC/YOLO-World
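
The abstract names a region-text contrastive loss without giving its formulation. As a rough illustration of the general idea only, the sketch below scores each region embedding against every text embedding by cosine similarity and applies cross-entropy toward the matched text. The function names, the temperature value, and the toy embeddings are assumptions made for this example; this is not the authors' implementation, and the loss used in the paper may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def region_text_contrastive_loss(region_embs, text_embs, targets, temperature=0.07):
    """Mean cross-entropy over region-to-text cosine similarities.

    region_embs: list of region embedding vectors (hypothetical inputs).
    text_embs:   list of text embedding vectors, one per vocabulary entry.
    targets:     index of the matching text for each region.
    temperature: assumed scaling factor, not taken from the paper.
    """
    texts = [l2_normalize(t) for t in text_embs]
    total = 0.0
    for region, target in zip(region_embs, targets):
        r = l2_normalize(region)
        # Cosine similarity to every text, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(r, t)) / temperature for t in texts]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[target]
    return total / len(region_embs)


# Toy example: two regions and two texts with orthogonal embeddings.
region_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned = region_text_contrastive_loss(region_embs, text_embs, [0, 1])
swapped = region_text_contrastive_loss(region_embs, text_embs, [1, 0])
print(aligned < swapped)
```

In this toy setup the loss is close to zero when each region is paired with its own text and much larger when the pairing is swapped, which is the behavior a contrastive objective of this kind is meant to reward.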

Cite

Text

Cheng et al. "YOLO-World: Real-Time Open-Vocabulary Object Detection." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01599

Markdown

[Cheng et al. "YOLO-World: Real-Time Open-Vocabulary Object Detection." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/cheng2024cvpr-yoloworld/) doi:10.1109/CVPR52733.2024.01599

BibTeX

@inproceedings{cheng2024cvpr-yoloworld,
  title     = {{YOLO-World: Real-Time Open-Vocabulary Object Detection}},
  author    = {Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {16901-16911},
  doi       = {10.1109/CVPR52733.2024.01599},
  url       = {https://mlanthology.org/cvpr/2024/cheng2024cvpr-yoloworld/}
}