PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

Li, Junyi; Wu, Junfeng; Zhao, Weizhi; Bai, Song; Bai, Xiang

doi:10.1007/978-3-031-73226-3_27

ECCV 2024

Abstract

We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in the open-world scenario. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into its corresponding semantic parts. By incorporating a large amount of object-level data, the hierarchical relationships can be extended, enabling PartGLEE to recognize a rich variety of parts. We conduct comprehensive studies to validate the effectiveness of our method: PartGLEE achieves state-of-the-art performance across various part-level tasks and obtains competitive results on object-level tasks. The proposed PartGLEE significantly enhances hierarchical modeling capabilities and part-level perception over our previous GLEE model. Further analysis indicates that the hierarchical cognitive ability of PartGLEE can facilitate detailed comprehension of images for mLLMs. The model and code will be released at https://provencestar.github.io/PartGLEE-Vision/.

Equal Technical Contribution. Work done during Junfeng’s internship at ByteDance. † Correspondence to Xiang Bai.
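
To make the object-to-part hierarchy described in the abstract more concrete, the sketch below shows one possible way to hold predictions "at any granularity" as a tree, with each object carrying its semantic parts. It is only an illustration: the `Instance` class, the `flatten` helper, the field names, and the example values are all hypothetical and are not taken from the PartGLEE code or paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Instance:
    """One region at any granularity (hypothetical container, not PartGLEE's API)."""
    label: str
    box: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels
    score: float
    parts: List[Instance] = field(default_factory=list)  # child parts, if any


def flatten(instance: Instance, depth: int = 0) -> Iterator[Tuple[int, Instance]]:
    """Yield (depth, instance) pairs, visiting each parent before its parts."""
    yield depth, instance
    for part in instance.parts:
        yield from flatten(part, depth + 1)


# Made-up example values for illustration only; this is not model output.
dog = Instance("dog", (10.0, 20.0, 200.0, 180.0), 0.92, parts=[
    Instance("head", (10.0, 20.0, 80.0, 90.0), 0.88),
    Instance("leg", (60.0, 120.0, 90.0, 180.0), 0.75),
])

for depth, inst in flatten(dog):
    print("  " * depth + f"{inst.label} ({inst.score:.2f})")
```

A tree like this keeps object-level and part-level results in a single structure, so the same traversal serves both object-level and part-level consumers.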

Cite

Text

Li et al. "PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73226-3_27

Markdown

[Li et al. "PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/li2024eccv-partglee/) doi:10.1007/978-3-031-73226-3_27

BibTeX

@inproceedings{li2024eccv-partglee,
  title     = {{PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects}},
  author    = {Li, Junyi and Wu, Junfeng and Zhao, Weizhi and Bai, Song and Bai, Xiang},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73226-3_27},
  url       = {https://mlanthology.org/eccv/2024/li2024eccv-partglee/}
}