Class-Agnostic Object Counting with Text-to-Image Diffusion Model

Abstract

Class-agnostic object counting aims to count objects of arbitrary classes given only limited information (e.g., a few exemplars or the class names). It requires the model to effectively acquire the characteristics of the target objects and accurately perform counting, which can be challenging. In this work, inspired by the observation that text-to-image diffusion models hold rich knowledge and a comprehensive understanding of real-world objects, we propose to leverage a pre-trained text-to-image diffusion model to facilitate class-agnostic object counting. Specifically, we propose a novel framework named CountDiff, carefully designed to leverage the pre-trained diffusion model’s comprehensive understanding of image contents to perform class-agnostic object counting. Experiments show the effectiveness of CountDiff in both the few-shot setting, where exemplars are provided, and the zero-shot setting, where class names are provided.

Cite

Text

Hui et al. "Class-Agnostic Object Counting with Text-to-Image Diffusion Model." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72890-7_1

Markdown

[Hui et al. "Class-Agnostic Object Counting with Text-to-Image Diffusion Model." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/hui2024eccv-classagnostic/) doi:10.1007/978-3-031-72890-7_1

BibTeX

@inproceedings{hui2024eccv-classagnostic,
  title     = {{Class-Agnostic Object Counting with Text-to-Image Diffusion Model}},
  author    = {Hui, Xiaofei and Wu, Qian and Rahmani, Hossein and Liu, Jun},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72890-7_1},
  url       = {https://mlanthology.org/eccv/2024/hui2024eccv-classagnostic/}
}