A Bag-of-Prototypes Representation for Dataset-Level Applications

Weijie Tu, Weijian Deng, Tom Gedeon, Liang Zheng

CVPR 2023 pp. 2881-2892

doi:10.1109/CVPR52729.2023.00282

Abstract

This work investigates dataset vectorization for two dataset-level tasks: assessing training set suitability and test set difficulty. The former measures how suitable a training set is for a target domain, while the latter studies how challenging a test set is for a learned model. Central to both tasks is measuring the underlying relationship between datasets. This requires a dataset vectorization scheme that preserves as much discriminative dataset information as possible, so that the distance between the resulting dataset vectors reflects dataset-to-dataset similarity. To this end, we propose a bag-of-prototypes (BoP) dataset representation that extends the image-level bag consisting of patch descriptors to a dataset-level bag consisting of semantic prototypes. Specifically, we develop a codebook consisting of K prototypes clustered from a reference dataset. Given a dataset to be encoded, we quantize each of its image features to a certain prototype in the codebook and obtain a K-dimensional histogram feature. Without assuming access to dataset labels, the BoP representation provides a rich characterization of the dataset's semantic distribution. Further, the BoP representation cooperates well with Jensen-Shannon divergence for measuring dataset-to-dataset similarity. Albeit very simple, BoP consistently shows its advantage over existing representations on a series of benchmarks for the two dataset-level tasks.
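
The pipeline described in the abstract (cluster a reference dataset into K prototypes, quantize each feature of a dataset to a prototype, build a K-dimensional histogram, compare histograms with Jensen-Shannon divergence) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: plain k-means as the clustering method, Euclidean nearest-prototype assignment, and toy 2-D points standing in for image features from a pretrained network. All function names (`build_codebook`, `encode_bop`, `js_divergence`, `blob`) are hypothetical.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(feature, prototypes):
    """Index of the prototype closest to `feature` (Euclidean; an assumption)."""
    return min(range(len(prototypes)),
               key=lambda k: squared_distance(feature, prototypes[k]))


def build_codebook(reference_features, k, iterations=20, seed=0):
    """Cluster reference features into k prototypes with plain k-means.

    The abstract only says the prototypes are "clustered"; k-means is assumed.
    """
    rng = random.Random(seed)
    prototypes = [list(f) for f in rng.sample(reference_features, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in reference_features:
            clusters[nearest(f, prototypes)].append(f)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous prototype
                dim = len(members[0])
                prototypes[idx] = [sum(m[d] for m in members) / len(members)
                                   for d in range(dim)]
    return prototypes


def encode_bop(features, prototypes):
    """Quantize each feature to a prototype; return a normalized K-bin histogram."""
    counts = [0] * len(prototypes)
    for f in features:
        counts[nearest(f, prototypes)] += 1
    return [c / len(features) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def blob(center, n, rng):
    """Toy stand-in for image features: n 2-D points around `center`."""
    return [(rng.gauss(center[0], 0.1), rng.gauss(center[1], 0.1))
            for _ in range(n)]


rng = random.Random(0)
reference = blob((0, 0), 50, rng) + blob((5, 5), 50, rng)
codebook = build_codebook(reference, k=2)

# Three unlabeled toy datasets with different mixes of the two "semantic" modes.
mostly_a = encode_bop(blob((0, 0), 90, rng) + blob((5, 5), 10, rng), codebook)
also_a = encode_bop(blob((0, 0), 80, rng) + blob((5, 5), 20, rng), codebook)
mostly_b = encode_bop(blob((0, 0), 10, rng) + blob((5, 5), 90, rng), codebook)

print("JS(mostly_a, also_a)   =", js_divergence(mostly_a, also_a))
print("JS(mostly_a, mostly_b) =", js_divergence(mostly_a, mostly_b))
```

In this toy setting, the two datasets with a similar mix of modes end up with a smaller divergence than the two with opposite mixes, which is the property a dataset-to-dataset similarity measure needs. The sketch uses only the standard library for portability; a practical version would operate on high-dimensional network features with a much larger K.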

Cite

Text

Tu et al. "A Bag-of-Prototypes Representation for Dataset-Level Applications." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00282

Markdown

[Tu et al. "A Bag-of-Prototypes Representation for Dataset-Level Applications." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/tu2023cvpr-bagofprototypes/) doi:10.1109/CVPR52729.2023.00282

BibTeX

@inproceedings{tu2023cvpr-bagofprototypes,
  title     = {{A Bag-of-Prototypes Representation for Dataset-Level Applications}},
  author    = {Tu, Weijie and Deng, Weijian and Gedeon, Tom and Zheng, Liang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {2881-2892},
  doi       = {10.1109/CVPR52729.2023.00282},
  url       = {https://mlanthology.org/cvpr/2023/tu2023cvpr-bagofprototypes/}
}