A Bag-of-Prototypes Representation for Dataset-Level Applications
Abstract
This work investigates dataset vectorization for two dataset-level tasks: assessing training set suitability and test set difficulty. The former measures how suitable a training set is for a target domain, while the latter studies how challenging a test set is for a learned model. Central to both tasks is measuring the underlying relationship between datasets. This requires a desirable dataset vectorization scheme, which should preserve as much discriminative dataset information as possible so that the distance between the resulting dataset vectors can reflect dataset-to-dataset similarity. To this end, we propose a bag-of-prototypes (BoP) dataset representation that extends the image-level bag of patch descriptors to a dataset-level bag of semantic prototypes. Specifically, we develop a codebook consisting of K prototypes clustered from a reference dataset. Given a dataset to be encoded, we quantize each of its image features to a prototype in the codebook and obtain a K-dimensional histogram feature. Without assuming access to dataset labels, the BoP representation provides a rich characterization of the dataset's semantic distribution. Furthermore, the BoP representation works well with the Jensen-Shannon divergence for measuring dataset-to-dataset similarity. Although very simple, BoP consistently outperforms existing representations on a series of benchmarks for the two dataset-level tasks.
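The encoding pipeline in the abstract (cluster a reference set into K prototypes, quantize each feature of a dataset to a prototype, count into a K-bin histogram, compare histograms with Jensen-Shannon divergence) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes plain k-means (Lloyd's algorithm) for building the codebook, Euclidean nearest-prototype assignment, and histograms normalized to sum to 1. The feature extractor, the value of K and the clustering settings used in the paper are not reproduced here, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, codebook: List[List[float]]) -> int:
    """Index of the closest prototype (assumption: Euclidean assignment)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(point, codebook[k]))


def build_codebook(reference: Sequence[Vector], k: int,
                   iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: K prototypes via plain k-means on reference features."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(reference), k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in reference:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def bop_histogram(features: Sequence[Vector],
                  codebook: List[List[float]]) -> List[float]:
    """Quantize every feature to a prototype; return a normalized K-bin histogram."""
    counts = [0] * len(codebook)
    for f in features:
        counts[nearest(f, codebook)] += 1
    return [c / len(features) for c in counts]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two histograms."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Bins with x == 0 contribute nothing; y = m is > 0 wherever x > 0.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Toy 2-D "image features": two well-separated semantic groups.
    reference = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                 (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    codebook = build_codebook(reference, k=2)
    hist_a = bop_histogram([(0.2, 0.3), (0.5, 0.1), (10.2, 10.1)], codebook)
    hist_b = bop_histogram([(10.5, 10.5), (9.8, 10.2), (10.1, 9.9)], codebook)
    print(hist_a, hist_b, js_divergence(hist_a, hist_b))
```

Jensen-Shannon divergence suits these histograms because it is symmetric and stays finite (bounded by ln 2 with the natural log) even when one dataset leaves a prototype bin empty, a case where plain KL divergence can be infinite.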
Cite
Text
Tu et al. "A Bag-of-Prototypes Representation for Dataset-Level Applications." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00282
Markdown
[Tu et al. "A Bag-of-Prototypes Representation for Dataset-Level Applications." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/tu2023cvpr-bagofprototypes/) doi:10.1109/CVPR52729.2023.00282
BibTeX
@inproceedings{tu2023cvpr-bagofprototypes,
title = {{A Bag-of-Prototypes Representation for Dataset-Level Applications}},
author = {Tu, Weijie and Deng, Weijian and Gedeon, Tom and Zheng, Liang},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {2881-2892},
doi = {10.1109/CVPR52729.2023.00282},
url = {https://mlanthology.org/cvpr/2023/tu2023cvpr-bagofprototypes/}
}