Classifier Guided Cluster Density Reduction for Dataset Selection

Cheng Chang, Keyu Long, Zijian Li, Himanshu Rai

CVPRW 2024 pp. 7338-7347

doi:10.1109/CVPRW63382.2024.00729

Abstract

In this paper, we address the challenge of selecting an optimal dataset from a source pool with annotations to enhance performance on a target dataset derived from a different source. This is important in scenarios where it is hard to afford on-the-fly dataset annotation and is also the theme of the second Visual Data Understanding (VDU) Challenge. Our solution, the Classifier Guided Cluster Density Reduction (CCDR) framework, operates in two stages. Initially, we employ a filtering technique to identify images that align with the target dataset’s distribution. Subsequently, we implement a graph-based cluster density reduction method, steered by a classifier that approximates the distance between the target distribution and source distribution. This classifier is trained to distinguish between images that resemble the target dataset and those that do not, facilitating the pruning process shown in Figure 1. Our approach maintains a balance between selecting pertinent images that match the target distribution and eliminating redundant ones that do not contribute to the enhancement of the detection model. We demonstrate the superiority of our method over various baselines in object detection tasks, particularly in optimizing the training set distribution on the region100 dataset. We have released our code here: https://github.com/himsR/DataCVChallenge-2024/tree/main
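
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes each image is already represented by a feature vector, that stage one keeps the source images nearest to any target feature, and that stage two greedily prunes a radius graph in order of a classifier score. The function names, `keep_ratio`, `radius`, and the precomputed `scores` input are all hypothetical choices made for illustration.

```python
import math


def filter_by_target_similarity(source, target, keep_ratio=0.5):
    """Stage 1 (assumed form): keep the source items closest to the target set.

    `source` and `target` are sequences of equal-length feature vectors.
    Returns the sorted indices of the retained source items.
    """
    # Distance from each source feature to its nearest target feature.
    nearest = [min(math.dist(s, t) for t in target) for s in source]
    n_keep = max(1, round(keep_ratio * len(source)))
    ranked = sorted(range(len(source)), key=lambda i: nearest[i])
    return sorted(ranked[:n_keep])


def reduce_cluster_density(features, scores, radius=0.5):
    """Stage 2 (assumed form): thin out dense regions of a radius graph.

    `scores` stands in for the output of a classifier trained to separate
    target-like images from the rest (hypothetical input, higher is better).
    Items are visited from highest to lowest score; each kept item removes
    every still-undecided neighbour within `radius` from consideration.
    """
    order = sorted(range(len(features)), key=lambda i: -scores[i])
    undecided = [True] * len(features)
    kept = []
    for i in order:
        if not undecided[i]:
            continue
        kept.append(i)
        for j in range(len(features)):
            if math.dist(features[i], features[j]) <= radius:
                undecided[j] = False  # also marks i itself as decided
    return sorted(kept)
```

In this sketch the first function discards images far from the target distribution, and the second keeps only the highest-scoring representative of each dense neighbourhood, which mirrors the stated balance between relevance and redundancy under the assumptions above.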

Cite

Text

Chang et al. "Classifier Guided Cluster Density Reduction for Dataset Selection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00729

Markdown

[Chang et al. "Classifier Guided Cluster Density Reduction for Dataset Selection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/chang2024cvprw-classifier/) doi:10.1109/CVPRW63382.2024.00729

BibTeX

@inproceedings{chang2024cvprw-classifier,
  title     = {{Classifier Guided Cluster Density Reduction for Dataset Selection}},
  author    = {Chang, Cheng and Long, Keyu and Li, Zijian and Rai, Himanshu},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {7338--7347},
  doi       = {10.1109/CVPRW63382.2024.00729},
  url       = {https://mlanthology.org/cvprw/2024/chang2024cvprw-classifier/}
}