Classifier Guided Cluster Density Reduction for Dataset Selection
Abstract
In this paper, we address the challenge of selecting an optimal training set from an annotated source pool to improve performance on a target dataset drawn from a different source. This matters in scenarios where on-the-fly dataset annotation is unaffordable, and it is also the theme of the second Visual Data Understanding (VDU) Challenge. Our solution, the Classifier Guided Cluster Density Reduction (CCDR) framework, operates in two stages. First, we apply a filtering technique to identify source images that align with the target dataset’s distribution. Second, we apply a graph-based cluster density reduction method, steered by a classifier that approximates the distance between the target and source distributions. This classifier is trained to distinguish images that resemble the target dataset from those that do not, which guides the pruning process shown in Figure 1. Our approach balances selecting pertinent images that match the target distribution against eliminating redundant ones that do not improve the detection model. We show that our method outperforms various baselines on object detection tasks, particularly in optimizing the training set distribution on the region100 dataset. Our code is available at https://github.com/himsR/DataCVChallenge-2024/tree/main
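The two stages described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it assumes images are already embedded as feature vectors, swaps in a plain logistic-regression domain classifier, and replaces the graph-based reduction with a simple greedy radius rule. The function names and the `keep_fraction` and `radius` parameters are hypothetical.

```python
import numpy as np


def train_domain_classifier(source, target, steps=300, lr=0.1):
    """Fit logistic regression by gradient descent: 1 = target, 0 = source."""
    x = np.vstack([source, target])
    y = np.concatenate([np.zeros(len(source)), np.ones(len(target))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y  # gradient of the log loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def target_likeness(features, w, b):
    """Classifier probability that each feature vector resembles the target."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))


def select_subset(source, target, keep_fraction=0.5, radius=0.5):
    """Return indices of selected source samples and all classifier scores."""
    w, b = train_domain_classifier(source, target)
    scores = target_likeness(source, w, b)

    # Stage 1 (assumed form of the filter): keep the most target-like samples.
    n_keep = max(1, int(len(source) * keep_fraction))
    candidates = np.argsort(-scores)[:n_keep]  # descending score order

    # Stage 2 (assumed stand-in for cluster density reduction): visit
    # candidates from highest to lowest score and drop any sample that lies
    # within `radius` of an already kept one, thinning out dense regions.
    kept = []
    for i in candidates:
        if all(np.linalg.norm(source[i] - source[j]) > radius for j in kept):
            kept.append(int(i))
    return kept, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic features: a broad source pool plus a dense target-like cluster.
    source = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 0.3, (100, 2))])
    target = rng.normal(3, 0.3, (50, 2))
    kept, _ = select_subset(source, target, keep_fraction=0.4, radius=0.2)
    print(f"selected {len(kept)} of {len(source)} source samples")
```

Visiting candidates in descending score order means that, within any dense neighborhood, the sample the classifier rates as most target-like is the one that survives, which is one simple way to let the classifier steer the pruning.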
Cite
Text
Chang et al. "Classifier Guided Cluster Density Reduction for Dataset Selection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00729
Markdown
[Chang et al. "Classifier Guided Cluster Density Reduction for Dataset Selection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/chang2024cvprw-classifier/) doi:10.1109/CVPRW63382.2024.00729
BibTeX
@inproceedings{chang2024cvprw-classifier,
title = {{Classifier Guided Cluster Density Reduction for Dataset Selection}},
author = {Chang, Cheng and Long, Keyu and Li, Zijian and Rai, Himanshu},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {7338--7347},
doi = {10.1109/CVPRW63382.2024.00729},
url = {https://mlanthology.org/cvprw/2024/chang2024cvprw-classifier/}
}