Latent-Based Diffusion Model for Long-Tailed Recognition
Abstract
The long-tailed distribution is a common issue in practical computer vision applications. Prior work addresses this problem with methods that fall into several categories: re-sampling, re-weighting, transfer learning, and feature augmentation. In recent years, diffusion models have shown impressive generative ability across many sub-problems of deep computer vision, yet this generative power has not been explored for long-tailed problems. We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), a feature-augmentation method that tackles the issue. First, we encode the imbalanced dataset into features using the baseline model. Then, we train a Denoising Diffusion Implicit Model (DDIM) on these encoded features to generate pseudo-features. Finally, we train the classifier on both the encoded features and the pseudo-features from the previous two steps. The proposed method improves classification accuracy on the CIFAR-LT and ImageNet-LT datasets.
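The three-stage pipeline described in the abstract can be sketched as follows. The encoder, the DDIM, and the classifier below are toy NumPy stand-ins (a random linear projection, Gaussian sampling around class means, and a nearest-centroid rule) chosen only to illustrate the data flow of encode → generate pseudo-features → train on the combined set; they are not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (stand-in): encode images into latent features with a frozen
# baseline model. A random linear projection replaces the real CNN encoder.
def encode(images, proj):
    return images @ proj

# Stage 2 (stand-in): a trained DDIM would denoise Gaussian noise into
# class-conditional pseudo-features; here we sample around each class mean
# purely to show where generated features enter the pipeline.
def generate_pseudo_features(feats, labels, n_per_class, num_classes):
    pseudo_x, pseudo_y = [], []
    for c in range(num_classes):
        mean = feats[labels == c].mean(axis=0)
        noise = rng.standard_normal((n_per_class, feats.shape[1]))
        pseudo_x.append(mean + 0.1 * noise)
        pseudo_y.append(np.full(n_per_class, c))
    return np.concatenate(pseudo_x), np.concatenate(pseudo_y)

# Stage 3 (stand-in): train a classifier on real + pseudo features;
# nearest-centroid replaces the paper's learned classifier head.
def fit_classifier(x, y, num_classes):
    return np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])

def predict(centroids, x):
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy long-tailed dataset: head class 0 has 100 samples, tail class 1 only 5.
num_classes, in_dim, feat_dim = 2, 16, 8
proj = rng.standard_normal((in_dim, feat_dim))
images = np.concatenate([rng.standard_normal((100, in_dim)) + 1.0,
                         rng.standard_normal((5, in_dim)) - 1.0])
labels = np.concatenate([np.zeros(100, int), np.ones(5, int)])

feats = encode(images, proj)
pseudo_x, pseudo_y = generate_pseudo_features(feats, labels,
                                              n_per_class=50,
                                              num_classes=num_classes)
# The tail class now has 5 real + 50 pseudo features for training.
x_all = np.concatenate([feats, pseudo_x])
y_all = np.concatenate([labels, pseudo_y])
centroids = fit_classifier(x_all, y_all, num_classes)
preds = predict(centroids, feats)
```

Note that, as in the paper's setting, the augmentation happens in latent space: pseudo-features are generated and consumed directly as feature vectors, so no images are synthesized.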
Cite
Text
Han et al. "Latent-Based Diffusion Model for Long-Tailed Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00270
Markdown
[Han et al. "Latent-Based Diffusion Model for Long-Tailed Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/han2024cvprw-latentbased/) doi:10.1109/CVPRW63382.2024.00270
BibTeX
@inproceedings{han2024cvprw-latentbased,
title = {{Latent-Based Diffusion Model for Long-Tailed Recognition}},
author = {Han, Pengxiao and Ye, Changkun and Zhou, Jieming and Zhang, Jing and Hong, Jie and Li, Xuesong},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {2639--2648},
doi = {10.1109/CVPRW63382.2024.00270},
url = {https://mlanthology.org/cvprw/2024/han2024cvprw-latentbased/}
}