Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy

Abstract

Contrastive loss is a powerful approach for representation learning, where larger batch sizes enhance performance by providing more negative samples to better distinguish between similar and dissimilar data. However, the full instantiation of the similarity matrix demands substantial GPU memory, making large batch training highly resource-intensive. To address this, we propose a tile-based computation strategy that partitions the contrastive loss calculation into small blocks, avoiding full materialization of the similarity matrix. Additionally, we introduce a multi-level tiling implementation to leverage the hierarchical structure of distributed systems, using ring-based communication at the GPU level to optimize synchronization and fused kernels at the CUDA core level to reduce I/O overhead. Experimental results show that the proposed method significantly reduces GPU memory usage in contrastive loss. For instance, it enables contrastive training of a CLIP-ViT-L/14 model with a batch size of 4M using only 8 A800 80GB GPUs, without sacrificing accuracy. Compared to state-of-the-art memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed. The code will be made publicly available.
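The tile-based idea in the abstract can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss (row-wise softmax cross-entropy with the matching pair on the diagonal) in the image-to-text direction only, and it accumulates each row's log-sum-exp tile by tile using a running maximum. The function names, the `tile` parameter, and the pure-Python representation are illustrative assumptions, not the authors' implementation; the ring-based communication and fused CUDA kernels are not modeled here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_infonce(img, txt, temperature=1.0):
    """Reference version: materializes the full n x n similarity matrix."""
    n = len(img)
    sim = [[dot(img[i], txt[j]) / temperature for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        m = max(sim[i])
        lse = m + math.log(sum(math.exp(s - m) for s in sim[i]))
        loss += lse - sim[i][i]  # -log softmax of the positive (diagonal) pair
    return loss / n


def tiled_infonce(img, txt, temperature=1.0, tile=2):
    """Tiled version: only O(n) running statistics plus one tile row of logits."""
    n = len(img)
    run_max = [-math.inf] * n  # running maximum logit per row
    run_sum = [0.0] * n        # running sum of exp(logit - run_max) per row
    pos = [0.0] * n            # logit of the positive pair per row
    for r0 in range(0, n, tile):
        for c0 in range(0, n, tile):
            for i in range(r0, min(r0 + tile, n)):
                # Logits for one row of the current tile; discarded afterwards.
                logits = [dot(img[i], txt[j]) / temperature
                          for j in range(c0, min(c0 + tile, n))]
                if c0 <= i < c0 + tile:
                    pos[i] = logits[i - c0]
                new_max = max(run_max[i], max(logits))
                # Rescale the old sum to the new maximum, then add this tile.
                run_sum[i] = (run_sum[i] * math.exp(run_max[i] - new_max)
                              + sum(math.exp(s - new_max) for s in logits))
                run_max[i] = new_max
    return sum(run_max[i] + math.log(run_sum[i]) - pos[i] for i in range(n)) / n
```

Under these assumptions both functions return the same value up to floating-point error, while the tiled version never holds more than `tile` logits of a row at once. A symmetric loss would repeat the accumulation over columns for the text-to-image direction.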

Cite

Text

Cheng et al. "Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00938

Markdown

[Cheng et al. "Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/cheng2025cvpr-breaking/) doi:10.1109/CVPR52734.2025.00938

BibTeX

@inproceedings{cheng2025cvpr-breaking,
  title     = {{Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy}},
  author    = {Cheng, Zesen and Zhang, Hang and Li, Kehan and Leng, Sicong and Hu, Zhiqiang and Wu, Fei and Zhao, Deli and Li, Xin and Bing, Lidong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {10036--10045},
  doi       = {10.1109/CVPR52734.2025.00938},
  url       = {https://mlanthology.org/cvpr/2025/cheng2025cvpr-breaking/}
}