Scarf: Self-Supervised Contrastive Learning Using Random Feature Corruption

Abstract

Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data. However, such methods are domain-specific and little has been done to leverage this technique on real-world *tabular* datasets. We propose SCARF, a simple, widely-applicable technique for contrastive learning, where views are formed by corrupting a random subset of features. When applied to pre-train deep neural networks on the 69 real-world, tabular classification datasets from the OpenML-CC18 benchmark, SCARF not only improves classification accuracy in the fully-supervised setting but does so also in the presence of label noise and in the semi-supervised setting where only a fraction of the available training data is labeled. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders. We conduct comprehensive ablations, detailing the importance of a range of factors.
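The core idea of forming views by corrupting a random subset of features lends itself to a short illustration. The sketch below is not the authors' implementation: the abstract does not spell out the corruption scheme or the loss, so this assumes corruption by replacing selected features with values drawn from each feature's empirical marginal (approximated here by copying entries from random rows of the unlabeled pool) and an InfoNCE-style contrastive loss between the embeddings of the original and corrupted views. All names (`scarf_corrupt`, `info_nce`), network sizes, and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def scarf_corrupt(x, marginal_pool, corruption_rate=0.6):
    """Corrupt a random subset of each row's features by replacing them with
    values copied from random rows of the pool (an empirical-marginal draw).
    Uses per-feature Bernoulli masking as a simplification."""
    batch, dim = x.shape
    mask = (torch.rand(batch, dim) < corruption_rate).float()
    rand_rows = torch.randint(0, marginal_pool.shape[0], (batch, dim))
    random_values = marginal_pool[rand_rows, torch.arange(dim).expand(batch, dim)]
    return x * (1 - mask) + random_values * mask

def info_nce(z_anchor, z_positive, temperature=1.0):
    """InfoNCE loss: each anchor's positive is its own corrupted view;
    the other rows in the batch serve as negatives."""
    z_anchor = F.normalize(z_anchor, dim=1)
    z_positive = F.normalize(z_positive, dim=1)
    logits = z_anchor @ z_positive.t() / temperature  # (batch, batch) similarities
    labels = torch.arange(z_anchor.shape[0])           # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Hypothetical encoder and pretext head; dimensions are illustrative only.
encoder = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())
head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 128))
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

x_train = torch.randn(1024, 20)  # stand-in for an unlabeled tabular dataset
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x_train), batch_size=128, shuffle=True)

# One pre-training epoch, for illustration; the encoder is then fine-tuned on labels.
for (x,) in loader:
    x_corrupt = scarf_corrupt(x, x_train)
    loss = info_nce(head(encoder(x)), head(encoder(x_corrupt)))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After pre-training, one would typically discard the pretext head and fine-tune the encoder with a small classification head on whatever labeled data is available.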

Cite

Text

Bahri et al. "Scarf: Self-Supervised Contrastive Learning Using Random Feature Corruption." International Conference on Learning Representations, 2022.

Markdown

[Bahri et al. "Scarf: Self-Supervised Contrastive Learning Using Random Feature Corruption." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/bahri2022iclr-scarf/)

BibTeX

@inproceedings{bahri2022iclr-scarf,
  title     = {{Scarf: Self-Supervised Contrastive Learning Using Random Feature Corruption}},
  author    = {Bahri, Dara and Jiang, Heinrich and Tay, Yi and Metzler, Donald},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/bahri2022iclr-scarf/}
}