Deep Tabular Learning via Distillation and Language Guidance

Abstract

Tabular data is arguably one of the most ubiquitous data structures in application domains such as science, healthcare, finance, and manufacturing. Given the recent success of deep learning (DL), there has been a surge of new DL models for tabular learning. However, despite these efforts, tabular DL models still clearly trail behind tree-based approaches. In this work, we propose DisTab, a novel framework for tabular learning based on the transformer architecture. Our method leverages model distillation to mimic the favorable inductive biases of tree-based models, and incorporates language guidance for more expressive feature embeddings. Empirically, DisTab outperforms existing tabular DL models and is highly competitive against tree-based models across diverse datasets, effectively closing the gap with these methods.
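
The abstract does not spell out the distillation setup, so the following is only a minimal sketch of the general tree-to-neural distillation idea, not DisTab's actual implementation: the gradient-boosted tree teacher, the small neural student standing in for the paper's transformer, and the loss weight alpha are illustrative assumptions, and the language-guidance component is omitted entirely.

# Minimal sketch: distill a tree-based teacher into a neural student.
# All model/loss choices here are assumptions for illustration, not DisTab's.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic tabular data.
X, y = make_classification(n_samples=2000, n_features=20, n_classes=2, random_state=0)

# 1) Fit a tree-based teacher and collect its soft predictions.
teacher = GradientBoostingClassifier(random_state=0).fit(X, y)
teacher_probs = torch.tensor(teacher.predict_proba(X), dtype=torch.float32)

X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.long)

# 2) A small neural student (a stand-in for the transformer used in the paper).
student = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# 3) Train with a mix of hard-label cross-entropy and KL to the teacher's soft labels.
alpha = 0.5  # distillation weight (assumed, not from the paper)
for epoch in range(50):
    opt.zero_grad()
    logits = student(X_t)
    ce = F.cross_entropy(logits, y_t)
    kd = F.kl_div(F.log_softmax(logits, dim=1), teacher_probs, reduction="batchmean")
    loss = (1 - alpha) * ce + alpha * kd
    loss.backward()
    opt.step()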

Cite

Text

Wang et al. "Deep Tabular Learning via Distillation and Language Guidance." Transactions on Machine Learning Research, 2024.

Markdown

[Wang et al. "Deep Tabular Learning via Distillation and Language Guidance." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/wang2024tmlr-deep/)

BibTeX

@article{wang2024tmlr-deep,
  title     = {{Deep Tabular Learning via Distillation and Language Guidance}},
  author    = {Wang, Ruohan and Fu, Wenhao and Ciliberto, Carlo},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/wang2024tmlr-deep/}
}