Retrieval & Fine-Tuning for In-Context Tabular Models
Abstract
Tabular data is a pervasive modality spanning a wide range of domains, and this inherent diversity poses a considerable challenge for deep learning. Recent advancements using transformer-based in-context learning have shown promise on smaller and less complex tabular datasets, but have struggled to scale to larger and more complex ones. To address this limitation, we propose a combination of retrieval and fine-tuning: we can adapt the transformer to a local subset of the data by collecting nearest neighbours, and then perform task-specific fine-tuning with this retrieved set of neighbours in context. Using TabPFN as the base model -- currently the best tabular in-context learner -- and applying our retrieval and fine-tuning scheme on top results in what we call a locally-calibrated PFN, or LoCalPFN. We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we establish a new state-of-the-art with LoCalPFN -- even with respect to tuned tree-based models. Notably, we show a significant boost in performance compared to the base in-context model, demonstrating the efficacy of our approach and advancing the frontier of deep learning in tabular data.
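The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `retrieve_context` and `predict_in_context`, the Euclidean distance, the toy data, and the value of `k` are all assumptions made for illustration. The majority-vote predictor is only a hypothetical stand-in for an in-context model such as TabPFN, and the fine-tuning stage is not shown.

```python
import math
from collections import Counter


def retrieve_context(X_train, y_train, x_query, k):
    """Return the k training rows (and their labels) nearest to x_query.

    Euclidean distance is an assumption; the abstract only says
    "nearest neighbours".
    """
    order = sorted(range(len(X_train)),
                   key=lambda i: math.dist(X_train[i], x_query))
    nearest = order[:k]
    return [X_train[i] for i in nearest], [y_train[i] for i in nearest]


def predict_in_context(X_ctx, y_ctx, x_query):
    """Hypothetical stand-in for the in-context model.

    A real in-context learner would condition on (X_ctx, y_ctx) and
    x_query; here we simply take a majority vote over the context labels.
    """
    return Counter(y_ctx).most_common(1)[0][0]


# Toy data: two well-separated clusters with labels 0 and 1.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
y_train = [0, 0, 0, 1, 1, 1]
x_query = [4.9, 5.0]

# Build a local context for this query, then predict from that context only.
X_ctx, y_ctx = retrieve_context(X_train, y_train, x_query, k=3)
prediction = predict_in_context(X_ctx, y_ctx, x_query)
```

The point of the sketch is that each query gets its own small context drawn from its neighbourhood, so the context size stays fixed at `k` however large the training set grows.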
Cite
Text
Thomas et al. "Retrieval & Fine-Tuning for In-Context Tabular Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-3442
Markdown
[Thomas et al. "Retrieval & Fine-Tuning for In-Context Tabular Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/thomas2024neurips-retrieval/) doi:10.52202/079017-3442
BibTeX
@inproceedings{thomas2024neurips-retrieval,
title = {{Retrieval \& Fine-Tuning for In-Context Tabular Models}},
author = {Thomas, Valentin and Ma, Junwei and Hosseinzadeh, Rasa and Golestan, Keyvan and Yu, Guangwei and Volkovs, Maksims and Caterini, Anthony},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-3442},
url = {https://mlanthology.org/neurips/2024/thomas2024neurips-retrieval/}
}