Unsupervised Pretraining for Fact Verification by Language Model Distillation
Abstract
Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL ($\underline{S}$elf-supervised $\underline{Fa}$ct $\underline{Ve}$rification via $\underline{L}$anguage Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on FB15k-237 (+5.3\% Hits@1) and FEVER (+8\% accuracy) with linear evaluation.
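The abstract does not spell out the form of the contrastive loss. As a minimal sketch of the general idea of contrastive claim-evidence alignment, the following implements a standard InfoNCE-style objective, where each claim's matching fact (same batch index) is the positive and all other facts in the batch act as negatives. The function name, the in-batch-negatives setup, and the temperature value are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def info_nce_alignment_loss(claim_feats, fact_feats, temperature=0.07):
    """Illustrative InfoNCE-style contrastive loss (not SFAVEL's exact loss).

    claim_feats, fact_feats: arrays of shape (batch, dim); row i of
    fact_feats is the positive evidence for row i of claim_feats.
    """
    # L2-normalise so dot products become cosine similarities
    c = claim_feats / np.linalg.norm(claim_feats, axis=1, keepdims=True)
    f = fact_feats / np.linalg.norm(fact_feats, axis=1, keepdims=True)
    logits = (c @ f.T) / temperature          # (batch, batch) similarities
    # Row-wise log-softmax; the positives sit on the diagonal
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (matched) pairs
    return -np.mean(np.diag(log_probs))
```

As a sanity check, correctly paired features should incur a lower loss than mispaired ones, since the diagonal similarities dominate each softmax row only when claims and facts are aligned.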
Cite
Text
Bazaga et al. "Unsupervised Pretraining for Fact Verification by Language Model Distillation." International Conference on Learning Representations, 2024.
BibTeX
@inproceedings{bazaga2024iclr-unsupervised,
title = {{Unsupervised Pretraining for Fact Verification by Language Model Distillation}},
author = {Bazaga, Adrián and Lio, Pietro and Micklem, Gos},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/bazaga2024iclr-unsupervised/}
}