Unsupervised Pretraining for Fact Verification by Language Model Distillation

Abstract

Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL ($\underline{S}$elf-supervised $\underline{Fa}$ct $\underline{Ve}$rification via $\underline{L}$anguage Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on FB15k-237 (+5.3\% Hits@1) and FEVER (+8\% accuracy) with linear evaluation.
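To make the alignment idea concrete, below is a minimal NumPy sketch of an InfoNCE-style contrastive objective over claim and evidence embeddings, where matched claim-fact pairs sit on the diagonal of a similarity matrix. This is an illustrative stand-in, not SFAVEL's actual loss: the function name, temperature value, and use of raw cosine similarities are assumptions for the example.

```python
import numpy as np

def contrastive_alignment_loss(claim_feats, fact_feats, temperature=0.1):
    """InfoNCE-style loss: pull each claim toward its matching fact
    (diagonal pairs) and push it away from the other facts in the batch.
    Illustrative sketch only; SFAVEL's published loss differs in detail."""
    # Normalise rows so dot products become cosine similarities.
    c = claim_feats / np.linalg.norm(claim_feats, axis=1, keepdims=True)
    f = fact_feats / np.linalg.norm(fact_feats, axis=1, keepdims=True)
    logits = (c @ f.T) / temperature          # (batch, batch) similarities
    # Row-wise log-softmax; diagonal entries are the positive pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
claims = rng.normal(size=(4, 8))
aligned = contrastive_alignment_loss(claims, claims)        # perfect alignment
shuffled = contrastive_alignment_loss(claims, claims[::-1]) # mismatched pairs
```

As expected, the loss is lower when each claim's matching fact is on the diagonal (`aligned`) than when the pairing is scrambled (`shuffled`), which is the signal a pretraining loop would minimise.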

Cite

Text

Bazaga et al. "Unsupervised Pretraining for Fact Verification by Language Model Distillation." International Conference on Learning Representations, 2024.

Markdown

[Bazaga et al. "Unsupervised Pretraining for Fact Verification by Language Model Distillation." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/bazaga2024iclr-unsupervised/)

BibTeX

@inproceedings{bazaga2024iclr-unsupervised,
  title     = {{Unsupervised Pretraining for Fact Verification by Language Model Distillation}},
  author    = {Bazaga, Adrián and Lio, Pietro and Micklem, Gos},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/bazaga2024iclr-unsupervised/}
}