Training Plug N' Play Knowledge Modules with Deep Context Distillation
Abstract
Dynamically integrating new or rapidly evolving information after Language Model (LM) pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as a training objective for KMs. We instead propose Deep Context Distillation: we learn KM parameters so as to match the hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets. Finally, we highlight synergies between KMs and retrieval-augmented generation.
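To make the Deep Context Distillation objective concrete, below is a minimal sketch in PyTorch, assuming a Hugging Face causal LM and a LoRA module from the peft library as the KM. The base model name, the loss weighting, and the choice of distance on hidden states are illustrative assumptions rather than the paper's exact configuration: the teacher conditions on the document in context, the student sees only a chunk of text plus the plugged-in KM, and only the LoRA parameters receive gradients.

# Illustrative sketch of Deep Context Distillation (DCD) for a LoRA Knowledge Module.
# Model id, loss weight, and hidden-state distance are assumptions, not the paper's exact recipe.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.2-1B"  # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
teacher = AutoModelForCausalLM.from_pretrained(base_id).eval()

# Student: the same base model plus a LoRA Knowledge Module; only the LoRA weights are trainable.
student = AutoModelForCausalLM.from_pretrained(base_id)
student = get_peft_model(
    student, LoraConfig(r=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
)

def dcd_loss(document: str, chunk: str, alpha: float = 1.0) -> torch.Tensor:
    """KL on logits plus L1 on hidden states, matching a teacher that reads the
    document in context against a KM-equipped student that does not."""
    doc_ids = tokenizer(document, return_tensors="pt").input_ids
    chunk_ids = tokenizer(chunk, add_special_tokens=False, return_tensors="pt").input_ids
    n = chunk_ids.shape[1]

    with torch.no_grad():  # the teacher is frozen
        t_out = teacher(input_ids=torch.cat([doc_ids, chunk_ids], dim=1),
                        output_hidden_states=True)
    s_out = student(input_ids=chunk_ids, output_hidden_states=True)

    # Align the two forward passes on the chunk's n positions.
    kl = F.kl_div(
        F.log_softmax(s_out.logits[:, -n:], dim=-1),
        F.log_softmax(t_out.logits[:, -n:], dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    hidden = sum(
        (s_h[:, -n:] - t_h[:, -n:]).abs().mean()
        for s_h, t_h in zip(s_out.hidden_states, t_out.hidden_states)
    ) / len(s_out.hidden_states)
    return kl + alpha * hidden

Training would iterate this loss over chunks of the document and backpropagate only into the LoRA parameters; at inference time, the resulting KM can be attached to or detached from the base model without placing the document in context.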
Cite
Text
Caccia et al. "Training Plug N' Play Knowledge Modules with Deep Context Distillation." ICLR 2025 Workshops: MCDC, 2025.
Markdown
[Caccia et al. "Training Plug N' Play Knowledge Modules with Deep Context Distillation." ICLR 2025 Workshops: MCDC, 2025.](https://mlanthology.org/iclrw/2025/caccia2025iclrw-training/)
BibTeX
@inproceedings{caccia2025iclrw-training,
title = {{Training Plug N' Play Knowledge Modules with Deep Context Distillation}},
author = {Caccia, Lucas and Ansell, Alan and Vulić, Ivan and Ponti, Edoardo and Sordoni, Alessandro},
booktitle = {ICLR 2025 Workshops: MCDC},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/caccia2025iclrw-training/}
}