Instruction-Tuned LLMs with World Knowledge Are More Aligned to the Human Brain
Abstract
Instruction-tuning is a widely adopted method of finetuning that enables large language models (LLMs) to generate output that more closely resembles human responses to natural language queries, in many cases leading to human-level performance on diverse testbeds. However, it remains unclear whether instruction-tuning truly makes LLMs more similar to how humans process language. We investigate the effect of instruction-tuning on brain alignment, the similarity of LLM internal representations to neural activity in the human language system. We assess 25 vanilla and instruction-tuned LLMs across three datasets involving humans reading naturalistic stories and sentences, and discover that instruction-tuning generally enhances brain alignment by an average of 6%. To identify the factors underlying LLM-brain alignment, we compute the correlation between the brain alignment of LLMs and various model properties, such as model size, performance on problem-solving benchmarks, and performance on benchmarks requiring world knowledge spanning various domains. Notably, we find a strong positive correlation between brain alignment and model size (r = 0.95), as well as performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning LLMs improves both world knowledge representations and human brain alignment, suggesting that mechanisms that encode world knowledge in LLMs also improve representational alignment to the human brain.
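The correlation analysis described above can be sketched minimally as a Pearson correlation between per-model brain-alignment scores and a model property such as size. The values below are illustrative placeholders, not the paper's actual measurements:

```python
import numpy as np

# Hypothetical brain-alignment scores for five LLMs (illustrative values only).
brain_alignment = np.array([0.42, 0.48, 0.55, 0.61, 0.66])

# A corresponding model property, e.g. log10 of parameter count in billions
# (hypothetical sizes chosen for illustration).
log_model_size = np.log10(np.array([0.35, 1.3, 6.7, 13.0, 65.0]))

# Pearson correlation coefficient between alignment and the property.
r = np.corrcoef(brain_alignment, log_model_size)[0, 1]
print(f"Pearson r = {r:.2f}")
```

The same computation would be repeated for each property of interest (benchmark performance, world-knowledge scores) to produce the reported r values.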
Cite
Text
Aw et al. "Instruction-Tuned LLMs with World Knowledge Are More Aligned to the Human Brain." NeurIPS 2023 Workshops: UniReps, 2023.
Markdown
[Aw et al. "Instruction-Tuned LLMs with World Knowledge Are More Aligned to the Human Brain." NeurIPS 2023 Workshops: UniReps, 2023.](https://mlanthology.org/neuripsw/2023/aw2023neuripsw-instructiontuned-a/)
BibTeX
@inproceedings{aw2023neuripsw-instructiontuned-a,
title = {{Instruction-Tuned LLMs with World Knowledge Are More Aligned to the Human Brain}},
author = {Aw, Khai Loong and Montariol, Syrielle and AlKhamissi, Badr and Schrimpf, Martin and Bosselut, Antoine},
booktitle = {NeurIPS 2023 Workshops: UniReps},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/aw2023neuripsw-instructiontuned-a/}
}