Fine-Tuned Network Relies on Generic Representation to Solve Unseen Cognitive Task

Abstract

Fine-tuning pretrained language models has shown promising results on a wide range of tasks, but when these models encounter a novel task, do they rely on generic pretrained representations, or do they develop brand-new, task-specific solutions? Here, we fine-tuned GPT-2 on a context-dependent decision-making task that was novel to the model but adapted from the neuroscience literature. We compared its performance and internal mechanisms to those of a version of GPT-2 trained from scratch on the same task. Our results show that fine-tuned models depend heavily on pretrained representations, particularly in later layers, whereas models trained from scratch develop different, more task-specific mechanisms. These findings highlight both the advantages and the limitations of pretraining for task generalization, and underscore the need for further investigation into the mechanisms underpinning task-specific fine-tuning in LLMs.

Cite

Text

Lin. "Fine-Tuned Network Relies on Generic Representation to Solve Unseen Cognitive Task." ICML 2024 Workshops: LLMs_and_Cognition, 2024.

BibTeX

@inproceedings{lin2024icmlw-finetuned,
  title     = {{Fine-Tuned Network Relies on Generic Representation to Solve Unseen Cognitive Task}},
  author    = {Lin, Dongyan},
  booktitle = {ICML 2024 Workshops: LLMs_and_Cognition},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/lin2024icmlw-finetuned/}
}