Fine-Tuned Protein Language Models Capture T Cell Receptor Stochasticity

Abstract

The combinatorial explosion of T cell receptor (TCR) sequences enables our immune systems to recognise and respond to an enormous diversity of pathogens. Modelling the highly stochastic TCR generation and selection processes at both the sequence and repertoire levels is important for disease detection and for advancing therapeutic research. Here we demonstrate that protein language models fine-tuned on TCR sequences capture TCR statistics in hypervariable regions to which mechanistic models are blind. We also show that amino acids exhibit strong dependencies on each other within chains but not across chains. Our approach generates representations that improve the prediction of TCR binding specificities.

Cite

Text

Cornwall et al. "Fine-Tuned Protein Language Models Capture T Cell Receptor Stochasticity." NeurIPS 2023 Workshops: GenBio, 2023.

Markdown

[Cornwall et al. "Fine-Tuned Protein Language Models Capture T Cell Receptor Stochasticity." NeurIPS 2023 Workshops: GenBio, 2023.](https://mlanthology.org/neuripsw/2023/cornwall2023neuripsw-finetuned/)

BibTeX

@inproceedings{cornwall2023neuripsw-finetuned,
  title     = {{Fine-Tuned Protein Language Models Capture T Cell Receptor Stochasticity}},
  author    = {Cornwall, Lewis and Szep, Grisha and Day, James and Krishnan, S R Gokul and Carter, David and Blundell, Jamie and Wollman, Lilly and Dalchau, Neil and Sim, Aaron},
  booktitle = {NeurIPS 2023 Workshops: GenBio},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/cornwall2023neuripsw-finetuned/}
}