Conditional Generation of Antigen Specific T-Cell Receptor Sequences
Abstract
Training and evaluating large language models (LLMs) for the design of antigen-specific T-cell receptor (TCR) sequences is challenging due to the complex many-to-many mapping between TCRs and their targets, a challenge exacerbated by a severe lack of ground-truth data. Traditional NLP metrics can be artificially poor indicators of model performance since labels are concentrated on a few examples, and functional in vitro assessment of generated TCRs is time-consuming and costly. Here, we introduce TCR-BART and TCR-T5, adapted from the prominent BART and T5 models, to explore the use of these LLMs for conditional TCR sequence generation given a specific target epitope. To fairly evaluate such models with limited labeled examples, we propose novel evaluation metrics tailored to the sparsely sampled many-to-many nature of TCR-epitope data and investigate the interplay between accuracy and diversity of generated TCR sequences.
Cite
Text
Karthikeyan et al. "Conditional Generation of Antigen Specific T-Cell Receptor Sequences." NeurIPS 2023 Workshops: GenBio, 2023.
Markdown
[Karthikeyan et al. "Conditional Generation of Antigen Specific T-Cell Receptor Sequences." NeurIPS 2023 Workshops: GenBio, 2023.](https://mlanthology.org/neuripsw/2023/karthikeyan2023neuripsw-conditional/)
BibTeX
@inproceedings{karthikeyan2023neuripsw-conditional,
title = {{Conditional Generation of Antigen Specific T-Cell Receptor Sequences}},
author = {Karthikeyan, Dhuvarakesh and Raffel, Colin and Vincent, Benjamin and Rubinsteyn, Alex},
booktitle = {NeurIPS 2023 Workshops: GenBio},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/karthikeyan2023neuripsw-conditional/}
}