Multi-Segment Preserving Sampling for Deep Manifold Sampler
Abstract
Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences, with sampling guided by the gradients of a function predictor trained on top of the manifold sampler. In this work, we introduce an alternative approach to guided sampling that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to selected regions only. We call this method "multi-segment preserving sampling" and demonstrate its effectiveness in the context of antibody design. We train two models, a deep manifold sampler and a GPT-2 language model, on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to the complementarity-determining region 3 (CDR3) of the input. We obtain log probability scores from the GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs while maintaining the desired, preserved regions.
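The segment constraint described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the deep manifold sampler is a learned model, whereas `toy_resample` below is a hypothetical stand-in that applies random substitutions and at most one insertion or deletion. The function names, the half-open span convention, and the example sequence fragments are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_segments(seq: str, variable: List[Tuple[int, int]]) -> List[Tuple[str, bool]]:
    """Split seq into (substring, is_variable) pieces.

    `variable` holds sorted, non-overlapping half-open [start, end) spans;
    everything outside those spans is treated as preserved.
    """
    pieces, cursor = [], 0
    for start, end in variable:
        if start > cursor:
            pieces.append((seq[cursor:start], False))
        pieces.append((seq[start:end], True))
        cursor = end
    if cursor < len(seq):
        pieces.append((seq[cursor:], False))
    return pieces


def toy_resample(segment: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a learned sampler (assumption, not the paper's model).

    Substitutes each residue with probability 0.3, then optionally inserts
    or deletes one residue so the segment length can change.
    """
    chars = [rng.choice(AMINO_ACIDS) if rng.random() < 0.3 else c for c in segment]
    move = rng.choice(["keep", "insert", "delete"])
    if move == "insert":
        chars.insert(rng.randrange(len(chars) + 1), rng.choice(AMINO_ACIDS))
    elif move == "delete" and len(chars) > 1:
        del chars[rng.randrange(len(chars))]
    return "".join(chars)


def segment_preserving_sample(
    seq: str,
    variable: List[Tuple[int, int]],
    resample: Callable[[str, random.Random], str],
    rng: random.Random,
) -> str:
    """Resample only the variable spans; copy preserved spans unchanged."""
    return "".join(
        resample(piece, rng) if is_variable else piece
        for piece, is_variable in split_segments(seq, variable)
    )


# Illustrative fragments only; not sequences taken from the paper's dataset.
prefix, cdr3, suffix = "EVQLVESGGG", "ARDYYGSSYFDY", "WGQGTLVTVSS"
sequence = prefix + cdr3 + suffix
cdr3_span = (len(prefix), len(prefix) + len(cdr3))

sampled = segment_preserving_sample(sequence, [cdr3_span], toy_resample, random.Random(0))
```

Because the preserved pieces are copied verbatim and only the designated span is passed to the sampler, the output keeps the flanking regions intact while the CDR3-like span may change in both content and length. A scoring model, such as the GPT-2 language model mentioned in the abstract, could then rank the sampled spans.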
Cite
Text
Berenberg et al. "Multi-Segment Preserving Sampling for Deep Manifold Sampler." ICLR 2022 Workshops: MLDD, 2022.
Markdown
[Berenberg et al. "Multi-Segment Preserving Sampling for Deep Manifold Sampler." ICLR 2022 Workshops: MLDD, 2022.](https://mlanthology.org/iclrw/2022/berenberg2022iclrw-multisegment/)
BibTeX
@inproceedings{berenberg2022iclrw-multisegment,
title = {{Multi-Segment Preserving Sampling for Deep Manifold Sampler}},
author = {Berenberg, Dan and Lee, Jae Hyeon and Kelow, Simon and Park, Ji Won and Watkins, Andrew and Bonneau, Richard and Gligorijevic, Vladimir and Ra, Stephen and Cho, Kyunghyun},
booktitle = {ICLR 2022 Workshops: MLDD},
year = {2022},
url = {https://mlanthology.org/iclrw/2022/berenberg2022iclrw-multisegment/}
}