Ancestral Protein Sequence Reconstruction Using a Tree-Structured Ornstein-Uhlenbeck Variational Autoencoder
Abstract
We introduce a deep generative model for representation learning of biological sequences that, unlike existing models, explicitly represents the evolutionary process. The model makes use of a tree-structured Ornstein-Uhlenbeck process, obtained from a given phylogenetic tree, as an informative prior for a variational autoencoder. We show the model performs well on the task of ancestral sequence reconstruction of single protein families. Our results and ablation studies indicate that the explicit representation of evolution using a suitable tree-structured prior has the potential to improve representation learning of biological sequences considerably. Finally, we briefly discuss extensions of the model to genomic-scale data sets and the case of a latent phylogenetic tree.
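As a rough illustration of what a tree-structured Ornstein-Uhlenbeck prior can look like, the sketch below builds the covariance matrix of a stationary OU process over the nodes of a small tree and draws one latent coordinate per node from it. It is not the authors' implementation: the toy tree, the helper names (`path_to_root`, `patristic_distance`, `ou_tree_covariance`), and the parameter values `alpha` and `sigma` are all assumptions made for illustration. It relies on the standard result that, for a stationary OU process, the covariance between two nodes is `sigma^2 / (2 * alpha) * exp(-alpha * d)`, where `d` is the path length between them along the tree.

```python
import numpy as np

# Hypothetical toy tree (not from the paper): node 0 is the root,
# parent[i] is the parent of node i, and branch_length[i] is the
# length of the edge connecting node i to its parent.
parent = [-1, 0, 0, 1, 1]
branch_length = [0.0, 0.3, 0.5, 0.2, 0.4]


def path_to_root(node):
    """Return the list of nodes from `node` up to and including the root."""
    path = []
    while node != -1:
        path.append(node)
        node = parent[node]
    return path


def patristic_distance(i, j):
    """Sum of branch lengths on the tree path between nodes i and j."""
    ancestors_j = set(path_to_root(j))
    # Most recent common ancestor: first ancestor of i that is also above j.
    mrca = next(a for a in path_to_root(i) if a in ancestors_j)

    def dist_up(node):
        d = 0.0
        while node != mrca:
            d += branch_length[node]
            node = parent[node]
        return d

    return dist_up(i) + dist_up(j)


def ou_tree_covariance(alpha=1.0, sigma=1.0):
    """Covariance of a stationary OU process evaluated at every tree node.

    alpha (mean-reversion rate) and sigma (diffusion scale) are
    illustrative values, not settings reported by the authors.
    """
    n = len(parent)
    d = np.array([[patristic_distance(i, j) for j in range(n)]
                  for i in range(n)])
    return sigma**2 / (2 * alpha) * np.exp(-alpha * d)


K = ou_tree_covariance()

# One latent coordinate per node, drawn jointly from the tree prior.
rng = np.random.default_rng(0)
z = rng.multivariate_normal(np.zeros(len(parent)), K)
```

In a variational autoencoder, a Gaussian of this kind would take the place of the usual independent standard-normal prior, so that nodes close together on the tree receive correlated latent values. How the paper parameterises and trains its prior is not specified in the abstract.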
Cite
Text
Moreta et al. "Ancestral Protein Sequence Reconstruction Using a Tree-Structured Ornstein-Uhlenbeck Variational Autoencoder." International Conference on Learning Representations, 2022.
Markdown
[Moreta et al. "Ancestral Protein Sequence Reconstruction Using a Tree-Structured Ornstein-Uhlenbeck Variational Autoencoder." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/moreta2022iclr-ancestral/)
BibTeX
@inproceedings{moreta2022iclr-ancestral,
title = {{Ancestral Protein Sequence Reconstruction Using a Tree-Structured Ornstein-Uhlenbeck Variational Autoencoder}},
author = {Moreta, Lys Sanz and Rønning, Ola and Al-Sibahi, Ahmad Salim and Hein, Jotun and Theobald, Douglas and Hamelryck, Thomas},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://mlanthology.org/iclr/2022/moreta2022iclr-ancestral/}
}