How to Hallucinate Functional Proteins

Authors: Zak Costello, Hector Garcia Martin

arXiv: 1903.00458v1 (q-bio.QM)

Abstract: Here we present a novel approach to protein design and phenotypic inference using a generative model for protein sequences. BioSeqVAE, a variational autoencoder variant, can hallucinate syntactically valid protein sequences that are likely to fold and function. BioSeqVAE is trained on the entire known protein sequence space and learns to generate valid examples of protein sequences in an unsupervised manner. The model is validated by showing that its latent feature space is useful and that it accurately reconstructs sequences. Its usefulness is demonstrated with a selection of relevant downstream design tasks. This work is intended to serve as a computational first step towards a general-purpose, structure-free protein design tool.

Submitted to arXiv on 01 Mar. 2019
