All-Atom Protein Generation with Latent Diffusion

Bibliographic Details
Published in: bioRxiv (Feb 13, 2025)
Author: Lu, Amy X
Other Authors: Wilson, Yan; Robinson, Sarah A; Kelow, Simon; Yang, Kevin K; Gligorijevic, Vladimir; Cho, Kyunghyun; Bonneau, Richard; Abbeel, Pieter; Frey, Nathan C
Publication Information:
Cold Spring Harbor Laboratory Press
Online Access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3141237419
003 UK-CbPIL
022 |a 2692-8205 
024 7 |a 10.1101/2024.12.02.626353  |2 doi 
035 |a 3141237419 
045 0 |b d20250213 
100 1 |a Lu, Amy X 
245 1 |a All-Atom Protein Generation with Latent Diffusion 
260 |b Cold Spring Harbor Laboratory Press  |c Feb 13, 2025 
513 |a Working Paper 
520 3 |a While generative models hold immense promise for protein design, existing models are typically backbone-only, despite the indispensable role that sidechain atoms play in mediating function. As prerequisite knowledge, all-atom 3D structure generation requires the discrete sequence to specify sidechain identities, which poses a multimodal generation problem. We propose PLAID (Protein Latent Induced Diffusion), which samples from the latent space of a pre-trained sequence-to-structure predictor, ESMFold. The sampled latent embedding is then decoded with frozen decoders into the sequence and all-atom structure. Importantly, PLAID only requires sequence input during training, thus augmenting the dataset size by 2-4 orders of magnitude compared to the Protein Data Bank. It also makes more annotations available for functional control. As a demonstration of annotation-based prompting, we perform compositional conditioning on function and taxonomy using classifier-free guidance. Intriguingly, function-conditioned generations learn active site residue identities, despite them being non-adjacent on the sequence, and can correctly place the sidechain atoms. We further show that PLAID can generate transmembrane proteins with expected hydrophobicity patterns, perform motif scaffolding, and improve unconditional sample quality for long sequences. Links to model weights and training code are publicly available at github.com/amyxlu/plaid. Competing Interest Statement: AXL, SAR, SK, VG, KC, RB, and NCF are employees of Genentech Inc., a member of the Roche Group. Footnotes: * Title and formatting update: evaluation figures updated, motif scaffolding figure moved to main text, various other changes to content ordering 
653 |a Databases 
653 |a Learning 
653 |a Protein structure 
653 |a Peptide mapping 
653 |a Structure-function relationships 
653 |a Axl protein 
653 |a Amino acid sequence 
653 |a Crystallography 
653 |a Hydrophobicity 
653 |a Proteins 
700 1 |a Wilson, Yan 
700 1 |a Robinson, Sarah A 
700 1 |a Kelow, Simon 
700 1 |a Yang, Kevin K 
700 1 |a Gligorijevic, Vladimir 
700 1 |a Cho, Kyunghyun 
700 1 |a Bonneau, Richard 
700 1 |a Abbeel, Pieter 
700 1 |a Frey, Nathan C 
773 0 |t bioRxiv  |g (Feb 13, 2025) 
786 0 |d ProQuest  |t Biological Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3141237419/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3141237419/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://www.biorxiv.org/content/10.1101/2024.12.02.626353v2