Keywords: AI; DCA; SH3 domain; generative models; machine learning; machine learning-guided directed evolution (MLDE); protein design; specificity; variational autoencoders (VAE)

MeSH: src Homology Domains; Signal Transduction; Deep Learning; Saccharomyces cerevisiae / genetics, metabolism; Saccharomyces cerevisiae Proteins / genetics, metabolism

Source: DOI:10.1016/j.cels.2024.07.005

Abstract:
Evolution-based deep generative models represent an exciting direction in understanding and designing proteins. An open question is whether such models can learn specialized functional constraints that control fitness in specific biological contexts. Here, we examine the ability of generative models to produce synthetic versions of Src-homology 3 (SH3) domains that mediate signaling in the Sho1 osmotic stress response pathway of yeast. We show that a variational autoencoder (VAE) model produces artificial sequences that experimentally recapitulate the function of natural SH3 domains. More generally, the model organizes all fungal SH3 domains such that locality in the model latent space (but not simply locality in sequence space) enriches the design of synthetic orthologs and exposes non-obvious amino acid constraints distributed near and far from the SH3 ligand-binding site. The ability of generative models to design ortholog-like functions in vivo opens new avenues for engineering protein function in specific cellular contexts and environments.