sequence-performance mapping

  • Article type: Journal Article
    Protein developability is requisite for use in therapeutic, diagnostic, or industrial applications. Many developability assays are low throughput, which limits their utility to the later stages of protein discovery and evolution. Recent approaches enable experimental or computational assessment of many more variants, yet the breadth of applicability across protein families and developability metrics is uncertain. Here, three library-scale assays, namely on-yeast protease, split green fluorescent protein (GFP), and non-specific binding, were evaluated for their ability to predict two key developability outcomes (thermal stability and recombinant expression) for the small protein scaffolds affibody and fibronectin. The assays' predictive capabilities were assessed via both linear correlation and machine learning models trained on the library-scale assay data. The on-yeast protease assay is highly predictive of thermal stability for both scaffolds, and the split-GFP assay is informative of affibody thermal stability and expression. The library-scale data were used to map sequence-developability landscapes for affibody and fibronectin binding paratopes, which guide future design of variants and libraries.

  • Article type: Journal Article
    Discovery and evolution of new and improved proteins have empowered molecular therapeutics, diagnostics, and industrial biotechnology. Discovery and evolution both require efficient screens and effective libraries, although they differ in their challenges because of the absence or presence, respectively, of an initial protein variant with the desired function. A host of high-throughput technologies, both experimental and computational, enable efficient screens to identify performant protein variants. In partnership, an informed search of sequence space is needed to overcome the immensity, sparsity, and complexity of the sequence-performance landscape. Early in the historical trajectory of protein engineering, these elements aligned with distinct approaches to identify the most performant sequence: selection from large, randomized combinatorial libraries versus rational computational design. Substantial advances have now emerged from the synergy of these perspectives. Rational design of combinatorial libraries aids the experimental search of sequence space, and high-throughput, high-integrity experimental data inform computational design. At the core of the collaborative interface, efficient protein characterization (rather than mere selection of optimal variants) maps sequence-performance landscapes. Such quantitative maps elucidate the complex relationships between protein sequence and performance (e.g., binding, catalytic efficiency, biological activity, and developability), thereby advancing fundamental protein science and facilitating protein discovery and evolution.