Keywords: Ancestral Sequence Reconstruction; Consensus Design; Protein Folding; Protein Stability; Singular Value Decomposition

Source: DOI:10.1101/2023.06.29.547063

Abstract:
A protein sequence encodes its energy landscape: all the accessible conformations, energetics, and dynamics. The evolutionary relationship between sequence and landscape can be probed phylogenetically by compiling a multiple sequence alignment of homologous sequences and generating either common ancestors via Ancestral Sequence Reconstruction or a consensus protein containing the most common amino acid at each position. Both ancestral and consensus proteins are often more stable than their extant homologs, which raises the question of how the two approaches differ and suggests that both may serve as general methods to engineer thermostability. We used the Ribonuclease H family to compare these approaches and evaluate how the evolutionary relationship of the input sequences affects the properties of the resulting consensus protein. While the overall consensus protein is structured and active, it neither shows properties of a well-folded protein nor has enhanced stability. In contrast, the consensus protein derived from a phylogenetically restricted region is significantly more stable and cooperatively folded, suggesting that cooperativity may be encoded by different mechanisms in separate clades and lost when too many diverse clades are combined to generate a consensus protein. To explore this, we compared pairwise covariance scores using a Potts formalism as well as higher-order couplings using singular value decomposition (SVD). We find that the SVD coordinates of a stable consensus sequence are close to those of the analogous ancestor sequence and its descendants, whereas the unstable consensus sequences are outliers in SVD space.
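The two computational ideas in the abstract, a consensus sequence built from the most common residue per alignment column and SVD coordinates of aligned sequences, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the toy alignment, the one-hot encoding, the mean-centering step, and the function names `consensus`, `one_hot`, and `svd_coordinates` are all assumptions made for illustration.

```python
from collections import Counter
import numpy as np

# Assumed alphabet: 20 amino acids plus the gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

def consensus(msa):
    """Most common residue in each alignment column (ties go to the first seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*msa))

def one_hot(seq):
    """Flatten a sequence into a binary vector of length len(seq) * len(ALPHABET)."""
    vec = np.zeros((len(seq), len(ALPHABET)))
    for i, aa in enumerate(seq):
        vec[i, ALPHABET.index(aa)] = 1.0
    return vec.ravel()

def svd_coordinates(msa, extra, k=2):
    """Project alignment sequences and extra sequences onto the top-k right singular vectors.

    Centering on the alignment mean is an assumption of this sketch.
    """
    X = np.array([one_hot(s) for s in msa])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    E = np.array([one_hot(s) for s in extra])
    return (X - mean) @ vt[:k].T, (E - mean) @ vt[:k].T

# Hypothetical toy alignment, not real Ribonuclease H data.
msa = ["MKV-A", "MKVLA", "MRVLA", "MKILG"]
cons = consensus(msa)  # "MKVLA"
msa_xy, cons_xy = svd_coordinates(msa, [cons])
```

Comparing `cons_xy` with the rows of `msa_xy` gives a rough sense of whether a consensus sequence sits among the input sequences or away from them in the reduced space, which is the kind of comparison the abstract describes.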