de novo proteins

  • Article type: Journal Article
    During de novo emergence, new protein-coding genes arise from previously nongenic sequences. The de novo proteins they encode differ from conserved proteins in composition and predicted biochemical properties. Nevertheless, functional de novo proteins do exist. Both the identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine-learning-based tools and found that most de novo proteins are indeed different from conserved proteins in both structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on the structure and function of de novo proteins in Drosophila.






  • Article type: Journal Article
    Protein synthesis methods have been adapted to incorporate an ever-growing range of non-natural components. Meanwhile, design of de novo protein structure and function has rapidly emerged as a viable capability. However, these two exciting trends have yet to intersect in a meaningful way. The ability to perform de novo design with non-proteinogenic components requires that synthesis and computation align on common targets and applications. This perspective examines the state of the art in these areas and identifies specific, consequential applications to advance the field toward generalized macromolecule design.






  • Article type: Journal Article
    Computational protein sequence design has the ambitious goal of modifying existing proteins or creating new ones; however, designing stable and functional proteins is challenging when protein dynamics and allostery cannot be predicted. Informing protein design methods with evolutionary information limits the mutational space to more native-like sequences and results in increased stability while maintaining function. Recently, language models trained on millions of protein sequences have shown impressive performance in predicting the effects of mutations. When Rosetta-designed sequences were assessed with a language model, their scores were worse than those of the original sequences. To inform Rosetta design protocols with language model predictions, we added a new metric to restrain the energy function during design using the Evolutionary Scale Modeling (ESM) model. The resulting sequences have better language model scores and similar sequence recovery, with only a minor decrease in fitness as assessed by Rosetta energy. In conclusion, our work combines the strength of recent machine learning approaches with the Rosetta protein design toolbox.
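The idea of restraining a design energy with language model predictions can be illustrated with a toy scoring function. This is a minimal sketch under stated assumptions, not the Rosetta or ESM implementation: the functions `lm_penalty` and `restrained_score`, the `weight` parameter, and the per-position probability tables are all hypothetical, and the abstract does not specify the actual functional form of the restraint.

```python
import math

def lm_penalty(seq, position_probs, floor=1e-6):
    """Mean negative log-probability of `seq` under per-position residue
    probabilities (hypothetical stand-in for language model output)."""
    total = 0.0
    for aa, probs in zip(seq, position_probs):
        # Residues missing from the table get a small floor probability.
        total += -math.log(probs.get(aa, floor))
    return total / len(seq)

def restrained_score(energy, seq, position_probs, weight=1.0):
    """Physics-style energy plus a weighted language model penalty.
    Lower is better for both terms (assumed convention)."""
    return energy + weight * lm_penalty(seq, position_probs)

# Toy two-residue example with made-up probabilities.
toy_probs = [{"A": 0.5, "G": 0.5}, {"L": 1.0}]
likely = restrained_score(-10.0, "AL", toy_probs, weight=2.0)
unlikely = restrained_score(-10.0, "AV", toy_probs, weight=2.0)
```

With equal energies, the sequence the toy probability table considers unlikely ("AV") receives the worse (higher) combined score, which is the qualitative behavior a restraint of this kind is meant to produce.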






  • Article type: Journal Article
    Multispecific antibodies recognize two or more epitopes located on the same or distinct targets. This added capability, achieved through protein design, allows these man-made molecules to address unmet medical needs that cannot be met with single targeting, such as with monoclonal antibodies or cytokines alone. However, the development of these multispecific molecules has met with numerous obstacles, which suggests that a new workflow for multispecific molecules is required. Investigating the molecular basis that mediates the successful assembly of the building blocks into non-native quaternary structures will lead to the writing of a playbook for multispecifics. This is essential if we are to design workflows that we can control and whose success we can, in turn, predict. Here, we reflect on the current state of the art of therapeutic biologics and look at the building blocks, in terms of proteins, and tools that can be used to build the foundations of such a next-generation workflow.






  • Article type: Journal Article
    Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and the protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from those for conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
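The comparison of disorder and pLDDT described above comes down to a correlation between two per-protein scores. The following is a minimal sketch, assuming a Spearman rank correlation (the abstract does not state which correlation measure was used); the function names and the toy numbers are illustrative only and are not data from the study.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values only: predicted disorder fraction and mean pLDDT per protein.
toy_disorder = [0.1, 0.4, 0.5, 0.9]
toy_plddt_rising = [40.0, 55.0, 60.0, 80.0]
toy_plddt_falling = [85.0, 70.0, 65.0, 45.0]
rho_positive = spearman(toy_disorder, toy_plddt_rising)
rho_negative = spearman(toy_disorder, toy_plddt_falling)
```

A positive coefficient corresponds to the pattern reported for de novo and random proteins, and a negative one to the pattern reported for conserved proteins.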






  • Article type: Preprint
    Through evolution, nature has presented a set of remarkable protein materials, including elastins, silks, keratins and collagens, with superior mechanical performance that play crucial roles in mechanobiology. However, going beyond natural designs to discover proteins that meet specified mechanical properties remains challenging. Here we report a generative model that predicts protein designs to meet complex nonlinear mechanical property-design objectives. Our model leverages deep knowledge of protein sequences from a pre-trained protein language model and maps mechanical unfolding responses to create novel proteins. Via full-atom molecular simulations for direct validation, we demonstrate that the designed proteins are novel and fulfill the targeted mechanical properties, including unfolding energy and mechanical strength, as well as the detailed unfolding force-separation curves. Our model offers rapid pathways to explore the enormous mechanobiological protein sequence space unconstrained by biological synthesis, using mechanical features as targets to enable the discovery of protein materials with superior mechanical properties.






  • Article type: Journal Article
    Coiled coils are a widespread and well understood protein fold. Their short and simple repeats underpin considerable structural and functional diversity. The vast majority of coiled coils consist of 7-residue (heptad) sequence repeats, but in essence most combinations of 3- and 4-residue segments, each starting with a residue of the hydrophobic core, are compatible with coiled-coil structure. The most frequent among these other repeat patterns are 11-residue (hendecad, 3 + 4 + 4) repeats. Hendecads are frequently found in low copy number, interspersed between heptads, but some proteins consist largely or entirely of hendecad repeats. Here we describe the first large-scale survey of these proteins in the proteome of life. For this, we scanned the protein sequence database for sequences with 11-residue periodicity that lacked β-strand prediction. We then clustered these by pairwise similarity to construct a map of potential hendecad coiled-coil families. Here we discuss these according to their structural properties, their potential cellular roles, and the evolutionary mechanisms shaping their diversity. We note in particular the continuous amplification of hendecads, both within existing proteins and de novo from previously non-coding sequence, as a powerful mechanism in the genesis of new coiled-coil forms.
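A scan for 11-residue periodicity can be illustrated with a simple lag-based score on hydrophobic positions. This is a rough sketch, not the survey's actual method (which the abstract does not describe): the hydrophobic residue set, the function names, and the toy repeat units are assumptions made for illustration.

```python
# Assumed hydrophobic alphabet; other definitions are equally defensible.
HYDROPHOBIC = set("AILMFVWY")

def lag_match_score(seq, lag):
    """Fraction of hydrophobic residues that are followed by another
    hydrophobic residue exactly `lag` positions downstream."""
    flags = [aa in HYDROPHOBIC for aa in seq.upper()]
    downstream = [flags[i + lag] for i in range(len(flags) - lag) if flags[i]]
    if not downstream:
        return 0.0
    return sum(downstream) / len(downstream)

def best_period(seq, candidates=(7, 11)):
    """Candidate repeat length with the highest lag score."""
    return max(candidates, key=lambda lag: lag_match_score(seq, lag))

# Toy repeats: core residues at the start of each 3- or 4-residue segment.
hendecad_like = "LEELKKKLEEK" * 6   # 3 + 4 + 4 pattern, 11 residues
heptad_like = "LEELKKK" * 8         # 3 + 4 pattern, 7 residues

period_a = best_period(hendecad_like)
period_b = best_period(heptad_like)
```

On these idealized inputs the hendecad-like sequence scores highest at lag 11 and the heptad-like sequence at lag 7; real sequences are far noisier, so a practical scan would need a statistical threshold and, as the abstract notes, a filter against predicted β-strands.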






  • Article type: Journal Article
    We present a Ni_p site model of acetyl coenzyme-A synthase (ACS) within a de novo-designed trimer peptide that self-assembles to produce a homoleptic Ni(Cys)3 binding motif. Spectroscopic and kinetic studies of ligand binding demonstrate that Ni binding stabilizes the peptide assembly and produces a terminal Ni(I)-CO complex. When the CO-bound state is reacted with a methyl donor, a new species with new spectral features is quickly produced. Although the metal-bound CO is itself unactivated, the presence of the methyl donor produces an activated metal-CO complex. Selective outer-sphere steric modifications demonstrate that the physical properties of the ligand-bound states are altered differently depending on whether the steric modification is located above or below the Ni site.






  • Article type: Journal Article
    Background: De novo protein-coding genes emerge from scratch in the non-coding regions of the genome and have, by definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structures result in low-confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability for de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (OmegaFold, ESMFold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from those of flDPnn, which was found to outperform most other predictors in a recent comparative assessment study. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins. Conclusions: We suggest that, while in some cases protein language model-based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.






  • Article type: Journal Article
    De novo metalloprotein design involves the construction of proteins guided by specific repeat patterns of polar and apolar residues, which, upon self-assembly, provide a suitable environment to bind metals and produce artificial metalloenzymes. While a wide range of functionalities have been realized in de novo designed metalloproteins, the functional repertoire of such constructs towards alternative energy-relevant catalysis is currently limited. Here we show the application of a de novo approach to design a functional H2-evolving protein. The design involved the assembly of an amphiphilic peptide featuring cysteines at tandem a/d sites of each helix. Intriguingly, upon Ni(II) addition, the oligomers shift from a major trimeric assembly to a mix of dimers and trimers. The metalloprotein produced H2 photocatalytically with a bell-shaped pH dependence, having a maximum activity at pH 5.5. Transient absorption spectroscopy is used to determine the timescales of electron transfer as a function of pH. Selective outer-sphere mutations are made to probe how the local environment tunes activity. A preferential enhancement of activity is observed via steric modulation above the Ni(II) site, towards the N-termini, compared to below the Ni(II) site, towards the C-termini.





