BACKGROUND: Carbonic anhydrases (CAs) catalyze the reversible hydration of CO2 to bicarbonate ions and protons. Identifying efficient and robust CAs and expressing them in model host cells, such as Escherichia coli, enables more efficient engineering of these enzymes for industrial CO2 capture. However, expressing CAs in E. coli is challenging because they can form insoluble protein aggregates (inclusion bodies). Producing soluble, active CA protein is therefore a prerequisite for downstream applications.
RESULTS: In this study, we streamlined CA expression by selecting seven top CA candidates and using two bioinformatic tools to predict their solubility when expressed in E. coli. The predictions placed these enzymes in two categories: low and high solubility. Expressing the CAs with high solubility scores (namely CA5-SspCA, CA6-SazCAtrunc, CA7-PabCA and CA8-PhoCA) led to significantly higher protein yields (5 to 75 mg purified protein per liter) in flask cultures, indicating a strong correlation between solubility prediction score and protein expression yield. Furthermore, phylogenetic tree analysis revealed CA class-specific clustering patterns for protein solubility and production yields. Unexpectedly, we also found that a unique 11-amino acid N-terminal segment located after the signal sequence (and absent from its homologs) was essential for CA6-SazCA activity.
CONCLUSIONS: Overall, this work demonstrated that protein solubility prediction, phylogenetic tree analysis, and experimental validation are effective tools for identifying top CA candidates and then producing soluble, active forms of these enzymes in E. coli. The combined approach we report here should be extendable to the expression of other heterologous proteins in E. coli.