Keywords: bacterial polysaccharide; phage receptor-binding protein; phage therapy; phage–host specificity; serotype

MeSH terms: Deep Learning; Bacteriophages / genetics; Host Specificity / genetics; Genomics / methods; Genome, Bacterial; Viral Tail Proteins / genetics; Genome, Viral; Bacteria / virology, genetics; Glycoside Hydrolases / genetics

Source: DOI: 10.1093/gigascience/giae017

Abstract:
Background: Phage therapy, reemerging as a promising approach to counter antimicrobial-resistant infections, relies on a comprehensive understanding of the specificity of individual phages. Yet the significant diversity within phage populations presents a considerable challenge. Currently, there is a notable lack of tools for large-scale characterization of phage receptor-binding proteins, which are crucial determinants of phage host range.
Results: In this study, we present SpikeHunter, a deep learning method based on the ESM-2 protein language model. With SpikeHunter, we identified 231,965 diverse phage-encoded tailspike proteins, a crucial determinant of phage specificity that targets bacterial polysaccharide receptors, across 787,566 bacterial genomes from 5 virulent, antibiotic-resistant pathogens. Notably, 86.60% (143,200) of these proteins exhibited strong associations with specific bacterial polysaccharides. We discovered that phages with identical tailspike proteins can infect different bacterial species with similar polysaccharide receptors, underscoring the pivotal role of tailspike proteins in determining host range. The specificity is mainly attributed to the protein's C-terminal domain, which strictly correlates with host specificity during domain swapping in tailspike proteins. Importantly, our dataset-driven predictions of phage-host specificity closely match the phage-host pairs observed in real-world phage therapy cases we studied.
Conclusions: Our research provides a rich resource, including both the method and a database derived from a large-scale genomics survey. This substantially enhances understanding of phage specificity determinants at the strain level and offers a valuable framework for guiding phage selection in therapeutic applications.
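The abstract describes dataset-driven prediction of phage–host specificity, with tailspike proteins (TSPs) linked to bacterial polysaccharide types and specificity tracking the C-terminal domain. The following is a minimal Python sketch of that lookup idea under stated assumptions. It does not reproduce SpikeHunter's ESM-2 identification step. It assumes TSPs have already been found, grouped into hypothetical C-terminal domain cluster IDs, and paired with host serotype labels. The "strong association" rule (a dominant serotype covering at least 90% of at least 3 occurrences) is an illustrative choice, not the paper's criterion. All function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def build_association_table(observations):
    """Count how often each C-terminal domain cluster co-occurs with each host serotype."""
    table = defaultdict(Counter)
    for cluster, serotype in observations:
        table[cluster][serotype] += 1
    return table


def strongly_associated(table, min_fraction=0.9, min_count=3):
    """Map a cluster to its dominant serotype when the cluster has at least `min_count`
    occurrences and that serotype covers at least `min_fraction` of them.
    Both thresholds are illustrative assumptions, not the paper's criteria."""
    associations = {}
    for cluster, counts in table.items():
        total = sum(counts.values())
        serotype, top = counts.most_common(1)[0]
        if total >= min_count and top / total >= min_fraction:
            associations[cluster] = serotype
    return associations


def predict_hosts(phage_clusters, associations):
    """Predict host serotypes for a phage from the C-terminal clusters of its TSPs;
    clusters without a strong association contribute nothing."""
    return sorted({associations[c] for c in phage_clusters if c in associations})


# Toy, made-up observations: (hypothetical CTD cluster ID, hypothetical serotype).
toy_observations = [
    ("CTD_1", "sero_A"), ("CTD_1", "sero_A"), ("CTD_1", "sero_A"),
    ("CTD_2", "sero_B"), ("CTD_2", "sero_D"), ("CTD_2", "sero_B"),
    ("CTD_3", "sero_C"), ("CTD_3", "sero_C"), ("CTD_3", "sero_C"), ("CTD_3", "sero_C"),
]
toy_assoc = strongly_associated(build_association_table(toy_observations))
print(toy_assoc)  # {'CTD_1': 'sero_A', 'CTD_3': 'sero_C'}
print(predict_hosts(["CTD_1", "CTD_2", "CTD_3"], toy_assoc))  # ['sero_A', 'sero_C']
```

Here `CTD_2` is left out because its dominant serotype covers only 2 of 3 occurrences. Keying the lookup on the C-terminal domain, not the whole protein, follows the abstract's observation that this domain tracks host specificity. A real pipeline would still need a principled clustering method and a statistical association test.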