Keywords: insertion/deletion; new sample placement; phylogenetic tree; recoding; simulated annealing

Source: DOI:10.1093/ve/veae005

Abstract:
Understanding phylogenetic relationships among species is essential for many biological studies, which require an accurate phylogenetic tree to interpret major evolutionary transitions. Phylogenetic analysis poses a major challenge in both estimation accuracy and computational efficiency, especially in the face of the recent wave of severe emerging infectious disease outbreaks. Here, we introduce a novel, efficient framework called Bases-dependent Rapid Phylogenetic Clustering (Bd-RPC) for new sample placement in viruses. A new recoding method, Frequency Vector Recoding, was implemented to approximate the phylogenetic distance, and the Phylogenetic Simulated Annealing Search algorithm was developed to match the recoded distance matrix with the phylogenetic tree. In addition, indels (insertions/deletions) were heuristically introduced into foreign sequence recognition for the first time. We compared Bd-RPC with recent placement software (PAGAN2, EPA-ng, TreeBeST) and evaluated it on Alphacoronavirus, Alphaherpesvirinae, and Betacoronavirus using Split and Robinson-Foulds distances. The comparisons showed that Bd-RPC maintained the highest precision with high efficiency, performing well in new sample placement on all three virus groups. Finally, a user-friendly website (http://www.bd-rpc.xyz) allows users to classify new samples instantly and facilitates exploration of phylogenetic research in viruses, and Bd-RPC is available on GitHub (http://github.com/Bin-Ma/bd-rpc).
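For readers unfamiliar with the Robinson-Foulds distance used in the evaluation, the following minimal Python sketch illustrates the standard definition: the number of non-trivial bipartitions (splits) found in one tree but not the other. It is not taken from Bd-RPC or from any of the compared tools. The nested-tuple tree representation and the function names `leaf_sets`, `splits`, and `robinson_foulds` are assumptions made for illustration only.

```python
def leaf_sets(tree):
    """Return (all leaves, list of clade leaf sets) for a nested-tuple tree.

    Assumed representation: a leaf is a string, an internal node is a tuple
    of child subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def splits(tree):
    """Return the set of non-trivial bipartitions of the tree, treated as unrooted."""
    all_leaves, clades = leaf_sets(tree)
    anchor = min(all_leaves)  # fixed reference leaf to give each split one canonical side
    result = set()
    for clade in clades:
        # Represent each split by the side that does NOT contain the anchor leaf.
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf against all the others) and the empty side.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    if leaf_sets(tree_a)[0] != leaf_sets(tree_b)[0]:
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-taxon trees that differ only in the placement of B and C.
reference = ((("A", "B"), "C"), ("D", "E"))
candidate = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(reference, candidate))
```

A distance of 0 means the two topologies are identical. Larger values mean more disagreeing splits, which is how a placement result can be scored against a reference tree.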