Keywords: HGT ILS duplication gene tree outliers phylogenomics species tree supermatrix supertree

MeSH: Phylogeny; Biological Evolution

Source: DOI:10.1093/molbev/msad234

Abstract:
In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. In PhylteR, these distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to ILS-induced incongruences, which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (docker and singularity).
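The core idea of comparing many species-by-species distance matrices at once can be illustrated with a minimal NumPy sketch of a DISTATIS-style compromise. This is not PhylteR's implementation: the function names (`cross_product`, `distatis_sketch`) are hypothetical, the toy matrices are squared Euclidean distances between made-up points rather than distances read from gene trees, and PhylteR's own normalisation and outlier-detection rules are not reproduced. The sketch only shows how each gene receives a weight from its agreement with the others, and how far each species sits from its consensus position in each gene.

```python
import numpy as np

def cross_product(dist):
    """Double-centre a distance matrix and scale it by its largest eigenvalue."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    s = -0.5 * centering @ dist @ centering
    top = np.linalg.eigvalsh(s)[-1]  # eigvalsh returns eigenvalues in ascending order
    return s / top

def distatis_sketch(dists, n_axes=2):
    """Return per-gene weights and a (genes x species) matrix of gaps to the consensus."""
    s_all = np.array([cross_product(d) for d in dists])
    # RV coefficients: normalised inner products between cross-product matrices.
    flat = s_all.reshape(len(dists), -1)
    norms = np.linalg.norm(flat, axis=1)
    rv = (flat @ flat.T) / np.outer(norms, norms)
    # The leading eigenvector of the RV matrix gives one weight per gene.
    first = np.abs(np.linalg.eigh(rv)[1][:, -1])
    weights = first / first.sum()
    # Compromise: weighted average of the cross-product matrices.
    compromise = np.tensordot(weights, s_all, axes=1)
    evals, evecs = np.linalg.eigh(compromise)
    order = np.argsort(evals)[::-1][:n_axes]
    lam, q = evals[order], evecs[:, order]
    consensus = q * np.sqrt(lam)             # species coordinates in the compromise
    projected = s_all @ (q / np.sqrt(lam))   # each gene projected into that space
    gap = np.linalg.norm(projected - consensus, axis=2)
    return weights, gap

def squared_distances(points):
    """Pairwise squared Euclidean distances between rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return (diff ** 2).sum(axis=-1)

# Toy data (assumption): 5 "species" as 2-D points, 6 "genes" that agree,
# except gene 3, in which species 2 is displaced.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
moved = coords.copy()
moved[2] += [5.0, 5.0]
dists = [squared_distances(coords) for _ in range(6)]
dists[3] = squared_distances(moved)

weights, gap = distatis_sketch(dists)
print("gene weights:", np.round(weights, 3))
print("gene with the largest gap:", np.unravel_index(np.argmax(gap), gap.shape)[0])
```

In this toy setting the perturbed gene receives the lowest weight and contains the largest species-to-consensus gap, which is the kind of signal an outlier detector can then threshold.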