qpAdm

  • Article type: Journal Article
    Our knowledge of human evolutionary history has been greatly advanced by paleogenomics. Since the 2020s, the study of ancient DNA has increasingly focused on reconstructing the recent past. However, the accuracy of paleogenomic methods in resolving questions of historical and archaeological importance amid increased demographic complexity and decreased genetic differentiation remains an open question. We evaluated the performance and behavior of two commonly used methods, qpAdm and the f3-statistic, for admixture inference under a diversity of demographic models and data conditions. We used two complementary simulation approaches. First, we explored a wide demographic parameter space under four simple demographic models of varying complexity and configuration, using branch-length data from two chromosomes. Second, we analyzed a model of Eurasian history composed of 59 populations, using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudohaploidization. We observe that population differentiation is the primary factor driving qpAdm performance. Notably, while complex gene flow histories influence which models are classified as plausible, they do not reduce overall performance. Under conditions reflective of the historical period, qpAdm most frequently identifies the true model as plausible among a small candidate set of closely related populations. To increase its utility for resolving fine-scaled hypotheses, we provide a heuristic for further distinguishing between candidate models that incorporates qpAdm model P-values and f3-statistics. Finally, we demonstrate a significant performance increase for qpAdm using whole-genome branch-length f2-statistics, highlighting the potential for improved demographic inference that could be achieved with future advances in f-statistic estimation.
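    To make the f3-statistic mentioned above concrete, here is a minimal sketch that assumes per-SNP population allele frequencies are already available. It computes the textbook quantity f3(C; A, B) = mean over SNPs of (c - a)(c - b). A clearly negative value is the classic signal that the target C is admixed between populations related to A and B. The function name `f3` and the toy frequencies are illustrative assumptions. The sketch omits the finite-sample bias correction and block-jackknife standard errors that real tools apply, so it is not the estimator used in the study.

```python
"""Naive f3-statistic from per-SNP population allele frequencies (sketch)."""
from statistics import fmean


def f3(c, a, b):
    """Return the mean of (c - a) * (c - b) over SNPs.

    c is the putative admixed target; a and b are candidate sources.
    Hypothetical helper: no heterozygosity/bias correction and no
    block-jackknife standard error are applied.
    """
    if not (len(c) == len(a) == len(b)):
        raise ValueError("frequency vectors must have equal length")
    return fmean((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b))


# Toy frequencies: the target is an exact 50/50 mix of two diverged sources,
# so every per-SNP term is negative and the mean is negative too.
a = [0.9, 0.1, 0.8, 0.2]
b = [0.1, 0.9, 0.2, 0.8]
c = [0.5 * x + 0.5 * y for x, y in zip(a, b)]
print(f3(c, a, b))
```

    A negative result here reflects the construction of the toy target. In real data, a positive f3 does not rule out admixture, for example when the target has experienced strong drift after mixing.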

  • Article type: Preprint
    Paleogenomics has expanded our knowledge of human evolutionary history. Since the 2020s, the study of ancient DNA has increasingly focused on reconstructing the recent past. However, the accuracy of paleogenomic methods in answering questions of historical and archaeological importance, amid the increased demographic complexity and decreased genetic differentiation of the historical period, remains an open question. We used two simulation approaches to evaluate the limitations and behavior of two commonly used methods, qpAdm and the f3-statistic, for admixture inference. The first is based on branch-length data simulated from four simple demographic models of varying complexity and configuration. The second is an analysis of a model of Eurasian history composed of 59 populations, using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudo-haploidization. We show that under conditions resembling historical populations, qpAdm can identify a small candidate set consisting of the true sources and populations closely related to them. However, under typical ancient DNA conditions, qpAdm is unable to distinguish further among these candidates, limiting its utility for resolving fine-scaled hypotheses. Notably, we find that complex gene-flow histories generally improve the performance of qpAdm, and we observe no bias in the estimation of admixture weights. We offer a heuristic for admixture inference that combines admixture weight estimates, qpAdm model P-values, and f3-statistics to enhance the power to distinguish between multiple plausible candidates. Finally, we highlight the future potential of qpAdm through whole-genome branch-length f2-statistics, demonstrating the improved demographic inference that could be achieved with advances in f-statistic estimation.
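    Two of the ancient DNA conditions named above, data missingness and pseudo-haploidization, can be illustrated with a toy transformation of simulated diploid genotypes. This is only a minimal sketch under stated assumptions, not the preprint's pipeline. Genotypes are assumed to be alternative-allele counts (0, 1, 2). Each site is dropped with a fixed, hypothetical `missing_rate`. Otherwise, a single allele is drawn at random, mimicking the common practice of sampling one read per SNP. The function name `pseudohaploidize` is hypothetical.

```python
"""Toy ancient-DNA degradation of simulated diploid genotypes (sketch)."""
import random


def pseudohaploidize(genotypes, missing_rate=0.0, rng=None):
    """Convert diploid alt-allele counts (0, 1, 2) into pseudohaploid calls.

    Hypothetical helper. Each SNP becomes None (missing) with probability
    ``missing_rate``; otherwise it becomes 1 (alt) with probability g / 2
    and 0 (ref) otherwise, as if one read were sampled at the site.
    """
    rng = rng if rng is not None else random.Random()
    calls = []
    for g in genotypes:
        if g not in (0, 1, 2):
            raise ValueError(f"invalid diploid genotype: {g!r}")
        if rng.random() < missing_rate:
            calls.append(None)  # site not covered by any read
        else:
            calls.append(1 if rng.random() < g / 2 else 0)
    return calls


rng = random.Random(42)
print(pseudohaploidize([0, 1, 2, 2, 1, 0], missing_rate=0.3, rng=rng))
```

    Homozygous sites keep their allele, while heterozygous sites collapse to either allele with equal probability. This loss of heterozygosity information is one reason pseudohaploid data carry less signal per site than diploid calls.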

  • Article type: Journal Article
    qpAdm is a statistical tool for studying the ancestry of populations whose histories involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of the population history of human (and nonhuman) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios, in order to identify areas of potential weakness and establish recommended best practices for its use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, when rates of missing data or ancient DNA damage are high, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, including an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.
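    The core idea behind the ancestry proportions that qpAdm reports can be sketched with f4-statistics and least squares. If a target T is modeled as a mixture of sources S0..Sn with weights summing to 1, linearity of f4 gives f4(T, S0; R0, Rj) = sum over i >= 1 of w_i * f4(Si, S0; R0, Rj) for each "right" (outgroup) population Rj with j >= 1. Solving these equations yields the weights. This is a simplified illustration under explicit assumptions: naive f4 values computed directly from allele frequencies, and ordinary least squares in place of the jackknife-covariance weighting. It also omits the standard errors and model P-value that qpAdm reports. All function names are hypothetical. At least as many right populations as sources are needed.

```python
"""Simplified qpAdm-style admixture weights from f4-statistics (sketch)."""
import random
from statistics import fmean


def f4(a, b, c, d):
    """Naive f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return fmean((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))


def solve(m, v):
    """Solve the square linear system m x = v by Gauss-Jordan elimination."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-15:
            raise ValueError("singular system: sources not separable by these rights")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def admixture_weights(target, sources, rights):
    """Least-squares weights (summing to 1) for target = sum_i w_i * source_i."""
    s0, others = sources[0], sources[1:]
    r0 = rights[0]
    # Design matrix: one row per right population j >= 1, one column per source i >= 1.
    X = [[f4(s, s0, r0, rj) for s in others] for rj in rights[1:]]
    y = [f4(target, s0, r0, rj) for rj in rights[1:]]
    k = len(others)
    # Normal equations (X^T X) w = X^T y for the weights of sources 1..n.
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yj for row, yj in zip(X, y)) for p in range(k)]
    w_rest = solve(xtx, xty)
    return [1.0 - sum(w_rest)] + w_rest  # weight of S0 from the sum-to-one constraint


# Toy data: random source and right-population frequencies; exact 30/70 target.
rng = random.Random(1)
n_snps = 200


def rand_pop():
    return [rng.random() for _ in range(n_snps)]


s0, s1 = rand_pop(), rand_pop()
rights = [rand_pop() for _ in range(4)]
target = [0.3 * a + 0.7 * b for a, b in zip(s0, s1)]
print(admixture_weights(target, [s0, s1], rights))  # should be close to [0.3, 0.7]
```

    Because the toy target is an exact mixture, the equations are consistent and the weights are recovered up to floating-point error. With real data, residual misfit across the right populations is what a model P-value assesses.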