Keywords: aquatic insects; de novo genomes; population genomics; reference genomes; sequencing depth; whole genome resequencing

Source: DOI:10.1002/ece3.9583

Abstract:
Whole genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing depth to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing depth on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing depth (3.5×, 7.5× and 12×) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three depths × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise FST, and genome-wide distribution of FST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing depth lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
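
The factorial design described in the abstract (five reference genomes × three depths × two target species) can be sketched as a simple enumeration. This is only an illustration of how the 30 datasets arise, not the authors' pipeline; the reference labels `ref_1` to `ref_5` are placeholders, since the abstract does not name the five reference genomes.

```python
from itertools import product

# Placeholder labels (assumption): the abstract does not name the five reference genomes.
REFERENCES = [f"ref_{i}" for i in range(1, 6)]
# Sequencing depths (x coverage) as stated in the abstract.
DEPTHS = [3.5, 7.5, 12.0]
# Target species as stated in the abstract.
SPECIES = ["Himalopsyche digitata", "Himalopsyche tibetana"]


def build_design():
    """Return one (reference, depth, species) tuple per dataset."""
    return list(product(REFERENCES, DEPTHS, SPECIES))


datasets = build_design()
print(len(datasets))  # number of reference x depth x species combinations
```

Each tuple corresponds to one dataset on which the population genetic indices and population structure analyses would be run.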