sequencing depth

  • Article type: Journal Article
    Paired-end short reads of Illumina HiSeq, MiSeq, and NovaSeq of simulated bacterial communities from fresh spinach and surface water were generated in silico at various sequencing depths. Multidrug-resistant Salmonella enterica serotype Indiana was included in the spinach community, while the water community contained multidrug-resistant Pseudomonas aeruginosa.

  • Article type: Journal Article
    Shotgun metagenomics sequencing experiments are finding a wide range of applications. Nonetheless, there are still limited guidelines regarding the number of sequences needed to acquire meaningful information for taxonomic profiling and antimicrobial resistance gene (ARG) identification. In this study, we explored this issue in the context of oral microbiota by sequencing with a very high number of sequences (~ 100 million), four human plaque samples, and one microbial community standard and by evaluating the performance of microbial identification and ARGs detection through a downsampling procedure. When investigating the impact of a decreasing number of sequences on quantitative taxonomic profiling in the microbial community standard datasets, we found some discrepancies in the identified microbial species and their abundances when compared to the expected ones. Such differences were consistent throughout downsampling, suggesting their link to taxonomic profiling methods limitations. Overall, results showed that the number of sequences has a great impact on metagenomic samples at the qualitative (i.e., presence/absence) level in terms of loss of information, especially in experiments having less than 40 million reads, whereas abundance estimation was minimally affected, with only slight variations observed in low-abundance species. The presence of ARGs was also assessed: a total of 133 ARGs were identified. Notably, 23% of them inconsistently resulted as present or absent across downsampling datasets of the same sample. Moreover, over half of ARGs were lost in datasets having less than 20 million reads. This study highlights the importance of carefully considering sequencing aspects and suggests some guidelines for designing shotgun metagenomics experiments with the final goal of maximizing oral microbiome analyses. 
    Our findings suggest varying optimized sequence numbers according to different study aims: 40 million for microbiota profiling, 50 million for low-abundance species detection, and 20 million for ARG identification.
    KEY POINTS: • Forty million sequences are a cost-efficient solution for microbiota profiling • Fifty million sequences allow low-abundance species detection • Twenty million sequences are recommended for ARG identification.
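The downsampling procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy read labels, the `min_reads` detection threshold, and the function names `downsample` and `detected` are assumptions made only for illustration. It shows why presence/absence calls for low-abundance features are the first thing lost as the number of reads drops.

```python
import random
from collections import Counter

def downsample(reads, n, seed=0):
    """Return n reads drawn at random without replacement (all reads if n >= len(reads))."""
    if n >= len(reads):
        return list(reads)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, n)

def detected(labels, min_reads=1):
    """Return the set of labels supported by at least min_reads reads."""
    counts = Counter(labels)
    return {label for label, count in counts.items() if count >= min_reads}

# Toy data (hypothetical): each read is represented only by the taxon it maps to.
reads = ["abundant_sp"] * 9_990 + ["rare_sp"] * 10

# Report which taxa are still called present at each depth.
for n in (10_000, 1_000, 100):
    subsample = downsample(reads, n, seed=42)
    print(n, sorted(detected(subsample, min_reads=2)))
```

Whether `rare_sp` survives at a given depth depends on the random draw, which mirrors the abstract's observation that some features are called inconsistently across downsampled datasets of the same sample.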

  • Article type: Journal Article
    Oxford Nanopore sequencing is one of the high-throughput sequencing technologies that facilitates the reconstruction of metagenome-assembled genomes (MAGs). This study aimed to assess the potential of long-read assembly algorithms in Oxford Nanopore sequencing to enhance the MAG-based identification of bacterial pathogens using both simulated and mock communities. Simulated communities were generated to mimic those on fresh spinach and in surface water. Long reads were produced using R9.4.1+SQK-LSK109 and R10.4 + SQK-LSK112, with 0.5, 1, and 2 million reads. The simulated bacterial communities included multidrug-resistant Salmonella enterica serotypes Heidelberg, Montevideo, and Typhimurium in the fresh spinach community individually or in combination, as well as multidrug-resistant Pseudomonas aeruginosa in the surface water community. Real data sets of the ZymoBIOMICS HMW DNA Standard were also studied. A bioinformatic pipeline (MAGenie, freely available at https://github.com/jackchen129/MAGenie) that combines metagenome assembly, taxonomic classification, and sequence extraction was developed to reconstruct draft MAGs from metagenome assemblies. Five assemblers were evaluated based on a series of genomic analyses. Overall, Flye outperformed the other assemblers, followed by Shasta, Raven, and Unicycler, while Canu performed least effectively. In some instances, the extracted sequences resulted in draft MAGs and provided the locations and structures of antimicrobial resistance genes and mobile genetic elements. Our study showcases the viability of utilizing the extracted sequences for precise phylogenetic inference, as demonstrated by the consistent alignment of phylogenetic topology between the reference genome and the extracted sequences. 
    R9.4.1+SQK-LSK109 was more effective in most cases than R10.4+SQK-LSK112, and greater sequencing depths generally led to more accurate results. IMPORTANCE: By examining diverse bacterial communities, particularly those housing multiple Salmonella enterica serotypes, this study holds significance in uncovering the potential of long-read assembly algorithms to improve metagenome-assembled genome (MAG)-based pathogen identification through Oxford Nanopore sequencing. Our research demonstrates that long-read assembly stands out as a promising avenue for boosting precision in MAG-based pathogen identification, thus advancing the development of more robust surveillance measures. The findings also support ongoing endeavors to fine-tune a bioinformatic pipeline for accurate pathogen identification within complex metagenomic samples.

  • Article type: Journal Article
    OBJECTIVE: To evaluate non-invasive prenatal testing (NIPT) and expanded non-invasive prenatal testing (NIPT-plus) for detecting aneuploidies at different sequencing depths and to assess the accuracy of the Z-score in predicting trisomies 21, 18, and 13, as well as 45X and 47XXX.
    METHODS: Pregnancies with positive NIPT or NIPT-plus results detected at the prenatal diagnosis center of Nanfang Hospital were included in this retrospective study, between January 2017 and December 2022. Invasive prenatal diagnostic results were collected. Logistic regression analyses were used to study the relationship between Z-score and positive predictive value (PPV). Optimal cut-off values were obtained based on receiver operating characteristic analysis, and PPVs were calculated in different groups.
    RESULTS: We evaluated 1348 pregnant women with positive results, including 930 reported by NIPT and 418 reported by NIPT-plus. NIPT reported significantly more rare chromosomal aneuploidies (RCAs), and NIPT-plus had a significantly higher PPV for trisomy 21 (T21). Logistic regression analyses showed a significant association (P < 0.001) between Z-score and PPVs for T21 and trisomy 18 (T18). A linear relationship was observed between fetal fraction (FF) and Z-values in the true positive cases of T21 and T18. The high Z-score group had significantly higher PPVs than the low Z-score group for T21, T18, trisomy 13, and 47XXX, but not for 45X.
    CONCLUSIONS: The Z-score is helpful in assessing NIPT or NIPT-plus results. Therefore, we suggest including the Z-score and FF in the results. By combining the Z-score, FF, and maternal age, clinicians can interpret NIPT results more accurately and improve individual counseling to reduce patients' anxiety.
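The abstract does not state how its Z-score is computed. As a rough illustration only, the sketch below assumes a commonly used definition: the sample's chromosome read fraction minus the mean of a euploid reference set, divided by the reference standard deviation. It pairs this with the usual positive predictive value, TP / (TP + FP). All numbers and the function names `z_score` and `ppv` are hypothetical.

```python
from statistics import mean, stdev

def z_score(sample_fraction, reference_fractions):
    """Z-score of a sample's chromosome read fraction against euploid references.

    Assumed definition (not taken from the abstract):
    (sample - reference mean) / reference sample standard deviation.
    """
    return (sample_fraction - mean(reference_fractions)) / stdev(reference_fractions)

def ppv(true_positives, false_positives):
    """Positive predictive value: confirmed positives among all positive calls."""
    return true_positives / (true_positives + false_positives)

# Hypothetical chr21 read fractions from euploid reference pregnancies.
reference = [0.0130, 0.0131, 0.0129, 0.0130]
sample = 0.0135  # hypothetical test sample with an elevated chr21 fraction

print(f"Z-score: {z_score(sample, reference):.2f}")
print(f"PPV: {ppv(true_positives=8, false_positives=2):.2f}")
```

Under this definition, both a higher fetal fraction and a greater sequencing depth (which narrows the reference spread) would tend to raise the Z-score of a true trisomy, which is consistent with the relationships the abstract reports.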

  • Article type: Journal Article
    BACKGROUND: Parameters adversely affecting the contiguity and accuracy of the assemblies from Illumina next-generation sequencing (NGS) are well described. However, past studies generally focused on their additive effects, overlooking potential interactions in which the parameters exacerbate one another's effects in a multiplicative manner. To investigate whether they act interactively on de novo genome assembly quality, we simulated sequencing data for 13 bacterial reference genomes, with varying levels of error rate, sequencing depth, and PCR and optical duplicate ratios.
    RESULTS: We assessed the quality of assemblies from the simulated sequencing data with a number of contiguity and accuracy metrics, which we used to quantify both additive and multiplicative effects of the four parameters. We found that the tested parameters are engaged in complex interactions, exerting multiplicative, rather than additive, effects on assembly quality. Also, the ratio of non-repeated regions and GC% of the original genomes can shape how the four parameters affect assembly quality.
    CONCLUSIONS: We provide a framework for consideration in future studies using de novo genome assembly of bacterial genomes, e.g. in choosing the optimal sequencing depth, balancing between its positive effect on contiguity and negative effect on accuracy due to its interaction with error rate. Furthermore, the properties of the genomes to be sequenced also should be taken into account, as they might influence the effects of error sources themselves.

  • Article type: Journal Article
    Limiting harm to organisms caused by genetic sampling is an important consideration for rare species, and a number of non-destructive sampling techniques have been developed to address this issue in freshwater mussels. Two methods, visceral swabbing and tissue biopsies, have proven to be effective for DNA sampling, though it is unclear as to which method is preferable for genotyping-by-sequencing (GBS). Tissue biopsies may cause undue stress and damage to organisms, while visceral swabbing potentially reduces the chance of such harm. Our study compared the efficacy of these two DNA sampling methods for generating GBS data for the unionid freshwater mussel, the Texas pigtoe (Fusconaia askewi). Our results find both methods generate quality sequence data, though some considerations are in order. Tissue biopsies produced significantly higher DNA concentrations and larger numbers of reads when compared with swabs, though there was no significant association between starting DNA concentration and number of reads generated. Swabbing produced greater sequence depth (more reads per sequence), while tissue biopsies revealed greater coverage across the genome (at lower sequence depth). Patterns of genomic variation as characterized in principal component analyses were similar regardless of the sampling method, suggesting that the less invasive swabbing is a viable option for producing quality GBS data in these organisms.

  • Article type: Journal Article
    Next-generation sequencing (NGS) has raised a growing interest in phage display research. Sequencing depth is a pivotal parameter for using NGS. In the current study, we made a side-by-side comparison of two NGS platforms with different sequencing depths, denoted as lower-throughput (LTP) and higher-throughput (HTP). The capacity of these platforms for characterization of the composition, quality, and diversity of the unselected Ph.D.™-12 Phage Display Peptide Library was investigated. Our results indicated that HTP sequencing detects a considerably higher number of unique sequences compared to the LTP platform, thus covering a broader diversity of the library. We found a larger percentage of singletons, a smaller percentage of repeated sequences, and a greater percentage of distinct sequences in the LTP datasets. These parameters suggest a higher library quality, resulting in potentially misleading information when using LTP sequencing for such assessment. Our observations showed that HTP reveals a broader distribution of peptide frequencies, thus revealing increased heterogeneity of the library by the HTP approach and offering a comparatively higher capacity for distinguishing peptides from each other. Our analyses suggested that LTP and HTP datasets show discrepancies in their peptide composition and position-specific distribution of amino acids within the library. Taken together, these findings lead us to the conclusion that a higher sequencing depth can yield more in-depth insights into the composition of the library and provide a more complete picture of the quality and diversity of phage display peptide libraries.

  • Article type: Randomized Controlled Trial
    Over the last decade, the field of systems vaccinology has emerged, in which high throughput transcriptomics and other omics assays are used to probe changes of the innate and adaptive immune system in response to vaccination. The goal of this study was to benchmark key technical and analytical parameters of RNA sequencing (RNA-seq) in the context of a multi-site, double-blind randomized vaccine clinical trial.
    We collected longitudinal peripheral blood mononuclear cell (PBMC) samples from 10 subjects before and after vaccination with a live attenuated Francisella tularensis vaccine and performed RNA-Seq at two different sites using aliquots from the same sample to generate two replicate datasets (5 time points for 50 samples each). We evaluated the impact of (i) filtering lowly expressed genes, (ii) using external RNA controls, (iii) fold change and false discovery rate (FDR) filtering, (iv) read length, and (v) sequencing depth on the concordance of differentially expressed genes (DEGs) between replicate datasets. Using synthetic mRNA spike-ins, we developed a method for empirically establishing minimal read-count thresholds for maintaining fold change accuracy on a per-experiment basis. We defined a reference PBMC transcriptome by pooling sequence data and established the impact of sequencing depth and gene filtering on transcriptome representation. Lastly, we modeled statistical power to detect DEGs for a range of sample sizes, effect sizes, and sequencing depths.
    Our results showed that (i) filtering lowly expressed genes is recommended to improve fold-change accuracy and inter-site agreement, if possible guided by mRNA spike-ins; (ii) read length did not have a major impact on DEG detection; (iii) applying fold-change cutoffs for DEG detection reduced inter-set agreement and should be used with caution, if at all; (iv) reduction in sequencing depth had a minimal impact on statistical power but reduced the identifiable fraction of the PBMC transcriptome; and (v) after sample size, effect size (i.e., the magnitude of fold change) was the most important driver of statistical power to detect DEGs. The results from this study provide RNA sequencing benchmarks and guidelines for planning future similar vaccine studies.

  • Article type: Journal Article
    (1) Background: tumor profiling enables patient survival prediction. The two essential parameters to be calibrated when designing a study based on tumor profiles from a cohort are the sequencing depth of RNA-seq technology and the number of patients. This calibration is carried out under cost constraints, and a compromise has to be found. In the context of survival data, the goal of this work is to benchmark the impact of the number of patients and of the sequencing depth of miRNA-seq and mRNA-seq on the predictive capabilities for both the Cox model with elastic net penalty and random survival forest. (2) Results: we first show that the Cox model and random survival forest provide comparable prediction capabilities, with significant differences for some cancers. Second, we demonstrate that miRNA and/or mRNA data improve prediction over clinical data alone. mRNA-seq data leads to slightly better prediction than miRNA-seq, with the notable exception of lung adenocarcinoma for which the tumor miRNA profile shows higher predictive power. Third, we demonstrate that the sequencing depth of RNA-seq data can be reduced for most of the investigated cancers without degrading the prediction abilities, allowing the creation of independent validation sets at a lower cost. Finally, we show that the number of patients in the training dataset can be reduced for the Cox model and random survival forest, allowing the use of different models on different patient subgroups.

  • Article type: Journal Article
    Whole genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing depth to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing depth on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing depth (3.5×, 7.5× and 12×) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three depths × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise F_ST, and genome-wide distribution of F_ST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing depth lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
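To make depth levels such as 3.5×, 7.5× and 12× concrete, the sketch below uses the standard relation mean depth = (number of reads × read length) / genome size. The 600 Mb genome size and 150 bp read length are hypothetical placeholders, not values from the study, and the function names are illustrative.

```python
import math

def mean_depth(n_reads, read_length, genome_size):
    """Expected mean sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size

def reads_for_depth(depth, genome_size, read_length):
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(depth * genome_size / read_length)

GENOME_SIZE = 600_000_000  # hypothetical genome size in bp (assumption)
READ_LENGTH = 150          # hypothetical read length in bp (assumption)

# Depth levels named in the abstract above.
for depth in (3.5, 7.5, 12):
    n = reads_for_depth(depth, GENOME_SIZE, READ_LENGTH)
    print(f"{depth}x needs about {n:,} reads of {READ_LENGTH} bp")
```

Because the required read count scales linearly with target depth, this kind of estimate is one simple input to the cost trade-off between depth and reference genome relatedness that the abstract discusses.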