Keywords: Bacterial genomes; Optical duplicates; PCR duplicates; Sequencing depth; Sequencing error

MeSH: Research Design; Genome, Bacterial; Benchmarking; High-Throughput Nucleotide Sequencing

Source: DOI:10.1186/s12864-023-09910-4

Abstract:
BACKGROUND: Parameters that adversely affect the contiguity and accuracy of assemblies from Illumina next-generation sequencing (NGS) are well described. However, past studies generally focused on their additive effects and overlooked potential interactions through which the parameters may exacerbate one another's effects in a multiplicative manner. To investigate whether they act interactively on de novo genome assembly quality, we simulated sequencing data for 13 bacterial reference genomes with varying levels of error rate, sequencing depth, and PCR and optical duplicate ratios.
RESULTS: We assessed the quality of assemblies from the simulated sequencing data with a number of contiguity and accuracy metrics, which we used to quantify both additive and multiplicative effects of the four parameters. We found that the tested parameters are engaged in complex interactions, exerting multiplicative, rather than additive, effects on assembly quality. Also, the ratio of non-repeated regions and GC% of the original genomes can shape how the four parameters affect assembly quality.
CONCLUSIONS: We provide a framework for consideration in future studies using de novo assembly of bacterial genomes, e.g. in choosing the optimal sequencing depth by balancing its positive effect on contiguity against its negative effect on accuracy, which arises from its interaction with error rate. Furthermore, the properties of the genomes to be sequenced should also be taken into account, as they might influence the effects of the error sources themselves.
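ILLUSTRATION: The distinction between additive and multiplicative effects drawn in the abstract can be sketched with a linear model that includes an interaction term. The snippet below is a minimal sketch only, not the authors' analysis: the function name `fit_interaction`, the two-parameter model, and the noise-free toy values are all hypothetical and chosen purely for illustration. A non-zero coefficient on the product term means that the effect of error rate depends on sequencing depth, i.e. the two parameters do not act purely additively.

```python
import numpy as np

def fit_interaction(error_rate, depth, quality):
    """Least-squares fit of quality ~ b0 + b1*e + b2*d + b3*(e*d).

    Returns the coefficients [b0, b1, b2, b3]; b3 is the interaction term.
    """
    e = np.asarray(error_rate, dtype=float)
    d = np.asarray(depth, dtype=float)
    y = np.asarray(quality, dtype=float)
    # Design matrix: intercept, two main effects, and their product.
    X = np.column_stack([np.ones_like(e), e, d, e * d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical, noise-free toy data on a 3 x 3 grid of settings
# (assumed values, not taken from the study).
e_grid, d_grid = np.meshgrid([0.001, 0.005, 0.01], [25.0, 50.0, 100.0])
e, d = e_grid.ravel(), d_grid.ravel()
# Made-up accuracy metric with an explicit error-rate x depth interaction.
mismatches = 2.0 + 100.0 * e + 0.01 * d + 30.0 * e * d

coef = fit_interaction(e, d, mismatches)
print(coef)  # the last entry estimates the interaction coefficient
```

With real assembly metrics, one would compare this fit against a model without the product term; the toy data here simply confirm that the interaction coefficient is recovered when it is present.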