rDNA

1 High-fidelity (repeat) consensus sequences from short reads using combined read clustering and assembly.

Impact index: 4.547
Published: 24 Jan 2024
Journal: BMC Genomics; PMID: 38267856

DOI: 10.1186/s12864-023-09948-4
Article type: Journal Article

BACKGROUND: Despite the many cheap and fast ways to generate genomic data, accurate genome assembly remains a problem: repeats in particular are vastly underrepresented and often misassembled. Because low-coverage short reads are already sufficient to represent the repeat landscape of any given genome, many read clustering algorithms have been proposed for repeat identification and classification. But how can trustworthy, reliable and representative repeat consensuses be derived from unassembled genomes?
RESULTS: Here, we combine methods from repeat identification and genome assembly to derive these robust consensuses. We test several use cases: (1) consensus building from clustered short reads of non-model genomes, (2) consensus building from genome-wide amplification setups, and (3) specific repeat-centred questions, such as the linked vs. unlinked arrangement of ribosomal genes. In all our use cases, the derived consensuses are robust and representative. To evaluate overall performance, we compare our high-fidelity repeat consensuses to RepeatExplorer2-derived contigs and check whether they represent real transposable elements as found in long reads. Our results demonstrate that useful, reliable and trustworthy consensuses can be generated from short reads by combining read clustering and genome assembly methods in an automatable way.
CONCLUSIONS: We anticipate that our workflow opens the way towards more efficient and less manual repeat characterization and annotation, benefitting all genome studies, but especially those of non-model organisms.
