Keywords: dotplot, heatmap, modimizer, sketching

MeSH: Arabidopsis / genetics; Tandem Repeat Sequences / genetics; Software; Genome, Plant; User-Computer Interface; Genomics / methods

Source: DOI:10.1093/bioinformatics/btae493

Abstract:
BACKGROUND: A common method for analyzing genomic repeats is to produce a sequence similarity matrix visualized via a dot plot. Innovative approaches such as StainedGlass have improved upon this classic visualization by rendering dot plots as a heatmap of sequence identity, enabling researchers to better visualize multi-megabase tandem repeat arrays within centromeres and other heterochromatic regions of the genome. However, computing the similarity estimates for heatmaps requires high computational overhead and can suffer from decreasing accuracy.
RESULTS: In this work, we introduce ModDotPlot, an interactive and alignment-free dot plot viewer. By approximating average nucleotide identity via a k-mer-based containment index, ModDotPlot produces accurate plots orders of magnitude faster than StainedGlass. We accomplish this through the use of a hierarchical modimizer scheme that can visualize the full 128 Mb genome of Arabidopsis thaliana in under 5 min on a laptop. ModDotPlot is bundled with a graphical user interface supporting real-time interactive navigation of entire chromosomes.
AVAILABILITY: ModDotPlot is available at https://github.com/marbl/ModDotPlot.
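
To make the idea in the RESULTS paragraph concrete, the following is a minimal Python sketch of a k-mer containment index computed over mod-sampled k-mers ("modimizers") and turned into a window-by-window identity matrix. It is an illustration under stated assumptions, not ModDotPlot's actual code: the function names, the BLAKE2b hash, the non-overlapping windows, and the conversion of containment to identity as C^(1/k) (a common estimate under a simple independent-mutation model) are all choices made for this example.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (hash choice is an assumption of this sketch)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def modimizers(seq: str, k: int, s: int = 1) -> set:
    """Hashes of the k-mers of seq whose hash is divisible by s.

    s = 1 keeps every k-mer; larger s keeps roughly a 1/s fraction.
    """
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return {h for h in hashes if h % s == 0}


def containment(a: set, b: set) -> float:
    """Fraction of the elements of a that also occur in b (asymmetric)."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def identity_from_containment(c: float, k: int) -> float:
    """Approximate sequence identity as c ** (1 / k) (assumed mutation model)."""
    return c ** (1.0 / k)


def identity_matrix(seq: str, window: int, k: int, s: int = 1) -> list:
    """Pairwise identity estimates between non-overlapping windows of seq."""
    starts = range(0, len(seq) - window + 1, window)
    sketches = [modimizers(seq[i:i + window], k, s) for i in starts]
    return [
        [identity_from_containment(containment(a, b), k) for b in sketches]
        for a in sketches
    ]


if __name__ == "__main__":
    # Toy input: one window of ACGT repeats followed by one window of T's.
    toy = "ACGT" * 3 + "T" * 12
    for row in identity_matrix(toy, window=12, k=4, s=1):
        print(row)
```

Each cell of the returned matrix corresponds to one cell of the heatmap. Because only set intersections of sampled hashes are computed, no alignment is needed, and raising `s` trades resolution for speed.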