MeSH: 5-Methylcytosine / metabolism, chemistry; Nanopore Sequencing / methods; DNA Methylation; Animals; Mice; Humans; CpG Islands / genetics; Deep Learning; Algorithms; Sequence Analysis, DNA / methods; Whole Genome Sequencing / methods; Sulfites / chemistry

Source: DOI:10.1038/s41467-024-49847-0

Abstract:
DNA methylation plays an important role in various biological processes, including cell differentiation, ageing, and cancer development. The most important methylation in mammals is 5-methylcytosine, which occurs mostly in the context of CpG dinucleotides. Sequencing methods such as whole-genome bisulfite sequencing successfully detect 5-methylcytosine DNA modifications. However, they suffer from the serious drawback of short read lengths and may introduce amplification bias. Here we present Rockfish, a deep learning algorithm that significantly improves read-level 5-methylcytosine detection using Nanopore sequencing. We compare Rockfish with other Nanopore-based methods on R9.4.1 and R10.4.1 datasets: single-base accuracy and the F1 measure increase by up to 5 percentage points on R9.4.1 datasets and by up to 0.82 percentage points on R10.4.1 datasets. Moreover, Rockfish shows a high correlation with whole-genome bisulfite sequencing, requires lower read depth, and achieves higher confidence in biologically important regions such as CpG-rich promoters while remaining computationally efficient. Its superior performance on human and mouse samples highlights its versatility for studying 5-methylcytosine methylation across varied organisms and diseases. Finally, its adaptable architecture ensures compatibility with new pore versions, chemistries, and modification types.