Keywords: barcode identification; multiomics pipeline; single cell; single molecule; spaced seed hash

MeSH: Multiomics; Reproducibility of Results; Genomics; Chromatin / genetics; Data Analysis

Source: DOI:10.1093/bib/bbad343

Abstract:
Single-cell multiomics techniques have been widely applied to detect the key signature of cells. These methods have achieved a single-molecule resolution and can even reveal spatial localization. These emerging methods provide insights elucidating the features of genomic, epigenomic and transcriptomic heterogeneity in individual cells. However, they have given rise to new computational challenges in data processing. Here, we describe Single-cell Single-molecule multiple Omics Pipeline (ScSmOP), a universal pipeline for barcode-indexed single-cell single-molecule multiomics data analysis. Essentially, the C language is utilized in ScSmOP to set up spaced-seed hash table-based algorithms for barcode identification according to ligation-based barcoding data and synthesis-based barcoding data, followed by data mapping and deconvolution. We demonstrate high reproducibility of data processing between ScSmOP and published pipelines in comprehensive analyses of single-cell omics data (scRNA-seq, scATAC-seq, scARC-seq), single-molecule chromatin interaction data (ChIA-Drop, SPRITE, RD-SPRITE), single-cell single-molecule chromatin interaction data (scSPRITE) and spatial transcriptomic data from various cell types and species. Additionally, ScSmOP shows more rapid performance and is a versatile, efficient, easy-to-use and robust pipeline for single-cell single-molecule multiomics data analysis.
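The abstract names spaced-seed hash tables as the basis for barcode identification but does not give details. The following is a minimal illustrative sketch of the general idea, not ScSmOP's actual implementation (which the abstract says is written in C). It assumes fixed-length barcodes, a known whitelist, and a tolerance of one substitution; the function names (`make_masks`, `build_index`, `identify`) and the choice of two complementary seeds (even positions and odd positions) are hypothetical. Because a single mismatch falls in only one of the two seeds, the other seed still hashes to the correct whitelist entry, which is then verified by Hamming distance.

```python
from collections import defaultdict


def make_masks(length):
    """Return two complementary spaced seeds (assumed design):
    one samples even positions, the other samples odd positions."""
    even = tuple(i for i in range(length) if i % 2 == 0)
    odd = tuple(i for i in range(length) if i % 2 == 1)
    return [even, odd]


def seed_key(seq, mask):
    """Extract the characters of seq at the positions kept by the seed."""
    return "".join(seq[i] for i in mask)


def build_index(whitelist, masks):
    """Build one hash table per seed: seed key -> whitelist barcodes."""
    index = [defaultdict(list) for _ in masks]
    for barcode in whitelist:
        for table, mask in zip(index, masks):
            table[seed_key(barcode, mask)].append(barcode)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def identify(read_barcode, index, masks, length, max_mismatch=1):
    """Return the unique whitelist barcode within max_mismatch
    substitutions of read_barcode, or None if absent or ambiguous."""
    if len(read_barcode) != length:
        return None
    # Collect candidates that share at least one seed key with the read.
    candidates = set()
    for table, mask in zip(index, masks):
        candidates.update(table.get(seed_key(read_barcode, mask), ()))
    # Verify candidates by full Hamming distance.
    scored = [(hamming(b, read_barcode), b) for b in candidates]
    scored = [(d, b) for d, b in scored if d <= max_mismatch]
    if not scored:
        return None
    best = min(d for d, _ in scored)
    best_hits = [b for d, b in scored if d == best]
    # Ambiguous matches are rejected rather than guessed.
    return best_hits[0] if len(best_hits) == 1 else None


# Usage with a toy whitelist (illustrative values only).
whitelist = ["ACGTACGT", "TTGGCCAA", "GATTACAG"]
masks = make_masks(8)
index = build_index(whitelist, masks)
corrected = identify("ACGAACGT", index, masks, length=8)  # one substitution
```

With only two complementary seeds this sketch guarantees recovery for a single substitution; tolerating more mismatches or indels would need additional seeds or a different verification step.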