Keywords: batch effect; graph convolutional networks; single-cell RNA-seq data; transfer learning; uterine fibroids

MeSH terms: Humans; Gene Expression Profiling / methods; Algorithms; Sequence Analysis, RNA / methods; Single-Cell Gene Expression Analysis; Single-Cell Analysis / methods; Cluster Analysis; Machine Learning; Leiomyoma

Source: DOI:10.1093/bib/bbad426

Abstract:
Cell clustering in single-cell RNA-sequencing (scRNA-seq) data is challenging because the gene expression matrix is high-dimensional and sparse, and shallow sequencing introduces significant noise. While numerous computational methods have been proposed, most existing approaches process only the target dataset itself, disregarding the wealth of knowledge present in scRNA-seq data from other species and batches. In light of this, our paper proposes a novel method named graph-based deep embedding clustering (GDEC) that leverages transfer learning across species and batches. GDEC integrates graph convolutional networks to overcome the challenges posed by sparse gene expression matrices. Additionally, the incorporation of DEC in GDEC allows cell clusters to be partitioned in a lower-dimensional space, mitigating the adverse effects of noise on clustering outcomes. GDEC constructs a model based on existing scRNA-seq datasets and then applies transfer learning to fine-tune the model using a limited amount of prior knowledge from the target dataset. This enables GDEC to cluster scRNA-seq data across different species and batches. In cross-species and cross-batch clustering experiments, we compared GDEC with conventional packages. Furthermore, we applied GDEC to scRNA-seq data of uterine fibroids. Compared with the results obtained from the Seurat package, GDEC revealed a novel cell type (epithelial cells) and identified a notable number of new pathways among various cell types, underscoring the enhanced analytical capabilities of GDEC. Availability and implementation: https://github.com/YuzhiSun/GDEC/tree/main.
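The abstract names two building blocks, graph convolution and DEC-style clustering in a low-dimensional space, without giving formulas. The sketch below illustrates the standard textbook forms of both (a symmetrically normalized graph-convolution layer, and the Student's t soft assignment with its sharpened target distribution). It is an assumption that GDEC follows these forms; the code is not taken from the authors' repository, and every function name, the toy graph, and the random data are hypothetical.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1)) # degrees are >= 1 here
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster assignment q_ij using a Student's t kernel."""
    dist_sq = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """Clustering loss KL(P || Q), summed over cells and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical toy input: 4 cells, 6 genes, two pairs of neighboring cells.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.random((4, 6))            # stand-in for an expression matrix
w = rng.random((6, 2))            # untrained layer weights
z = gcn_layer(adj, x, w)          # 2-dimensional cell embeddings
centroids = z[[0, 2]]             # one initial centroid per cell pair
q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_loss(p, q)
```

In a trained model the weights and centroids would be updated to reduce `loss`; under the transfer-learning setup the abstract describes, those parameters would presumably start from a model fitted on existing datasets before fine-tuning on the target data.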