Keywords: Cell cluster; Downstream analysis; Dropout; Imputation; WALS; scRNA-seq

MeSH: Gene Expression Profiling; Sequence Analysis, RNA / methods; Base Sequence; Least-Squares Analysis; Single-Cell Analysis / methods; Cluster Analysis; Software

Source: DOI:10.1016/j.compbiomed.2024.108225

Abstract:
OBJECTIVE: Single-cell RNA sequencing (scRNA-seq) is a powerful tool for exploring cellular heterogeneity, discovering novel or rare cell types, characterizing tissue-specific cellular composition, and understanding cell differentiation during development. However, owing to technological limitations, dropout events in scRNA-seq can erroneously set some truly nonzero entries of the expression matrix to zero. This amounts to introducing noise into the gene expression data, which degrades downstream analyses such as clustering, cell annotation, and differential gene expression analysis. It is therefore crucial to determine accurately which zeros result from dropout events and to impute them.
METHODS: Because different zeros in the gene expression matrix warrant different confidence levels, this paper proposes SinCWIm, a method for imputing dropout events in scRNA-seq data based on weighted alternating least squares (WALS). The method uses the Pearson correlation coefficient and hierarchical clustering to quantify the confidence of each zero entry, and then combines these confidences with WALS for matrix decomposition. Outlier removal and data correction then bring the imputed values closer to the actual values.
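To make the WALS step concrete, here is a minimal sketch of weighted alternating least squares on a toy matrix. It is not the SinCWIm implementation: the function name `wals`, the rank, the regularization strength `lam`, and the constant weight of 0.05 for zero entries are all assumptions for illustration. In the paper the confidence of each zero is derived from Pearson correlation and hierarchical clustering, which this sketch replaces with a fixed value.

```python
import numpy as np

def wals(X, W, rank=3, lam=0.1, n_iters=30, seed=0):
    """Minimize sum_ij W_ij * (X_ij - U_i . V_j)^2 + lam * (|U|^2 + |V|^2)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        # Fix V and solve a small weighted ridge regression for each row factor.
        for i in range(n_rows):
            Vw = V * W[i][:, None]  # each row of V scaled by its weight
            U[i] = np.linalg.solve(V.T @ Vw + reg, Vw.T @ X[i])
        # Fix U and solve for each column factor in the same way.
        for j in range(n_cols):
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(U.T @ Uw + reg, Uw.T @ X[:, j])
    return U, V

# Toy data: a strictly positive low-rank matrix with simulated dropout.
rng = np.random.default_rng(1)
truth = rng.gamma(2.0, 1.0, size=(30, 3)) @ rng.gamma(2.0, 1.0, size=(3, 20))
mask = rng.random(truth.shape) < 0.3            # entries lost to dropout
observed = np.where(mask, 0.0, truth)

# Assumed weights: full confidence in nonzeros, low confidence in zeros.
W = np.where(observed > 0, 1.0, 0.05)

U, V = wals(observed, W)
# Keep observed values; fill zeros with the non-negative reconstruction.
imputed = np.where(observed > 0, observed, np.clip(U @ V.T, 0.0, None))
```

Giving zeros a small but nonzero weight lets the factorization treat them as weak evidence of low expression without forcing the reconstruction to reproduce them exactly, which is the general idea behind confidence-weighted imputation.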
RESULTS: Eight single-cell sequencing datasets were used in comparative experiments to demonstrate the overall superiority of SinCWIm over state-of-the-art models. Clustering the SinCWIm-imputed data and evaluating the result with the adjusted Rand index gave scores of 94.46%, 96.48%, and 76.74% on the Usoskin, Pollen, and Bladder datasets, respectively. SinCWIm also significantly improved the retention of differentially expressed genes and visualization.
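For readers unfamiliar with the metric, the adjusted Rand index compares a predicted clustering with reference labels and corrects for chance agreement. The following is a small self-contained sketch of the standard formula. The function name and the toy labels are illustrative, and it says nothing about how the paper's evaluation was actually run.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from the contingency counts of two labelings."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of items placed together in both, in the reference, and in the prediction.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / comb(n, 2)  # chance agreement
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (max_index - expected)

# Cluster names do not matter, only the grouping does.
perfect = adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
# A prediction that splits every reference cluster scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions are identical up to relabeling, and values near 0 mean agreement no better than chance, so the percentages reported above are this index scaled by 100.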
CONCLUSIONS: SinCWIm provides a valuable imputation method for handling dropout events in single-cell sequencing data. Compared with advanced methods, it performs well in clustering, visualization, and other tasks, and it is applicable to a variety of single-cell sequencing datasets.