Multi-Objective Optimized Fuzzy Clustering for Detecting Cell Clusters from Single-Cell Expression Profiles [J]

Abstract:

Rapid advances in single-cell RNA sequencing (scRNA-seq) allow gene expression to be measured at single-cell resolution in complex diseases and tissues. Although many methods have been developed to detect cell clusters from scRNA-seq data, this task remains a major challenge. We propose a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we performed initial filtering and SCnorm normalization. We then considered a series of case studies, each with a different number of clusters (cl = 2 up to a user-defined number), and applied the fuzzy c-means clustering algorithm to each case individually. For each case, we computed the scores of four cluster validity indices: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first measure (PE) as a minimization objective (↓) and the remaining three as maximization objectives (↑), and applied a multi-objective decision-making technique, TOPSIS, to identify the optimal solution. The case study with the highest TOPSIS score was selected as the final optimal clustering. Finally, we identified differentially expressed genes (DEGs) with Limma by comparing the expression of the cells in each resulting cluster against the cells in the remaining clusters. We applied our approach to a scRNA-seq dataset for a rare intestinal cell type in mice [GEO ID: GSE62270; 23,630 features (genes) and 288 cells]. The optimal clustering result (TOPSIS optimal score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The scores of the four cluster validity indices FSI, PE, PC, and MPC for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2).
The top ten marker genes were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob, and Rps17. Among these, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful for detecting cell clusters from scRNA-seq data.
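The abstract does not give formulas for the validity indices. For reference, the standard definitions of PC, PE and MPC are shown below for a fuzzy partition with memberships u_ij (cluster i = 1..c, cell j = 1..N, each cell's memberships summing to 1); the natural logarithm in PE is an assumption. The last line checks the reported MPC against the reported PC for the two-cluster solution.

```latex
\mathrm{PC} = \frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{c} u_{ij}^{2}, \qquad \frac{1}{c} \le \mathrm{PC} \le 1

\mathrm{PE} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{c} u_{ij}\ln u_{ij}, \qquad 0 \le \mathrm{PE} \le \ln c

\mathrm{MPC} = 1 - \frac{c}{c-1}\,(1 - \mathrm{PC}) = \frac{c\,\mathrm{PC} - 1}{c-1}, \qquad 0 \le \mathrm{MPC} \le 1

c = 2,\ \mathrm{PC} = 0.607 \;\Rightarrow\; \mathrm{MPC} = 2(0.607) - 1 = 0.214 \approx 0.215
```

The reported MPC agrees with the reported PC up to rounding to three decimals, and the reported PE of 0.578 lies below ln 2 ≈ 0.693, the maximum for two clusters under the natural-log assumption.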
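One case study of the pipeline (fuzzy c-means for a fixed cluster number, then PC, PE and MPC) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the fuzzifier m = 2, the tolerance, the random initialization, the function names and the synthetic data are all assumptions, since the abstract does not state the FCM settings. The input is assumed to be a cells × features matrix such as the filtered, SCnorm-normalized data. Ready-made FCM implementations also exist, for example `skfuzzy.cluster.cmeans` in scikit-fuzzy.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Plain fuzzy c-means on X with shape (n_cells, n_features).

    Returns (centers, U): centers has shape (c, n_features) and U has shape
    (c, n_cells), where U[i, j] is the membership of cell j in cluster i.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each column (one cell) sums to 1.
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    xsq = (X ** 2).sum(axis=1)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the cells.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean center-to-cell distances, shape (c, n), via the Gram trick
        # so no (c, n, n_features) array is materialized.
        csq = (centers ** 2).sum(axis=1)
        d = np.sqrt(np.maximum(csq[:, None] + xsq[None, :] - 2.0 * centers @ X.T, 0.0))
        d = np.fmax(d, 1e-12)  # guard against a cell sitting exactly on a center
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """PC: mean of squared memberships; 1/c (fully fuzzy) to 1 (crisp)."""
    return float((U ** 2).sum() / U.shape[1])


def partition_entropy(U):
    """PE with natural log; 0 (crisp) to ln(c) (fully fuzzy). 0*ln(0) -> 0."""
    return float(-(U * np.log(np.clip(U, 1e-12, 1.0))).sum() / U.shape[1])


def modified_partition_coefficient(U):
    """MPC: PC rescaled to [0, 1], removing its dependence on c."""
    c = U.shape[0]
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(U))


if __name__ == "__main__":
    # Synthetic stand-in for a normalized expression matrix: two separated blobs.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
    # One "case study" per candidate cluster number cl = 2..5.
    for cl in range(2, 6):
        _, U = fuzzy_c_means(X, cl)
        print(cl, partition_entropy(U), partition_coefficient(U),
              modified_partition_coefficient(U))
```

Each row printed by the loop corresponds to one candidate cluster number; collecting these rows (plus FSI) gives the decision matrix that a multi-criteria method such as TOPSIS then ranks.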
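The abstract does not define the Fuzzy Silhouette Index either. A common definition (Campello and Hruschka, 2006; implemented in the R package fclust as `SIL.F`) takes the crisp silhouette of each cell under its highest-membership cluster and weights it by the gap between the cell's largest and second-largest membership, raised to a power alpha. The sketch below assumes that definition, Euclidean distances on the clustering input, and alpha = 1; the function names and the toy memberships are hypothetical.

```python
import numpy as np


def crisp_silhouette(X, labels):
    """Per-cell silhouette s_j = (b_j - a_j) / max(a_j, b_j) for hard labels."""
    # Pairwise Euclidean distances via the Gram trick (n x n memory only).
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, 0.0)
    clusters = np.unique(labels)
    s = np.zeros(X.shape[0])
    if len(clusters) < 2:
        return s  # silhouette is undefined with a single cluster
    for j in range(X.shape[0]):
        own = labels == labels[j]
        if own.sum() == 1:
            continue  # singleton cluster: s_j = 0 by convention
        # a_j: mean distance to the other cells of j's own cluster (D[j, j] = 0).
        a = D[j, own].sum() / (own.sum() - 1)
        # b_j: smallest mean distance to the cells of any other cluster.
        b = min(D[j, labels == k].mean() for k in clusters if k != labels[j])
        denom = max(a, b)
        s[j] = (b - a) / denom if denom > 0 else 0.0
    return s


def fuzzy_silhouette(X, U, alpha=1.0):
    """Fuzzy Silhouette Index for memberships U of shape (c, n_cells).

    Each cell's crisp silhouette is weighted by (u_first - u_second) ** alpha,
    the gap between its largest and second-largest membership.
    """
    labels = U.argmax(axis=0)
    top2 = np.sort(U, axis=0)[-2:, :]  # row 0: second largest, row 1: largest
    w = (top2[1] - top2[0]) ** alpha
    s = crisp_silhouette(X, labels)
    return float((w * s).sum() / w.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(8.0, 0.5, (40, 2))])
    # Hypothetical memberships: 0.9 for the cell's own blob, 0.1 for the other.
    U = np.vstack([np.r_[np.full(40, 0.9), np.full(40, 0.1)],
                   np.r_[np.full(40, 0.1), np.full(40, 0.9)]])
    print(fuzzy_silhouette(X, U))
```

Because the weights favor cells with unambiguous memberships, FSI rewards partitions that are both well separated in expression space and confidently assigned; when all cells have the same membership gap it reduces to the ordinary mean silhouette.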
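The abstract names TOPSIS as the multi-objective decision step but does not state its weights or normalization. The sketch below uses the textbook variant, which is an assumption here: vector normalization of each criterion, equal weights, Euclidean distances to the ideal and anti-ideal points, and a closeness score in [0, 1]. Rows are case studies (cl = 2, 3, ...), and columns are PE (minimized) and PC, MPC, FSI (maximized). The scores in the usage part are made up for illustration and are not the paper's results.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """TOPSIS closeness score in [0, 1] for each alternative (row).

    matrix  : (n_alternatives, n_criteria) raw scores
    weights : (n_criteria,) non-negative criterion weights
    benefit : (n_criteria,) True if larger is better, False if smaller is better
    """
    M = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    benefit = np.asarray(benefit, dtype=bool)
    # 1. Vector-normalize each criterion column, then apply the weights.
    V = M / np.linalg.norm(M, axis=0) * w
    # 2. Ideal (best) and anti-ideal (worst) value of each criterion.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    # 3. Euclidean distance of every alternative to both reference points.
    d_plus = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti, axis=1)
    # 4. Relative closeness: 1 = coincides with ideal, 0 = with anti-ideal.
    return d_minus / (d_plus + d_minus)


# Hypothetical validity scores, one row per case study cl = 2..5.
# Columns: PE (minimize), PC, MPC, FSI (maximize). Not real results.
cluster_numbers = [2, 3, 4, 5]
scores = np.array([
    [0.30, 0.80, 0.600, 0.70],
    [0.60, 0.60, 0.400, 0.50],
    [0.90, 0.45, 0.267, 0.40],
    [1.10, 0.38, 0.225, 0.35],
])
closeness = topsis(scores, weights=[1, 1, 1, 1], benefit=[False, True, True, True])
# The case study with the highest closeness score is kept as the final clustering.
best = cluster_numbers[int(np.argmax(closeness))]
print(closeness, "-> chosen number of clusters:", best)
```

In this toy matrix the cl = 2 row is best on every criterion, so it coincides with the ideal point and receives a closeness of exactly 1, while the cl = 5 row is worst on every criterion and receives 0. With real scores the criteria usually conflict, which is where the trade-off computed by TOPSIS matters.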
