Keywords: clustering; feature selection; single-cell RNA-sequencing; trajectory analysis

MeSH: Single-Cell Analysis / methods; Cluster Analysis; Humans; Gene Expression Profiling / methods; Algorithms; Computational Biology / methods; Sequence Analysis, RNA / methods; RNA-Seq / methods

Source: DOI:10.1093/bib/bbae317

Abstract:
Unsupervised feature selection is a critical step for efficient and accurate analysis of single-cell RNA-seq data. Previous benchmarks used two different criteria to compare feature selection methods: (i) the proportion of ground-truth marker genes included in the selected features and (ii) the accuracy of cell clustering measured against ground-truth cell types. Here, we systematically compare the performance of 11 feature selection methods under both criteria. We first demonstrate that the two criteria are discordant and suggest using the latter. We then compare, across feature selection methods, the distribution of mean expression of the selected genes. We show that lowly expressed genes exhibit very high coefficients of variation and are mostly excluded by high-performance methods. In particular, high-deviation- and high-expression-based methods outperform the methods widely used in the Seurat package in cell clustering and data visualization. We further show that they also enable clear separation of the same cell type from different tissues as well as accurate estimation of cell trajectories.
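The link between low expression and a high coefficient of variation can be illustrated with a small simulation. The sketch below is not the paper's pipeline or any of the 11 benchmarked methods: it assumes Poisson-distributed counts (an assumption made here purely for illustration, under which the CV equals 1/sqrt(mean)), and the function names `coefficient_of_variation` and `select_high_expression` are hypothetical helpers, not part of Seurat or any other package.

```python
import numpy as np


def coefficient_of_variation(counts):
    """Per-gene CV (std / mean) for a cells x genes count matrix."""
    mean = counts.mean(axis=0)
    std = counts.std(axis=0)
    # Genes with zero mean get CV = 0 to avoid division by zero.
    return np.divide(std, mean, out=np.zeros_like(mean, dtype=float), where=mean > 0)


def select_high_expression(counts, n_top):
    """Indices of the n_top genes with the highest mean expression."""
    mean = counts.mean(axis=0)
    return np.argsort(mean)[::-1][:n_top]


# Simulated data: 2000 cells, 4 genes with increasing true mean expression.
# Poisson counts are an illustrative assumption, not the paper's noise model.
rng = np.random.default_rng(0)
n_cells = 2000
gene_means = np.array([0.05, 0.5, 5.0, 50.0])
counts = rng.poisson(lam=gene_means, size=(n_cells, gene_means.size))

cv = coefficient_of_variation(counts)
selected = select_high_expression(counts, n_top=2)

print("CV per gene:", np.round(cv, 2))
print("Genes kept by the high-expression filter:", sorted(selected.tolist()))
```

In this toy setting the CV falls steadily as mean expression rises, so ranking genes by CV alone would favor the lowly expressed genes, whereas a simple high-expression filter discards them.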