Keywords: CP: Systems biology; clustering analysis; double dipping; feature screening; heterogeneity test; high-dimension data; highly variable gene; marker gene identification; mixture model; terminal exhausted T cell; terminally differentiated effector memory T cell

MeSH: Single-Cell Analysis / methods; Humans; Cluster Analysis; Gene Expression Profiling / methods; Sequence Analysis, RNA / methods; Biomarkers, Tumor / genetics, metabolism; CD8-Positive T-Lymphocytes / metabolism; Cholangiocarcinoma / genetics, pathology; Genetic Markers / genetics

Source: DOI:10.1016/j.crmeth.2024.100810

Abstract:
In single-cell RNA sequencing (scRNA-seq) studies, cell types and their marker genes are often identified by clustering and differentially expressed gene (DEG) analysis. A common practice is to select genes using surrogate criteria such as variance and deviance, cluster the cells using the selected genes, and then detect markers by DEG analysis that treats the resulting cell types as known. The surrogate criteria can miss important genes or select unimportant ones, while DEG analysis suffers from selection bias. We present Festem, a statistical method for the direct selection of cell-type markers for downstream clustering. Festem identifies cluster-informative marker genes, that is, genes whose expression is heterogeneously distributed across cells. Simulations and scRNA-seq applications demonstrate that Festem selects markers sensitively and with high precision, and enables the identification of cell types often missed by other methods. In a large intrahepatic cholangiocarcinoma dataset, we identify diverse CD8+ T cell types and potential prognostic marker genes.
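The selection-bias ("double dipping") problem mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the Festem method and not the authors' code: each simulated gene is pure noise with no true cell types, cells are split by a toy one-dimensional 2-means on that same gene, and a Welch t statistic is then computed between the resulting groups. The function names (`welch_t`, `two_means_1d`, `false_positive_rates`), the sample sizes, and the 1.96 cutoff are all illustrative choices.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples (unequal variances allowed)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def two_means_1d(x, iters=20):
    """Toy 1-D k-means with k=2, initialised at min and max; returns 0/1 labels."""
    c0, c1 = min(x), max(x)
    labels = [0] * len(x)
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        c0, c1 = statistics.mean(g0), statistics.mean(g1)
    return labels


def false_positive_rates(n_genes=200, n_cells=100, cutoff=1.96, seed=0):
    """Fraction of null genes called 'significant' under two labelling schemes."""
    rng = random.Random(seed)
    double_dip_hits = 0
    independent_hits = 0
    for _ in range(n_genes):
        # Null gene: every cell is drawn from the same distribution.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

        # Double dipping: labels come from clustering the same values we then test.
        labels = two_means_1d(x)
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        if abs(welch_t(g0, g1)) > cutoff:
            double_dip_hits += 1

        # Control: labels fixed in advance (first half vs second half of the cells).
        half = n_cells // 2
        if abs(welch_t(x[:half], x[half:])) > cutoff:
            independent_hits += 1
    return double_dip_hits / n_genes, independent_hits / n_genes


dd_rate, indep_rate = false_positive_rates()
print(f"double-dipped false positive rate: {dd_rate:.2f}")
print(f"independent-label false positive rate: {indep_rate:.2f}")
```

Under these assumptions, the double-dipped rate should be close to 1 even though no gene carries real signal, while the rate with labels chosen independently of the data should stay near the nominal 5% level. Real scRNA-seq pipelines cluster on many genes at once, so the inflation is less extreme per gene, but the mechanism is the same.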