Keywords: UMAP; denoising; diffusion maps; high-dimensional data; imputation; manifold learning; scRNA-seq

Source: DOI:10.3390/biology13070512

Abstract:
Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the "curse of dimensionality", leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
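
The graph-diffusion idea the abstract describes (embed cells in a low-dimensional space, build a k-nearest neighbor graph, row-normalize it into a Markov matrix, raise that matrix to a power t, and multiply it with the expression matrix) can be illustrated with a minimal NumPy sketch. This is not the sc-PHENIX implementation: the function names are hypothetical, the kernel is a plain binary kNN kernel, and PCA coordinates stand in for the UMAP embedding that sc-PHENIX would compute on top of PCA. Any other embedding with one row per cell could presumably be passed to `knn_markov_matrix` instead.

```python
import numpy as np

def pca_embed(X, n_components):
    """Project mean-centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def knn_markov_matrix(embedding, k):
    """Row-stochastic matrix from a symmetrized binary kNN graph (assumed kernel)."""
    n = embedding.shape[0]
    sq = np.sum(embedding ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(d2, np.inf)            # exclude each cell from its own neighbor search
    idx = np.argsort(d2, axis=1)[:, :k]     # indices of the k nearest cells per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                  # symmetrize the graph
    A += np.eye(n)                          # self-loops: each cell keeps part of its own signal
    return A / A.sum(axis=1, keepdims=True)

def diffuse(X, M, t):
    """Impute by applying the Markov matrix raised to the power t."""
    return np.linalg.matrix_power(M, t) @ X

# Toy count matrix: 60 cells x 20 genes (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(60, 20)).astype(float)

emb = pca_embed(X, n_components=5)          # stand-in for the PCA-UMAP embedding
M = knn_markov_matrix(emb, k=5)
X_imputed = diffuse(X, M, t=3)
```

Larger values of `k` or `t` average each cell over a wider neighborhood, which is the over-smoothing risk the abstract refers to; the choice of embedding determines which cells count as neighbors in the first place.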