Imputation

  • Article type: Journal Article
    BACKGROUND: In recent years, the introduction of single-cell RNA sequencing (scRNA-seq) has enabled the analysis of a cell's transcriptome at an unprecedented granularity and processing speed. The experimental outcome of applying this technology is a [Formula: see text] matrix containing aggregated mRNA expression counts of M genes and N cell samples. From this matrix, scientists can study how cell protein synthesis changes in response to various factors, for example, disease versus non-disease states in response to a treatment protocol. A critical challenge of this technology is detecting and accurately recording lowly expressed genes. As a result, low expression levels tend to be missed and recorded as zero, an event known as dropout. This makes lowly expressed genes indistinguishable from true zero expression and different from the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult.
    RESULTS: To address this problem, we propose an approach to measure cell similarity using consensus clustering and demonstrate an effective and efficient algorithm that takes advantage of this new similarity measure to impute the most probable dropout events in the scRNA-seq datasets. We demonstrate that our approach exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities.
    CONCLUSIONS: ccImpute is an effective algorithm to correct for dropout events and thus improve downstream analysis of scRNA-seq data. ccImpute is implemented in R and is available at https://github.com/khazum/ccImpute.
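
The idea summarized in the abstract above (measure cell similarity with consensus clustering, then use that similarity to fill in likely dropouts) can be illustrated with a minimal Python sketch. This is not the ccImpute implementation, which is written in R; the function names, the tiny k-means routine, the choice of cluster counts, and the similarity-weighted mean are all assumptions made for illustration. In particular, this sketch fills every zero with a weighted mean, whereas the paper targets only the most probable dropout events.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Tiny k-means on the rows of X; returns one cluster label per row."""
    # Fancy indexing copies, so updating centers does not modify X.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def consensus_similarity(X, ks=(2, 3), n_runs=10, seed=0):
    """Fraction of clustering runs in which each pair of cells shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    consensus = np.zeros((n, n))
    total = 0
    for k in ks:
        for _ in range(n_runs):
            labels = kmeans_labels(X, k, rng)
            consensus += labels[:, None] == labels[None, :]
            total += 1
    return consensus / total

def impute_zeros(counts, similarity):
    """Replace zeros with the similarity-weighted mean over the other cells.

    counts is genes x cells; similarity is cells x cells.
    Assumption: every zero is treated as a candidate dropout.
    """
    counts = counts.astype(float)
    w = similarity.copy()
    np.fill_diagonal(w, 0.0)  # a cell does not vote for itself
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    estimate = counts @ w.T  # estimate[g, c] = sum over c' of w[c, c'] * counts[g, c']
    return np.where(counts == 0, estimate, counts)

# Toy data (hypothetical): 4 genes x 6 cells, two cell types.
# counts[0, 1] is a simulated dropout in a cell of the first type.
counts = np.array([
    [9, 0, 8, 0, 0, 0],
    [7, 8, 9, 0, 0, 0],
    [0, 0, 0, 8, 9, 7],
    [0, 0, 0, 9, 7, 8],
])
cells = np.log1p(counts.T.astype(float))  # cluster cells on log-transformed profiles
similarity = consensus_similarity(cells)
imputed = impute_zeros(counts, similarity)
```

Averaging co-clustering over many runs and several cluster counts makes the similarity less sensitive to any single clustering result. Zeros that are shared by all cells similar to a given cell receive an estimate of zero, so they are left effectively unchanged.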

  • Article type: Journal Article
    From the Editors: This is one in a series of statistical guidelines designed to highlight common statistical considerations in behavioral medicine research. The goal is to briefly discuss appropriate ways to analyze and present data in the International Journal of Behavioral Medicine (IJBM). Collectively the series will culminate in a set of basic statistical guidelines to be adopted by IJBM and integrated into the journal's official Instructions for Authors, but also to serve as an independent resource. If you have ideas for a future topic, please email the Statistical Editor Suzanne Segerstrom at segerstrom@uky.edu.