Keywords: flow cytometry; aggregation tree; distribution difference; false discovery proportion (FDP); multiple testing

Source: DOI:10.1214/22-aoas1645

Abstract:
In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs differ. Further screening of these differential regions can be performed to identify enriched sets of responsive cells. In this paper, we model identifying differential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin, we form a hypothesis to test the existence of differential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor differential pdfs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine- to coarse-resolution. The procedure achieves the statistical goal of pinpointing density differences to the smallest possible regions. TEAM is computationally efficient, capable of analyzing large flow cytometry data sets in much shorter time compared with competing methods. We applied TEAM and competing methods on a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. 
Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.
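
The binning and fine-to-coarse testing idea described above can be illustrated with a small one-dimensional sketch. This is not the TEAM procedure itself. It assumes equal-count bins built from the pooled sample, a normal-approximation binomial test in each bin, and a Benjamini-Hochberg step-up applied separately at each layer in place of TEAM's own thresholds, so it carries none of the paper's FDR guarantees. All function names and default parameters (`fine_to_coarse`, `n_bins`, `n_layers`) are hypothetical.

```python
import bisect
import math
import random


def bin_edges_from_pooled(pooled, n_bins):
    """Quantile cut points so each bin holds roughly the same number of pooled cells."""
    s = sorted(pooled)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def count_in_bins(values, edges):
    """Count how many values fall into each of the len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return counts


def two_sided_p(x, m, theta):
    """Normal-approximation two-sided p-value for x ~ Binomial(m, theta)."""
    if m == 0:
        return 1.0
    z = (x - m * theta) / math.sqrt(m * theta * (1.0 - theta))
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvals, alpha):
    """Return the set of indices rejected by the BH step-up rule (assumed stand-in)."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / len(pvals):
            k = rank
    return set(order[:k])


def fine_to_coarse(before, after, n_bins=64, n_layers=3, alpha=0.05):
    """Flag bins where 'after' cells are over- or under-represented.

    Layer 1 tests single bins; each later layer merges pairs of surviving
    bins (neighbours in sorted order once rejected bins are dropped) and
    tests the merged regions, so weaker differences get a second chance
    at a coarser resolution.
    """
    edges = bin_edges_from_pooled(before + after, n_bins)
    cb = count_in_bins(before, edges)
    ca = count_in_bins(after, edges)
    # Under the null, a pooled cell comes from 'after' with this probability.
    theta = len(after) / (len(before) + len(after))

    nodes = [[i] for i in range(n_bins)]  # each node is a list of leaf bins
    flagged = set()
    for _ in range(n_layers):
        if not nodes:
            break
        pvals = [
            two_sided_p(sum(ca[i] for i in node),
                        sum(ca[i] + cb[i] for i in node),
                        theta)
            for node in nodes
        ]
        rejected = benjamini_hochberg(pvals, alpha)
        for j in rejected:
            flagged.update(nodes[j])
        survivors = [node for j, node in enumerate(nodes) if j not in rejected]
        nodes = [survivors[k] + survivors[k + 1]
                 for k in range(0, len(survivors) - 1, 2)]
    return sorted(flagged), edges


if __name__ == "__main__":
    random.seed(0)
    before = [random.gauss(0, 1) for _ in range(5000)]
    # Simulated stimulus: 10% of cells shift to a high-expression cluster.
    after = ([random.gauss(0, 1) for _ in range(4500)]
             + [random.gauss(4, 0.3) for _ in range(500)])
    flagged, _ = fine_to_coarse(before, after)
    print("flagged bins:", flagged)
```

Rejected bins are removed before merging so that a strong, localized difference is reported at the finest resolution rather than being absorbed into a larger region, which mirrors the stated goal of pinpointing differences to the smallest possible regions.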