Keywords: Dependent data; False discovery rate; High-dimensional biology; Multiple hypothesis testing; Simultaneous inference

Source: DOI:10.1080/01621459.2011.645777 (PDF via PubMed)

Abstract:
A growing number of modern scientific problems in areas such as genomics, neurobiology, and spatial epidemiology involve the measurement and analysis of thousands of related features that may be stochastically dependent at arbitrarily strong levels. In this work, we consider the scenario where the features follow a multivariate Normal distribution. We demonstrate that dependence is manifested as random variation shared among features, and that standard methods may yield highly unstable inference due to dependence, even when the dependence is fully parameterized and utilized in the procedure. We propose a "cross-dimensional inference" framework that alleviates the problems due to dependence by modeling and removing the variation shared among features, while also properly regularizing estimation across features. We demonstrate the framework on both simultaneous point estimation and multiple hypothesis testing in scenarios derived from the scientific applications of interest.
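The idea that dependence shows up as random variation shared among features can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes a toy one-factor, equicorrelated Normal model with a known correlation `rho`, treats every feature as null, and uses hypothetical helper names (`simulate_null_z`, `false_positive_counts`, `remove_shared_variation`). It compares how much the number of false positives fluctuates across replications with and without the shared component.

```python
import numpy as np


def simulate_null_z(m, rho, n_reps, rng):
    """Draw null z-scores with pairwise correlation rho (toy one-factor model).

    Each row is one replication; each z-score is marginally N(0, 1).
    """
    shared = rng.standard_normal((n_reps, 1))   # variation shared by all features
    noise = rng.standard_normal((n_reps, m))    # feature-specific variation
    return np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise


def false_positive_counts(z, threshold=1.96):
    """Count features per replication whose |z| exceeds the threshold."""
    return (np.abs(z) > threshold).sum(axis=1)


def remove_shared_variation(z, rho):
    """Subtract the per-replication mean (a crude estimate of the shared
    component when all features are null) and rescale to unit variance.
    Assumes rho is known, which is an illustration-only simplification."""
    shared_hat = z.mean(axis=1, keepdims=True)
    return (z - shared_hat) / np.sqrt(1.0 - rho)


rng = np.random.default_rng(0)
m, rho, n_reps = 1000, 0.5, 500

z_dep = simulate_null_z(m, rho, n_reps, rng)   # dependent features
z_ind = simulate_null_z(m, 0.0, n_reps, rng)   # independent features
z_adj = remove_shared_variation(z_dep, rho)    # shared variation removed

sd_dep = false_positive_counts(z_dep).std()
sd_ind = false_positive_counts(z_ind).std()
sd_adj = false_positive_counts(z_adj).std()

print(f"SD of false-positive count, dependent:   {sd_dep:.1f}")
print(f"SD of false-positive count, independent: {sd_ind:.1f}")
print(f"SD of false-positive count, adjusted:    {sd_adj:.1f}")
```

Under these assumptions the false-positive count is far more variable across replications for the dependent features than for independent ones, and subtracting the estimated shared component brings the variability back to roughly the independent level. Real data would also contain non-null features and an unknown dependence structure, which this sketch does not address.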