Keywords: case-control study; contaminated case pool; electronic health records; imputation; robustness to model misspecification

MeSH: Humans; Electronic Health Records; Computer Simulation; Case-Control Studies; Models, Statistical

Source: DOI: 10.1111/biom.13721; PDF (PubMed)

Abstract:
We consider analyses of case-control studies assembled from electronic health records (EHRs) where the pool of cases is contaminated by patients who are ineligible for the study. These ineligible patients, referred to as "false cases," should be excluded from the analyses if known. However, the true outcome status of a patient in the case pool is unknown except in a subset whose size may be arbitrarily small compared to the entire pool. To effectively remove the influence of the false cases on estimating odds ratio parameters defined by a working association model of the logistic form, we propose a general strategy to adaptively impute the unknown case status without requiring a correct phenotyping model to help discern the true and false case statuses. Our method estimates the target parameters as the solution to a set of unbiased estimating equations constructed using all available data. It outperforms existing methods by achieving robustness to mismodeling the relationship between the outcome status and covariates of interest, as well as improved estimation efficiency. We further show that our estimator is root-n-consistent and asymptotically normal. Through extensive simulation studies and analysis of real EHR data, we demonstrate that our method has desirable robustness to possible misspecification of both the association and phenotyping models, along with statistical efficiency superior to that of competing methods.
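The problem the abstract describes, false cases distorting odds ratio estimates, can be illustrated with a small simulation. The sketch below is not the authors' imputation estimator, which the abstract does not specify in detail. It only compares a logistic fit on true cases against a naive fit on a contaminated case pool. All names and settings (`simulate_contamination`, the intercept of -3, a single standard-normal covariate, false cases drawn from the non-case population) are assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def slope(case_x, control_x):
    """Log odds ratio per unit of x from a case-control logistic fit."""
    x = np.concatenate([case_x, control_x])
    y = np.concatenate([np.ones(case_x.size), np.zeros(control_x.size)])
    X = np.column_stack([np.ones(x.size), x])
    return fit_logistic(X, y)[1]


def simulate_contamination(n_pop=200_000, beta1=1.0, false_frac=0.5, seed=0):
    """Return (oracle, naive) slope estimates under an assumed toy model.

    Assumption: false cases are non-cases that were wrongly placed in the
    case pool, so their covariate distribution matches that of the controls.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_pop)
    p = 1.0 / (1.0 + np.exp(-(-3.0 + beta1 * x)))
    is_case = rng.random(n_pop) < p

    true_cases = x[is_case]
    non_cases = rng.permutation(x[~is_case])
    n_true = true_cases.size
    n_false = int(false_frac * n_true)

    controls = non_cases[:n_true]
    false_cases = non_cases[n_true:n_true + n_false]

    oracle = slope(true_cases, controls)  # false cases excluded
    naive = slope(np.concatenate([true_cases, false_cases]), controls)
    return oracle, naive


if __name__ == "__main__":
    oracle, naive = simulate_contamination()
    print(f"oracle log OR: {oracle:.3f}, naive log OR: {naive:.3f}")
```

Under these assumptions the naive estimate is pulled toward zero relative to the oracle fit, which is the kind of bias the proposed imputation strategy is meant to remove when only a small validated subset of case statuses is available.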