Keywords: All of Us Research Program; Data quality; Electronic Health Records; Measurement error

MeSH: Humans; Body Height; Body Weight; Electronic Health Records; Algorithms; Male; Adult; Female; Middle Aged; United States; Reference Values; Aged; Young Adult

Source: DOI:10.1016/j.jbi.2024.104660

Abstract:
BACKGROUND: Electronic Health Records (EHR) are a useful data source for research, but their usability is hindered by measurement errors. This study investigated an automatic error detection algorithm for adult height and weight measurements in EHR for the All of Us Research Program (All of Us).
METHODS: We developed reference charts for adult heights and weights, stratified by participant sex. Our analysis included 4,076,534 height and 5,207,328 weight measurements from approximately 150,000 participants. Errors were identified using modified standard deviation scores, differences from expected values, and significant changes between consecutive measurements. We evaluated our method against chart-reviewed heights (8,092) and weights (9,039) from 250 randomly selected participants and compared it with the current cleaning algorithm in All of Us.
RESULTS: The proposed algorithm classified 1.4% of height and 1.5% of weight measurements in the full cohort as errors. Sensitivity was 90.4% (95% CI: 79.0-96.8%) for heights and 65.9% (95% CI: 56.9-74.1%) for weights. Precision was 73.4% (95% CI: 60.9-83.7%) for heights and 62.9% (95% CI: 54.0-71.1%) for weights. In comparison, the current cleaning algorithm had lower sensitivity (55.8%) and lower precision (16.5%) for height errors, and higher precision (94.0%) but lower sensitivity (61.9%) for weight errors.
CONCLUSIONS: Our proposed algorithm performed better at detecting height errors than weight errors. It can serve as a valuable addition to the current All of Us cleaning algorithm for identifying erroneous height values.
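
Illustrative sketch (not the authors' implementation): the three checks named in METHODS, followed by the sensitivity and precision calculation reported in RESULTS, might look roughly like the Python below. Everything specific here is an assumption made for illustration: the function names, the thresholds (`z_max`, `max_diff`, `max_jump`), the reference values, the sample heights, and the chart-review labels. The "modified score" is taken to be the common median/MAD modified z-score, which may differ from the paper's definition, and the "expected value" is taken to be the median of the participant's other measurements.

```python
from statistics import median


def modified_z(value, ref_median, ref_mad):
    """Median/MAD modified z-score (an assumed stand-in for the paper's score)."""
    return 0.6745 * (value - ref_median) / ref_mad


def flag_measurements(values, ref_median, ref_mad,
                      z_max=5.0, max_diff=15.0, max_jump=10.0):
    """Return, for each measurement, the set of rule names that fired.

    `values` is one participant's measurements in chronological order.
    All thresholds are hypothetical defaults, not values from the paper.
    """
    flags = []
    for i, v in enumerate(values):
        fired = set()
        # Rule 1: implausible relative to the sex-specific reference chart.
        if abs(modified_z(v, ref_median, ref_mad)) > z_max:
            fired.add("reference")
        # Rule 2: far from the participant's expected value, taken here as
        # the median of their other measurements.
        others = values[:i] + values[i + 1:]
        if others and abs(v - median(others)) > max_diff:
            fired.add("expected")
        # Rule 3: a spike relative to both neighbours. Endpoints are skipped
        # because they have only one neighbour.
        if 0 < i < len(values) - 1:
            if (abs(v - values[i - 1]) > max_jump
                    and abs(v - values[i + 1]) > max_jump):
                fired.add("jump")
        flags.append(fired)
    return flags


def sensitivity_precision(flagged, truth):
    """Compare algorithm flags with chart-review labels (True = error)."""
    tp = sum(f and t for f, t in zip(flagged, truth))
    fp = sum(f and not t for f, t in zip(flagged, truth))
    fn = sum(t and not f for f, t in zip(flagged, truth))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, precision


# Hypothetical heights in cm; 107.0 looks like a transposition of 170.
heights_cm = [170.2, 171.0, 107.0, 170.5]
flags = flag_measurements(heights_cm, ref_median=170.0, ref_mad=5.0)
flagged = [bool(f) for f in flags]       # flag if any rule fired (assumption)
chart_review = [False, False, True, False]  # hypothetical reviewer labels
print(flags)
print(sensitivity_precision(flagged, chart_review))
```

Returning the fired rule names, rather than a single boolean, keeps the choice of how to combine the rules (any rule versus several in agreement) with the caller, since the abstract does not state how the three criteria are combined.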