Keywords: Analysis-ready data; Exploratory data analysis; Laboratory records; R package

Source: DOI:10.7717/peerj-cs.1528

Abstract:
Electronic health records (EHRs) play a crucial role in healthcare decision-making by giving physicians insights into disease progression and suitable treatment options. Within EHRs, laboratory test results are frequently used to predict disease progression. However, processing laboratory test results is often challenging because units and formats vary. In addition, leveraging the temporal information in EHRs can improve the prediction of outcomes, prognoses, and diagnoses. Nevertheless, the irregular frequency of the data in these records necessitates preprocessing, which adds complexity to time-series analyses.
To address these challenges, we developed an open-source R package that facilitates the extraction of temporal information from laboratory records. The proposed lab package generates analysis-ready time-series data by segmenting the data into time-series windows and imputing missing values. Users can also map local laboratory codes to the Logical Observation Identifier Names and Codes (LOINC), an international standard. This mapping allows users to incorporate additional information, such as reference ranges and related diseases. In addition, the reference ranges provided by LOINC make it possible to categorize results as normal or abnormal. Finally, the analysis-ready time-series data can be summarized using descriptive statistics and used to develop machine learning models.
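The windowing, imputation, and reference-range steps described above can be sketched as follows. This is a minimal illustration in Python rather than R, and it is not the lab package's API: the function names, the sample records, and the 3.5 to 5.1 reference range are all hypothetical, and forward fill is only one possible imputation choice.

```python
from collections import defaultdict
from statistics import mean


def window_series(records, window_days, n_windows):
    """Average each (patient, test) value inside fixed-width time windows.

    records: iterable of (patient_id, test_code, day_offset, value), where
    day_offset counts days since an index date such as a diagnosis.
    Returns {(patient_id, test_code): [mean or None for each window]}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for patient_id, test_code, day_offset, value in records:
        idx = int(day_offset // window_days)
        if 0 <= idx < n_windows:  # drop records outside the observation span
            buckets[(patient_id, test_code)][idx].append(value)
    return {
        key: [mean(wins[i]) if i in wins else None for i in range(n_windows)]
        for key, wins in buckets.items()
    }


def forward_fill(series):
    """Impute gaps with the last observed value; leading gaps stay None."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def flag_abnormal(series, low, high):
    """Label each window 'normal' or 'abnormal' against a reference range."""
    return [
        None if value is None else ("normal" if low <= value <= high else "abnormal")
        for value in series
    ]


# Hypothetical records: one patient, one test, irregular sampling days.
records = [("p1", "K", 0, 4.0), ("p1", "K", 1, 4.5), ("p1", "K", 5, 5.8)]

windowed = window_series(records, window_days=2, n_windows=4)
filled = forward_fill(windowed[("p1", "K")])
flags = flag_abnormal(filled, low=3.5, high=5.1)  # assumed reference range
```

With two-day windows, the first two measurements fall into the same window and are averaged, the empty windows are filled from the preceding one, and the last measurement is flagged as abnormal against the assumed range.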
Using the lab package, we analyzed data from MIMIC-III, focusing on newborns with patent ductus arteriosus (PDA). We extracted time-series laboratory records and compared test results between patients with and without 30-day in-hospital mortality. We then identified significant variations in several laboratory test results 7 days after PDA diagnosis. Leveraging the analysis-ready time-series data, we trained a prediction model with the long short-term memory algorithm, achieving an area under the receiver operating characteristic curve of 0.83 for predicting 30-day in-hospital mortality in model training. These findings demonstrate the lab package's effectiveness in analyzing disease progression.
The proposed lab package simplifies and expedites the workflow involved in extracting laboratory records. The tool is particularly valuable in helping clinical data analysts overcome the obstacles associated with heterogeneous and sparse laboratory records.