Common data model

  • Article type: Journal Article
    OBJECTIVE: Norwegian health registries covering the entire population are used for administration, research, and emergency preparedness. We harmonized these data onto the Observational Medical Outcomes Partnership common data model (OMOP CDM) and enriched real-world data in OMOP format with COVID-19-related data.
    METHODS: Data from six registries (2018-2021) covering birth registrations, selected primary and secondary care events, vaccinations, and communicable disease notifications were mapped onto the OMOP CDM v5.3. An Extract-Transform-Load (ETL) pipeline was developed on simulated data using data characterization documents and scanning tools. We ran dashboard quality checks, generated cohorts, investigated differences between source and mapped data, and refined the ETL accordingly.
    RESULTS: We mapped 1.5 billion rows of data for 5,673,845 individuals. Among these, there were 804,277 pregnancies, 483,585 mothers with 792,477 children, and 472,948 fathers. We identified 382,516 positive tests for COVID-19 in 380,794 patients. These figures are consistent with results from the source data. In addition to 11 million source codes mapped automatically, we mapped 237 non-standard codes to standard concepts and introduced 38 custom concepts to accommodate pregnancy-related terminologies that were not supported by the OMOP CDM vocabularies. A total of 3,700/3,705 (99.8%) checks passed. The 5 failed checks could be explained by the nature of the data and represent only a small number of records.
    CONCLUSIONS: Norwegian registry data were successfully harmonized onto the OMOP CDM with a high level of concordance and provide a valuable source for federated COVID-19-related research. Our mapping experience is highly valuable for data partners with Nordic health registries.
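
The code-mapping step described in this abstract can be illustrated with a minimal Python sketch. It assumes a simple lookup from (source vocabulary, source code) to a concept ID, and it follows the OHDSI convention of reserving concept IDs above 2,000,000,000 for locally defined custom concepts and using 0 for unmapped codes. The vocabulary names, source codes, and concept IDs below are hypothetical placeholders, not values from the Norwegian ETL.

```python
from dataclasses import dataclass

# OHDSI convention: locally defined (custom) concepts use IDs above 2 billion,
# and concept_id 0 marks a source code that has no mapping.
CUSTOM_CONCEPT_MIN = 2_000_000_000
UNMAPPED_CONCEPT_ID = 0

# Hypothetical lookup tables; a real ETL would read these from vocabulary tables.
STANDARD_MAP = {("SOURCE_VOCAB_A", "CODE_1"): 1001, ("SOURCE_VOCAB_B", "CODE_2"): 1002}
CUSTOM_MAP = {("BIRTH_REGISTRY", "PREG_TERM_01"): CUSTOM_CONCEPT_MIN + 1}


@dataclass(frozen=True)
class MappedCode:
    source_vocabulary: str
    source_code: str
    concept_id: int
    kind: str  # "standard", "custom", or "unmapped"


def map_source_code(vocabulary: str, code: str) -> MappedCode:
    """Resolve a source code to a standard concept, then a custom one, else 0."""
    key = (vocabulary, code)
    if key in STANDARD_MAP:
        return MappedCode(vocabulary, code, STANDARD_MAP[key], "standard")
    if key in CUSTOM_MAP:
        return MappedCode(vocabulary, code, CUSTOM_MAP[key], "custom")
    return MappedCode(vocabulary, code, UNMAPPED_CONCEPT_ID, "unmapped")


def mapping_coverage(codes) -> float:
    """Return the fraction of source codes that received a non-zero concept ID."""
    mapped = [map_source_code(vocabulary, code) for vocabulary, code in codes]
    if not mapped:
        return 0.0
    return sum(m.concept_id != UNMAPPED_CONCEPT_ID for m in mapped) / len(mapped)
```

Trying the standard vocabulary first and falling back to custom concepts keeps locally invented IDs to a minimum, which is presumably why only a small number of custom concepts is needed in practice.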

  • Article type: Journal Article
    BACKGROUND: Given the geographical sparsity of rare diseases (RDs), assembling a cohort is often a challenging task. Common data models (CDMs) can harmonize disparate sources of data that can serve as the basis of decision support systems and artificial intelligence-based studies, leading to new insights in the field. This work seeks to support the design of large-scale multi-center studies for rare diseases.
    METHODS: In an interdisciplinary group, we derived a list of elements of RDs in three medical domains (endocrinology, gastroenterology, and pneumonology) according to specialist knowledge and clinical guidelines in an iterative process. We then defined an RD data structure that matched all our data elements and built Extract, Transform, Load (ETL) processes to transfer the structure to a joint CDM. To ensure interoperability of our developed CDM and its subsequent usage for further RD domains, we ultimately mapped it to the Observational Medical Outcomes Partnership (OMOP) CDM. We then included a fourth domain, hematology, as a proof of concept and mapped an acute myeloid leukemia (AML) dataset to the developed CDM.
    RESULTS: We have developed an OMOP-based rare diseases common data model (RD-CDM) using data elements from the three domains (endocrinology, gastroenterology, and pneumonology) and tested the CDM using data from the hematology domain. The total study cohort included 61,697 patients. After aligning our modules with the Medical Informatics Initiative (MII) Core Dataset (CDS) modules, we leveraged its ETL process. This facilitated the seamless transfer of demographic information, diagnoses, procedures, laboratory results, and medication modules from our RD-CDM to OMOP. For the phenotypes and genotypes, we developed a second ETL process. Finally, we derived lessons learned for customizing our RD-CDM for different RDs.
    CONCLUSIONS: This work can serve as a blueprint for other domains, as its modularized structure could be extended towards novel data types. An interdisciplinary group of stakeholders who actively support the project's progress is necessary to reach a comprehensive CDM.
    CONCLUSIONS: The customized data structure related to our RD-CDM can be used to perform multi-center studies to test data-driven hypotheses on a larger scale and take advantage of the analytical tools offered by the OHDSI community.

  • Article type: Journal Article
    BACKGROUND: Patient-monitoring software generates a large amount of data that can be reused for clinical audits and scientific research. The Observational Health Data Sciences and Informatics (OHDSI) consortium developed the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize electronic health record data and promote large-scale observational and longitudinal research.
    OBJECTIVE: This study aimed to transform primary care data into the OMOP CDM format.
    METHODS: We extracted primary care data from electronic health records at a multidisciplinary health center in Wattrelos, France. We performed structural mapping between the design of our local primary care database and the OMOP CDM tables and fields. Concepts from local French vocabularies were mapped to OHDSI standard vocabularies. To validate the implementation of primary care data in the OMOP CDM format, we applied a set of queries. A practical application was achieved through the development of a dashboard.
    RESULTS: Data from 18,395 patients were implemented in the OMOP CDM, corresponding to 592,226 consultations over a period of 20 years. A total of 18 OMOP CDM tables were implemented. A total of 17 local vocabularies were identified as being related to primary care and corresponded to patient characteristics (sex, location, year of birth, and race), units of measurement, biometric measures, laboratory test results, medical histories, and drug prescriptions. During semantic mapping, 10,221 primary care concepts were mapped to standard OHDSI concepts. Five queries were used to validate the OMOP CDM by comparing the results obtained after the transformations were completed with the results obtained in the source software. Lastly, a prototype dashboard was developed to visualize the activity of the health center, the laboratory test results, and the drug prescription data.
    CONCLUSIONS: Primary care data from a French health care facility have been implemented in the OMOP CDM format. Data concerning demographics, units, measurements, and primary care consultation steps were already available in OHDSI vocabularies. Laboratory test results and drug prescription data were mapped to available vocabularies and structured in the final model. A dashboard application provided health care professionals with feedback on their practice.
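
The validation approach in this abstract (running the same question against the source system and the converted CDM and comparing the answers) might look roughly like the following Python sketch. The abstract does not list the five queries, so the two checks here are illustrative assumptions. `person` and `visit_occurrence` are standard OMOP CDM table names, while the `src_patient` and `src_consultation` tables, and the choice to store consultations in `visit_occurrence`, are hypothetical.

```python
import sqlite3

# Each check pairs a query on the (hypothetical) source schema with a query
# on the OMOP CDM tables; the two results are expected to be equal.
CHECKS = {
    "patients": (
        "SELECT COUNT(*) FROM src_patient",
        "SELECT COUNT(*) FROM person",
    ),
    "consultations": (
        "SELECT COUNT(*) FROM src_consultation",
        "SELECT COUNT(*) FROM visit_occurrence",
    ),
}


def run_checks(conn):
    """Return {check name: (source value, CDM value, values match)}."""
    results = {}
    for name, (source_sql, cdm_sql) in CHECKS.items():
        source_value = conn.execute(source_sql).fetchone()[0]
        cdm_value = conn.execute(cdm_sql).fetchone()[0]
        results[name] = (source_value, cdm_value, source_value == cdm_value)
    return results


def build_demo_db():
    """Create a tiny in-memory database in which one consultation was lost."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_patient (id INTEGER PRIMARY KEY);
        CREATE TABLE src_consultation (id INTEGER PRIMARY KEY, patient_id INTEGER);
        CREATE TABLE person (person_id INTEGER PRIMARY KEY);
        CREATE TABLE visit_occurrence (
            visit_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER
        );
        INSERT INTO src_patient VALUES (1), (2);
        INSERT INTO src_consultation VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO person VALUES (1), (2);
        INSERT INTO visit_occurrence VALUES (10, 1), (11, 1);
        """
    )
    return conn
```

Calling `run_checks(build_demo_db())` flags the `consultations` check as a mismatch, which is the kind of signal that would send an ETL developer back to the transformation step.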

  • Article type: Journal Article
    BACKGROUND: Dipeptidyl peptidase-4 (DPP4) inhibitors are frequently prescribed for patients with type 2 diabetes; however, their cost can pose a significant barrier for those with impaired kidney function. This study aimed to estimate the economic benefits of substituting non-renal dose-adjusted (NRDA) DPP4 inhibitors with renal dose-adjusted (RDA) DPP4 inhibitors in patients with both impaired kidney function and type 2 diabetes.
    METHODS: This retrospective cohort study covered January 1, 2012 to December 31, 2018, using data obtained from the common data models of five medical centers in Korea. Model 1 applied the prescription pattern of participants with preserved kidney function to those with impaired kidney function. In contrast, model 2 replaced all NRDA DPP4 inhibitors with RDA DPP4 inhibitors, adjusting the doses of RDA DPP4 inhibitors based on individual kidney function. The primary outcome was the cost difference between the two models.
    RESULTS: In total, 67,964,996 prescription records were analyzed. NRDA DPP4 inhibitors were more frequently prescribed to patients with impaired kidney function than to those with preserved kidney function (25.7%, 51.3%, 64.3%, and 71.6% in patients with estimated glomerular filtration rates [eGFRs] of ≥60, <60, <45, and <30 mL/min/1.73 m², respectively). When model 1 was applied, the cost savings per year were 7.6% for eGFR <60 mL/min/1.73 m² and 30.4% for eGFR <30 mL/min/1.73 m². According to model 2, 15.4% to 51.2% per year could be saved depending on the severity of kidney impairment.
    CONCLUSIONS: Adjusting the doses of RDA DPP4 inhibitors based on individual kidney function could alleviate the economic burden associated with medical expenses.

  • Article type: Journal Article
    Common data models provide a standardized way to represent data used in federated learning tasks. The aim of this review was to explore the development and use of common data models to harmonize electronic health record data in health research. The data search yielded 724 records, of which 19 were included in this study. None of the studies focused on nursing-specific topics. All studies either used the Observational Medical Outcomes Partnership (OMOP) common data model or developed a model partly based on OMOP. A roadmap to guide research on the development of common data models for federated learning is warranted.

  • Article type: Journal Article
    OBJECTIVE: Despite easy-to-use tools like the Cohort Builder, using All of Us Research Program data for complex research questions requires a relatively high level of technical expertise. We aimed to increase research and training capacity and reduce barriers to entry for the All of Us community through an R package, allofus. In this article, we describe functions that address common challenges we encountered while working with All of Us Research Program data, and we demonstrate this functionality with an example of creating a cohort of All of Us participants by synthesizing electronic health record and survey data with time dependencies.
    BACKGROUND: All of Us Research Program data are widely available to health researchers. The allofus R package is aimed at a wide range of researchers who wish to conduct complex analyses using best practices for reproducibility and transparency, and who have a range of experience using R. Because the All of Us data are transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), researchers familiar with existing OMOP CDM tools or who wish to conduct network studies in conjunction with other OMOP CDM data will also find value in the package.
    METHODS: We developed an initial set of functions that solve problems we experienced across survey and electronic health record data in our own research and in mentoring student projects. The package will continue to grow and develop with the All of Us Research Program. The allofus R package can help build community research capacity by increasing access to the All of Us Research Program data, the efficiency of its use, and the rigor and reproducibility of the resulting research.

  • Article type: Journal Article
    BACKGROUND: Acute kidney injury (AKI) is a marker of clinical deterioration and renal toxicity. While there are many studies offering prediction models for the early detection of AKI, those predicting AKI occurrence using distributed research network (DRN)-based time series data are rare.
    OBJECTIVE: In this study, we aimed to detect the early occurrence of AKI by applying an interpretable long short-term memory (LSTM)-based model to hospital electronic health record (EHR)-based time series data in patients who took nephrotoxic drugs, using a DRN.
    METHODS: We conducted a multi-institutional retrospective cohort study of data from 6 hospitals using a DRN. For each institution, a patient-based data set was constructed using 5 drugs for AKI, and an interpretable multivariable LSTM (IMV-LSTM) model was used for training. This study used propensity score matching to mitigate differences in demographics and clinical characteristics. Additionally, the temporal attention values of the AKI prediction model's contribution variables were demonstrated for each institution and drug, with differences in highly important feature distributions between the case and control data confirmed using 1-way ANOVA.
    RESULTS: This study analyzed 8643 and 31,012 patients with and without AKI, respectively, across 6 hospitals. When analyzing the distribution of AKI onset, vancomycin showed an earlier onset (median 12, IQR 5-25 days), and acyclovir was the slowest compared to the other drugs (median 23, IQR 10-41 days). Our temporal deep learning model for AKI prediction performed well for most drugs. Acyclovir had the highest average area under the receiver operating characteristic curve score per drug (0.94), followed by acetaminophen (0.93), vancomycin (0.92), naproxen (0.90), and celecoxib (0.89). Based on the temporal attention values of the variables in the AKI prediction model, lymphocytes and calcium had the highest attention for vancomycin, whereas lymphocytes, albumin, and hemoglobin tended to decrease over time, and urine pH and prothrombin time tended to increase.
    CONCLUSIONS: Early surveillance of AKI outbreaks can be achieved by applying an IMV-LSTM based on time series data through an EHR-based DRN. This approach can help identify risk factors and enable early detection of adverse drug reactions when prescribing drugs that cause renal toxicity before AKI occurs.
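
The per-drug scores reported in this abstract are areas under the receiver operating characteristic curve. As a reminder of what that number measures, the Python sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. This is a generic illustration on made-up inputs, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: iterable of 0/1 outcomes (1 = case, 0 = control).
    scores: iterable of predicted risks, aligned with labels.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case ranked above control
            elif case == control:
                wins += 0.5   # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; production code would typically use a sort-based formulation or an established library routine.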

  • Article type: Journal Article
    The common data model (CDM) has found widespread application in healthcare studies, but its utilization in cancer research has been limited. This article describes the development and implementation strategy for Cancer Clinical Library Databases (CCLDs), which are standardized cancer-specific databases established under the Korea-Clinical Data Utilization Network for Research Excellence (K-CURE) project by the Korean Ministry of Health and Welfare. Fifteen leading hospitals and fourteen academic associations in Korea are engaged in constructing CCLDs for 10 primary cancer types. For each cancer type-specific CCLD, cancer data experts determine key clinical data items essential for cancer research, standardize these items across cancer types, and create a standardized schema. Comprehensive clinical records covering diagnosis, treatment, and outcomes, with annual updates, are collected for each cancer patient in the target population, and quality control is based on six-sigma standards. To protect patient privacy, CCLDs follow stringent data security guidelines by pseudonymizing personal identification information and operating within a closed analysis environment. Researchers can apply for access to CCLD data through the K-CURE portal, which is subject to Institutional Review Board and Data Review Board approval. The CCLD is considered a pioneering standardized cancer-specific database, significantly representing Korea's cancer data. It is expected to overcome limitations of previous CDMs and provide a valuable resource for multicenter cancer research in Korea.

  • Article type: Journal Article
    BACKGROUND: Digital transformation, particularly the integration of medical imaging with clinical data, is vital in personalized medicine. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardizes health data. However, integrating medical imaging remains a challenge.
    OBJECTIVE: This study proposes a method for combining medical imaging data with the OMOP CDM to improve multimodal research.
    METHODS: Our approach included the analysis and selection of Digital Imaging and Communications in Medicine (DICOM) header tags, validation of data formats, and alignment according to the OMOP CDM framework. The Fast Healthcare Interoperability Resources (FHIR) ImagingStudy profile guided our consistency in column naming and definitions. The Imaging Common Data Model (I-CDM), constructed using the entity-attribute-value model, facilitates scalable and efficient medical imaging data management. For patients with lung cancer diagnosed between 2010 and 2017, we introduced 4 new tables (IMAGING_STUDY, IMAGING_SERIES, IMAGING_ANNOTATION, and FILEPATH) to standardize various imaging-related data and link them to clinical data.
    RESULTS: This framework underscores the effectiveness of I-CDM in enhancing our understanding of lung cancer diagnostics and treatment strategies. The implementation of the I-CDM tables enabled the structured organization of a comprehensive data set, including 282,098 IMAGING_STUDY, 5,674,425 IMAGING_SERIES, and 48,536 IMAGING_ANNOTATION records, illustrating the extensive scope and depth of the approach. A scenario-based analysis using actual data from patients with lung cancer underscored the feasibility of our approach. A data quality check applying 44 specific rules confirmed the high integrity of the constructed data set, with all checks successfully passed, underscoring the reliability of our findings.
    CONCLUSIONS: These findings indicate that I-CDM can improve the integration and analysis of medical imaging and clinical data. By addressing the challenges in data standardization and management, our approach contributes toward enhancing diagnostics and treatment strategies. Future research should expand the application of I-CDM to diverse disease populations and explore its wide-ranging utility for medical conditions.
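
To make the table layout in this abstract more concrete, here is a minimal Python and SQLite sketch of how four such tables could hang together. Only the four table names come from the abstract. Every column name, the placement of the entity-attribute-value layout in the annotation table, and the demo rows are assumptions made for illustration and should not be read as the published I-CDM schema.

```python
import sqlite3

# Hypothetical columns; only the imaging table names come from the abstract.
SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE imaging_study (
    imaging_study_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    study_date TEXT
);
CREATE TABLE imaging_series (
    imaging_series_id INTEGER PRIMARY KEY,
    imaging_study_id INTEGER NOT NULL REFERENCES imaging_study(imaging_study_id),
    modality TEXT
);
-- Entity-attribute-value layout: one row per (series, attribute) pair.
CREATE TABLE imaging_annotation (
    imaging_annotation_id INTEGER PRIMARY KEY,
    imaging_series_id INTEGER NOT NULL REFERENCES imaging_series(imaging_series_id),
    attribute TEXT NOT NULL,
    value TEXT
);
CREATE TABLE filepath (
    filepath_id INTEGER PRIMARY KEY,
    imaging_series_id INTEGER NOT NULL REFERENCES imaging_series(imaging_series_id),
    path TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database with one person, study, and series."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person VALUES (1)")
    conn.execute("INSERT INTO imaging_study VALUES (100, 1, '2015-03-01')")
    conn.execute("INSERT INTO imaging_series VALUES (200, 100, 'CT')")
    conn.executemany(
        "INSERT INTO imaging_annotation VALUES (?, ?, ?, ?)",
        [(300, 200, "slice_thickness_mm", "1.25"), (301, 200, "contrast", "yes")],
    )
    conn.execute("INSERT INTO filepath VALUES (400, 200, '/data/demo/series_200')")
    return conn


def annotations_for_person(conn, person_id):
    """Collect all annotation attributes for one person as a dictionary."""
    rows = conn.execute(
        """
        SELECT a.attribute, a.value
        FROM imaging_study st
        JOIN imaging_series se ON se.imaging_study_id = st.imaging_study_id
        JOIN imaging_annotation a ON a.imaging_series_id = se.imaging_series_id
        WHERE st.person_id = ?
        ORDER BY a.attribute
        """,
        (person_id,),
    ).fetchall()
    return dict(rows)
```

Storing annotations as attribute and value rows means a new kind of annotation needs no schema change, at the cost of values being untyped text; that trade-off is the usual reason for choosing an entity-attribute-value design.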

  • Article type: Journal Article
    BACKGROUND: Chronic disease management is a major health issue worldwide. With the paradigm shift to preventive medicine, disease prediction modeling using machine learning is gaining importance for precise and accurate medical judgement.
    OBJECTIVE: This study aimed to develop high-performance prediction models for 4 chronic diseases using the common data model (CDM) and machine learning and to confirm the possibility for the extension of the proposed models.
    METHODS: In this study, 4 major chronic diseases (diabetes, hypertension, hyperlipidemia, and cardiovascular disease) were selected, and a model for predicting their occurrence within 10 years was developed. For model development, the Atlas analysis tool was used to define the chronic disease to be predicted, and data were extracted from the CDM according to the defined conditions. A model for predicting each disease was built with 4 algorithms verified in previous studies, and performance was compared after applying a grid search.
    RESULTS: For the prediction of each disease, we applied 4 algorithms (logistic regression, gradient boosting, random forest, and extreme gradient boosting), and all models showed greater than 80% accuracy. When the optimized models were compared, extreme gradient boosting presented the highest predictive performance for the 4 diseases (diabetes, hypertension, hyperlipidemia, and cardiovascular disease), with accuracy of 80% or greater and area under the curve values from 0.84 to 0.93.
    CONCLUSIONS: This study demonstrates the possibility of preemptive management of chronic diseases by predicting their occurrence using the CDM and machine learning. With these models, the risk of developing a major chronic disease within 10 years can be estimated by identifying health risk factors; the models were developed with the real-world data-based CDM and National Health Insurance Corporation examination data that individuals can easily obtain.
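
The grid search mentioned in this abstract is an exhaustive sweep over candidate hyperparameter combinations, keeping the best-scoring one. The Python sketch below shows the idea with a deliberately simple stand-in model (a single-feature threshold rule scored by accuracy) so that it runs with the standard library alone. The abstract does not give the grids, the features, or the validation scheme, so all of those are assumptions here, and a real study would score each candidate with cross-validation on one of the 4 named algorithms.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score).

    param_grid: dict mapping parameter name -> list of candidate values.
    score_fn:   callable taking the parameters as keyword arguments and
                returning a number where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:  # strict ">" keeps the first of tied candidates
            best_params, best_score = params, score
    return best_params, best_score


def make_threshold_scorer(features, labels):
    """Build an accuracy scorer for a toy one-feature threshold classifier."""

    def score(threshold, above_is_positive):
        correct = 0
        for x, y in zip(features, labels):
            if above_is_positive:
                predicted = int(x >= threshold)
            else:
                predicted = int(x < threshold)
            correct += predicted == y
        return correct / len(labels)

    return score
```

For example, with features `[1, 2, 3, 4, 5, 6]` and labels `[0, 0, 0, 1, 1, 1]`, searching thresholds `[2, 4, 6]` in both directions selects a threshold of 4 with higher values predicted positive.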