Data Harmonization

  • Article type: Journal Article
    Combining pertinent data from multiple studies can increase the robustness of epidemiological investigations. Effective "pre-statistical" data harmonization is paramount to the streamlined conduct of collective, multi-study analysis. Harmonizing data and documenting decisions about the transformation of variables to a common set of categorical values and measurement scales is time consuming and error prone, particularly for numerous studies with large numbers of variables. The psHarmonize R package facilitates harmonization by combining multiple datasets, applying data transformation functions, and creating long and wide harmonized datasets. The user provides transformation instructions in a "harmonization sheet" that includes dataset names, variable names, and coding instructions and centrally tracks all decisions. The package performs harmonization, generates error logs as necessary, and creates summary reports of harmonized data. psHarmonize is poised to serve as a central feature of data preparation for the joint analysis of multiple studies.
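    The sheet-driven workflow described above can be illustrated with a minimal Python sketch. It does not use or reproduce the psHarmonize API (which is an R package); the dataset names, variable names, recoding tables, and the functions harmonize and to_wide are all hypothetical and serve only to show the idea of one instruction row per dataset and target variable, an error log, and long and wide outputs.

```python
# Hypothetical source datasets: each study records sex and height differently.
DATASETS = {
    "study_a": [{"id": 1, "sex": "M", "height_cm": 180.0},
                {"id": 2, "sex": "F", "height_cm": 165.0}],
    "study_b": [{"id": 7, "gender": 2, "height_in": 70.0},
                {"id": 8, "gender": 3, "height_in": 64.0}],  # 3 is an unmapped code
}

# Hypothetical "harmonization sheet": one row per (dataset, target variable).
# A row carries either a recoding table or a transformation function.
SHEET = [
    {"dataset": "study_a", "source": "sex", "target": "sex",
     "recode": {"M": "male", "F": "female"}},
    {"dataset": "study_a", "source": "height_cm", "target": "height_cm",
     "func": lambda v: v},
    {"dataset": "study_b", "source": "gender", "target": "sex",
     "recode": {1: "male", 2: "female"}},
    {"dataset": "study_b", "source": "height_in", "target": "height_cm",
     "func": lambda v: round(v * 2.54, 1)},
]


def harmonize(datasets, sheet):
    """Apply every sheet row; return long-format rows and an error log."""
    long_rows, errors = [], []
    for rule in sheet:
        for record in datasets[rule["dataset"]]:
            raw = record[rule["source"]]
            try:
                if "recode" in rule:
                    value = rule["recode"][raw]
                else:
                    value = rule["func"](raw)
            except (KeyError, TypeError, ValueError) as exc:
                # Failed transformations are logged and stored as missing.
                errors.append({"dataset": rule["dataset"], "id": record["id"],
                               "variable": rule["target"], "error": repr(exc)})
                value = None
            long_rows.append({"dataset": rule["dataset"], "id": record["id"],
                              "variable": rule["target"], "value": value})
    return long_rows, errors


def to_wide(long_rows):
    """Pivot long rows into one record per (dataset, id)."""
    wide = {}
    for row in long_rows:
        key = (row["dataset"], row["id"])
        entry = wide.setdefault(key, {"dataset": row["dataset"], "id": row["id"]})
        entry[row["variable"]] = row["value"]
    return list(wide.values())


long_rows, errors = harmonize(DATASETS, SHEET)
for record in to_wide(long_rows):
    print(record)
print("errors:", errors)
```

    Keeping every coding decision in the sheet, rather than in scattered scripts, is what makes the decisions centrally trackable; the unmapped code in study_b ends up in the error log instead of silently passing through.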

  • Article type: Journal Article
    The Korean National Institute of Health initiated data harmonization across cohorts with the aim of ensuring semantic interoperability of data and creating a common database of standardized data elements for future collaborative research. To this end, we reviewed the code books of the cohorts and identified common data items and values that can be combined for data analyses. We then mapped data items and values to standard health terminologies such as SNOMED CT. Preliminary results of this ongoing data harmonization work will be presented.

  • Article type: Journal Article
    The interoperability of healthcare data across various systems remains a major challenge, largely attributable to the disparate data schemas and APIs in use. This study showcases the integration of a FHIR layer into GameBus, a gamified health platform, with the aim of enhancing its interoperability. Traditionally, GameBus has relied on proprietary data schemas and REST APIs, which restricted data exchange with other platforms. The incorporation of the FHIR standard significantly mitigates these constraints. The FHIR layer, constructed with open-source technologies (including the Google HCLS Data Harmonization tool for data transformation and the HAPI FHIR framework for RESTful services), allows GameBus to engage in data sharing using standardized FHIR formats and APIs. Implemented as a standalone microservice, this layer requires no alterations to the pre-existing architecture of GameBus. Furthermore, the design and implementation of the FHIR layer illustrate a generic method for achieving interoperability across diverse healthcare platforms.
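    As an illustration of the kind of transformation such a FHIR layer performs, the sketch below maps a proprietary record to a FHIR Observation resource in plain Python. It does not use the Google HCLS Data Harmonization tool or HAPI FHIR named in the abstract; the input field names (player_id, timestamp, weight_kg) are hypothetical and are not the actual GameBus schema. The output fields follow the FHIR Observation structure, with the LOINC code 29463-7 for body weight and UCUM units.

```python
import json


def to_fhir_observation(record: dict) -> dict:
    """Map a hypothetical proprietary body-weight record to a FHIR Observation."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "29463-7",
                "display": "Body weight",
            }]
        },
        # Assumption: the platform's player id is reused as the Patient id.
        "subject": {"reference": f"Patient/{record['player_id']}"},
        "effectiveDateTime": record["timestamp"],
        "valueQuantity": {
            "value": record["weight_kg"],
            "unit": "kg",
            "system": "http://unitsofmeasure.org",
            "code": "kg",
        },
    }


# Hypothetical input record in a proprietary schema.
proprietary = {"player_id": "123", "timestamp": "2024-01-15T08:30:00Z", "weight_kg": 72.5}
print(json.dumps(to_fhir_observation(proprietary), indent=2))
```

    Because the mapping lives in a separate function (or, as in the study, a separate microservice), the source platform's own schema and API stay unchanged.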

  • Article type: Journal Article
    MyDigiTwin is a scientific initiative for the development of a platform for the early detection and prevention of cardiovascular diseases. This platform, which is supported by prediction models trained in a federated fashion to preserve data privacy, is expected to be hosted by the Dutch Personal Health Environments (PGOs). Consequently, one of the challenges for this federated learning architecture is ensuring consistency between the PGO data and the reference datasets that will be part of it. This paper introduces a novel data harmonization framework that streamlines the efficient generation of FHIR-based representations of data from multiple cohort studies. Furthermore, its applicability to the integration of Lifelines' cohort study data into the MyDigiTwin federated research infrastructure is discussed.

  • Article type: Journal Article
    Biobanks are infrastructures essential for research involving multi-disciplinary teams and an increasing number of stakeholders. In the field of personalized medicine, biobanks play a key role by providing well-characterized and annotated samples while protecting the rights of donors. The Andalusian Public Health System Biobank (SSPA Biobank) has implemented a global information management system made up of different modules that allow for the recording, traceability and monitoring of all the information associated with the biobank operations. The data model, designed in a standardized and normalized way according to international initiatives on data harmonization, integrates the information necessary to guarantee the quality of research results, benefiting researchers, clinicians and donors.

  • Article type: Journal Article
    Objective: This article aims to describe the implementation of a new health information technology system called Health Connect that is harmonizing cancer data in the Canadian province of Newfoundland and Labrador; explain high-level technical details of this technology; provide concrete examples of how this technology is helping to improve cancer care in the province; and discuss its future expansion and implications. Methods: We give a technical description of the Health Connect architecture and of how it integrates numerous data sources into a single, scalable health information system for cancer data, and we highlight its artificial intelligence and analytics capacity. Results: We illustrate two practical achievements of Health Connect: first, an analytical dashboard that was used to pinpoint variations in colon cancer screening uptake in small defined geographic regions of the province; and second, a natural language processing algorithm that provided AI-assisted decision support in interpreting appropriate follow-up action based on assessments of breast mammography reports. Conclusion: Health Connect is a cutting-edge health systems solution for harmonizing cancer screening data for practical decision-making. The long-term goal is to integrate all cancer care data holdings into Health Connect to build a comprehensive health information system for cancer care in the province.

  • Article type: Journal Article
    A problem that applied researchers and practitioners often face is that different institutions within research consortia use different scales to evaluate the same construct, which makes comparing and pooling results challenging. In order to meaningfully pool and compare the scores, the scales should be harmonized. The aim of this paper is to use different test equating methods to harmonize the ADHD scores from the Child Behavior Checklist (CBCL) and the Strengths and Difficulties Questionnaire (SDQ) and to see which method leads to the best result.
    The sample consists of 1551 parent reports of children aged 10-11.5 years from the Raine study on both the CBCL and the SDQ (common persons design). We used linear equating, kernel equating, Item Response Theory (IRT), and the following machine learning methods: regression (linear and ordinal), random forest (regression and classification) and Support Vector Machine (regression and classification). Efficacy of the methods is operationalized in terms of the root-mean-square error (RMSE) of differences between predicted and observed scores in cross-validation.
    Results showed that with a single group design, it is best to use methods that use item-level information and that treat the outcome as having an interval measurement level (regression approach).
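    Of the methods listed above, linear equating is the simplest to write down: a score x on one scale is mapped to y = mean_Y + (sd_Y / sd_X) * (x - mean_X), so that the equated scores share the mean and standard deviation of the target scale. The sketch below shows this together with the RMSE criterion. The score lists are made-up illustrative values, not data from the study, and the RMSE here is computed in-sample for brevity, whereas the abstract evaluates it in cross-validation.

```python
import math
import statistics


def linear_equate(x_scores, y_scores):
    """Return a function mapping scale-X scores onto scale Y by linear equating."""
    mu_x, mu_y = statistics.mean(x_scores), statistics.mean(y_scores)
    sd_x, sd_y = statistics.pstdev(x_scores), statistics.pstdev(y_scores)
    slope = sd_y / sd_x
    return lambda x: mu_y + slope * (x - mu_x)


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed scores."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(statistics.mean(squared))


# Hypothetical paired scores from the same children on both instruments
# (common persons design); values are invented for illustration only.
sdq_scores = [2, 4, 5, 7, 9]
cbcl_scores = [1, 2, 3, 4, 6]

sdq_to_cbcl = linear_equate(sdq_scores, cbcl_scores)
predicted = [sdq_to_cbcl(x) for x in sdq_scores]
print("equated scores:", predicted)
print("in-sample RMSE:", rmse(predicted, cbcl_scores))
```

    Linear equating uses only total scores, which is one way to see why the abstract's finding favors methods that draw on item-level information.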

  • Article type: Journal Article
    BACKGROUND: Cross-institutional interoperability between health care providers remains a recurring challenge worldwide. The German Medical Informatics Initiative, a collaboration of 37 university hospitals in Germany, aims to enable interoperability between partner sites by defining Fast Healthcare Interoperability Resources (FHIR) profiles for the cross-institutional exchange of health care data, the Core Data Set (CDS). The current CDS and its extension modules define elements representing patients' health care records. All university hospitals in Germany have made significant progress in providing routine data in a standardized format based on the CDS. In addition, the central research platform for health, the German Portal for Medical Research Data feasibility tool, allows medical researchers to query the available CDS data items across many participating hospitals.
    OBJECTIVE: In this study, we aimed to evaluate a novel approach of combining the current top-down generated FHIR profiles with the bottom-up generated knowledge gained by the analysis of respective instance data. This allowed us to derive options for iteratively refining FHIR profiles using the information obtained from a discrepancy analysis.
    METHODS: We developed a FHIR validation pipeline and opted to derive more restrictive profiles from the original CDS profiles. This decision was driven by the need to align more closely with the specific assumptions and requirements of the central feasibility platform's search ontology. While the original CDS profiles offer a generic framework adaptable for a broad spectrum of medical informatics use cases, they lack the specificity to model the nuanced criteria essential for medical researchers. A key example of this is the need to accurately represent the interdependencies between specific laboratory codings and values. The validation results allow us to identify discrepancies between the instance data at the clinical sites and the profiles specified by the feasibility platform, so that they can be addressed in the future.
    RESULTS: A total of 20 university hospitals participated in this study. Historical factors, lack of harmonization, a wide range of source systems, and case sensitivity of coding are some of the causes of the discrepancies identified. While in our case study Conditions, Procedures, and Medications have a high degree of uniformity in the coding of instance data due to legislative requirements for billing in Germany, we found that laboratory values pose a significant data harmonization challenge due to the interdependency between coding and value.
    CONCLUSIONS: While the CDS achieves interoperability, different challenges for federated data access arise, requiring more specificity in the profiles to make assumptions about the instance data. We further argue that additional harmonization of the instance data can significantly lower the required retrospective harmonization efforts. We recognize that discrepancies cannot be resolved solely at the clinical site; therefore, our findings have a wide range of implications and will require action on multiple levels and by various stakeholders.

  • Article type: Journal Article
    OBJECTIVE: Effective use of longitudinal study data is challenging because of divergences in construct definitions and measurement approaches over time, between studies and across disciplines. One approach to overcoming these challenges is data harmonization, a practice used to improve variable comparability and reduce heterogeneity across studies. This study describes the process used to evaluate the harmonization potential of oral health-related variables across each survey wave.
    METHODS: National child cohort surveys with similar themes/objectives conducted in the last two decades were selected. The Maelstrom Research Guidelines were followed for the evaluation of harmonization potential.
    RESULTS: Seven nationally representative child cohort surveys were included, and questionnaires from 50 survey waves were examined. Questionnaires were classified into three domains and fifteen constructs and summarized by age group. A DataSchema (a list of core variables representing the suitable version of the oral health outcomes and risk factors) comprising 42 variables was compiled. For each study wave, whether each DataSchema variable could be generated was evaluated. Of the 2100 harmonization status assessments, 543 (26%) were complete. Approximately 50% of the DataSchema variables can be generated across at least four cohort surveys, while only 10% (n = 4) of the variables can be generated across all surveys. For each survey, the proportion of DataSchema variables that can be generated ranged between 26% and 76%.
    CONCLUSIONS: Data harmonization can improve the comparability of variables both within and across surveys. For future cohort surveys, the authors advocate more consistency and standardization in survey questionnaires within and between surveys.

  • Article type: Journal Article
    Despite the critical importance of virus disinfection by chlorine, our fundamental understanding of the relative susceptibility of different viruses to chlorine remains limited, as do robust quantitative relationships between virus disinfection rate constants and environmental parameters. We conducted a systematic review of virus inactivation by free chlorine and used the resulting data set to develop a linear mixed model that estimates chlorine inactivation rate constants for viruses based on experimental conditions. In total, 570 data points were collected in our systematic review, representing 82 viruses over a broad range of environmental conditions. The harmonized inactivation rate constants under reference conditions (pH = 7.53, T = 20 °C, [Cl⁻] < 50 mM) spanned 5 orders of magnitude, ranging from 0.0196 to 1150 L mg⁻¹ min⁻¹, and uncovered important trends between viruses. Whereas the common surrogate bacteriophage MS2 does not serve as a conservative chlorine disinfection surrogate for many human viruses, CVB5 was one of the most resistant viruses in the data set. The model quantifies the role of pH, temperature, and chloride levels across viruses, and an online tool allows users to estimate rate constants for viruses and conditions of interest. Results from the model identified potential shortcomings in current U.S. EPA drinking water disinfection requirements.
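    To give a sense of what a 5-order-of-magnitude spread in rate constants means in practice, the sketch below converts a rate constant into the chlorine dose (CT, concentration times contact time) needed for a given log reduction. It assumes simple Chick-Watson kinetics, ln(N/N0) = -k * C * t, with constant chlorine concentration; this kinetic form is an assumption made here for illustration and is not the linear mixed model from the abstract. Only the two endpoint rate constants come from the abstract.

```python
import math


def log10_reduction(k: float, ct: float) -> float:
    """log10 inactivation for rate constant k (L mg^-1 min^-1) and CT (mg min L^-1),
    assuming Chick-Watson kinetics: ln(N/N0) = -k * CT."""
    return k * ct / math.log(10)


def ct_for_log_reduction(k: float, logs: float) -> float:
    """CT (mg min L^-1) required to reach the requested log10 reduction."""
    return logs * math.log(10) / k


# Endpoints of the harmonized rate-constant range reported in the abstract.
k_min, k_max = 0.0196, 1150.0

print("orders of magnitude spanned:", math.log10(k_max / k_min))
print("CT for 4-log, most resistant:", ct_for_log_reduction(k_min, 4))
print("CT for 4-log, most susceptible:", ct_for_log_reduction(k_max, 4))
```

    Under this assumption, a 4-log reduction needs a CT of roughly 470 mg min L⁻¹ at the lowest rate constant and roughly 0.008 mg min L⁻¹ at the highest, which illustrates why a single surrogate virus may not be conservative for all others.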