Keywords: DDI; Data Documentation Initiative; FAIR data principles; JSON-LD; JavaScript Object Notation for Linked Data; OMOP CDM; Observational Medical Outcomes Partnership Common Data Model; data models; data science; machine-readable metadata; metadata standardization

Source: DOI:10.2196/56237

Abstract:
Background: Metadata describe and provide context for other data, playing a pivotal role in enabling the findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empower both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse.
Objective: To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications.
Methods: The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which uses DDI Codebook metadata to document detailed information about the data and associated assets while ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization. In this stage, the data are harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous data sets. The third stage is the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD), in which machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using the Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya.
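As a rough illustration of the third stage, the following Python sketch builds a Schema.org `Dataset` description and serializes it as JSON-LD. The helper `build_dataset_jsonld`, the data set name, the description, and the keyword list are hypothetical placeholders and are not taken from the authors' repository; only `@context`, `@type`, and the property names (`name`, `description`, `keywords`, `spatialCoverage`) come from the Schema.org vocabulary.

```python
import json


def build_dataset_jsonld(name, description, keywords, countries):
    """Return a Schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        # Each country becomes a Schema.org Place entry.
        "spatialCoverage": [{"@type": "Place", "name": c} for c in countries],
    }


# Placeholder values, assumed for the example only.
record = build_dataset_jsonld(
    name="IDSR surveillance data (illustrative)",
    description="Placeholder description of an Integrated Disease "
                "Surveillance and Response data set.",
    keywords=["IDSR", "OMOP CDM", "DDI"],
    countries=["Malawi", "Kenya"],
)

# Serialize to JSON-LD text that could later be embedded in a landing page.
jsonld_text = json.dumps(record, indent=2, ensure_ascii=False)
print(jsonld_text)
```

A real record would likely carry more properties (for example a license, identifier, or distribution); which ones the authors used is not stated in the abstract.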
Results: The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD code is accessible via a GitHub repository, and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website.
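One common way to integrate JSON-LD with HTML, so that crawlers such as the one behind Google Dataset Search can read it, is a `<script type="application/ld+json">` element in the page head. The sketch below shows that pattern and reads the metadata back with Python's standard `html.parser`. The page markup, the metadata values, and the names `embed_jsonld` and `JsonLdExtractor` are assumptions made for illustration; the abstract does not describe how the authors' pages were produced.

```python
import json
from html.parser import HTMLParser


def embed_jsonld(html_page, metadata):
    """Insert a JSON-LD script element just before the first </head>."""
    # Escape "</" so the payload cannot close the script element early;
    # "\/" is a valid JSON escape and decodes back to "/".
    payload = json.dumps(metadata).replace("</", "<\\/")
    script = f'<script type="application/ld+json">{payload}</script>'
    return html_page.replace("</head>", script + "</head>", 1)


class JsonLdExtractor(HTMLParser):
    """Collect every JSON-LD block found in an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical landing page and metadata, for illustration only.
page = "<html><head><title>IDSR metadata</title></head><body></body></html>"
metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Illustrative data set",
}

html_with_metadata = embed_jsonld(page, metadata)

extractor = JsonLdExtractor()
extractor.feed(html_with_metadata)
extractor.close()
print(extractor.items[0]["name"])
```

Reading the block back, as a metadata harvester would, is a cheap check that the embedded JSON-LD is still valid JSON after templating.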
Conclusions: The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance the visibility, accessibility, and utility of diverse resources, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide.