data science

  • Article type: Journal Article
    BACKGROUND: Learning and teaching interdisciplinary health data science (HDS) is highly challenging, and despite the growing interest in HDS education, little is known about the learning experiences and preferences of HDS students.
    OBJECTIVE: We conducted a systematic review to identify learning preferences and strategies in the HDS discipline.
    METHODS: We searched 10 bibliographic databases (PubMed, ACM Digital Library, Web of Science, Cochrane Library, Wiley Online Library, ScienceDirect, SpringerLink, EBSCOhost, ERIC, and IEEE Xplore) from the date of inception until June 2023. We followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and included primary studies written in English that investigated the learning preferences or strategies of students in HDS-related disciplines, such as bioinformatics, at any academic level. Risk of bias was independently assessed by 2 screeners using the Mixed Methods Appraisal Tool, and we used narrative data synthesis to present the study results.
    RESULTS: After abstract screening and full-text reviewing of the 849 papers retrieved from the databases, 8 (0.9%) studies, published between 2009 and 2021, were selected for narrative synthesis. The majority of these papers (7/8, 88%) investigated learning preferences, while only 1 (12%) paper studied learning strategies in HDS courses. The systematic review revealed that most HDS learners prefer visual presentations as their primary learning input. In terms of learning process and organization, they mostly tend to follow logical, linear, and sequential steps. Moreover, they focus more on abstract information, rather than detailed and concrete information. Regarding collaboration, HDS students sometimes prefer teamwork, and sometimes they prefer to work alone.
    CONCLUSIONS: The studies' quality, assessed using the Mixed Methods Appraisal Tool, ranged between 73% and 100%, indicating excellent quality overall. However, the number of studies in this area is small, and the results of all studies are based on self-reported data. Therefore, more research needs to be conducted to provide insight into HDS education. We provide some suggestions, such as using learning analytics and educational data mining methods, for conducting future research to address gaps in the literature. We also discuss implications for HDS educators, and we make recommendations for HDS course design; for example, we recommend including visual materials, such as diagrams and videos, and offering step-by-step instructions for students.

  • Article type: Journal Article
    Climate change, local epidemics, future pandemics, and forced displacements pose significant public health threats worldwide. To cope successfully, people and communities are faced with the challenging task of developing resilience to these stressors. Our viewpoint is that the powerful capabilities of modern informatics technologies, including artificial intelligence, biomedical and environmental sensors, augmented or virtual reality, data science, and other digital hardware or software, have great potential to promote, sustain, and support resilience in people and communities. However, there is no "one size fits all" solution for resilience. Solutions must match the specific effects of the stressor, cultural dimensions, social determinants of health, technology infrastructure, and many other factors.

  • Article type: Journal Article
    OBJECTIVE: The Collaborate, Analyse, Research and Audit (CARA) project set out to provide an infrastructure to enable Irish general practitioners (GPs) to use their routinely collected patient management software (PMS) data to better understand their patient population, disease management and prescribing through data dashboards. This paper explains the design and development of the CARA infrastructure.
    METHODS: The first exemplar dashboard was developed with GPs and focused on antibiotic prescribing to develop and showcase the proposed infrastructure. The data integration process involved extracting, loading and transforming de-identified patient data into data models which connect to the interactive dashboards for GPs to visualise, compare and audit their data.
    RESULTS: The architecture of the CARA infrastructure includes two main sections: extract, load and transform process (ELT, de-identified patient data into data models) and a Representational State Transfer Application Programming Interface (REST API) (which provides the security barrier between the data models and their visualisation on the CARA dashboard). CARAconnect was created to facilitate the extraction and de-identification of patient data from the practice database.
    CONCLUSIONS: The CARA infrastructure allows seamless connectivity and compatibility with the main PMS in Irish general practice and provides a reproducible template to access and visualise patient data. CARA includes two dashboards, a practice overview and a topic-specific dashboard (example focused on antibiotic prescribing), which includes an audit tool, filters (within practice) and between-practice comparisons.
    CONCLUSIONS: CARA supports evidence-based decision-making by providing GPs with valuable insights through interactive data dashboards to optimise patient care, identify potential areas for improvement and benchmark their performance against other practices. Supplementary file 1. Graphical abstract.

  • Article type: Journal Article
    Methods for causal inference from observational data are common in human disease epidemiology and social sciences but are used relatively little in plant pathology. We draw upon an extensive data set of the incidence of hop plants with powdery mildew (Podosphaera macularis) collected from yards in Oregon during 2014 to 2017 and associated metadata on grower cultural practices, cultivar susceptibility to powdery mildew, and pesticide application records to understand variation in and causes of growers' fungicide use and associated costs. An instrumental causal forest model identified that growers' spring pruning thoroughness, cultivar susceptibility to two of the dominant pathogenic races of P. macularis, network centrality of a yard during May-June and June-July time transitions, and the initial strain of the fungus were important variables determining the number of pesticide active constituents applied by growers and the associated costs they incurred in response to powdery mildew. Exposure-response function models fit after covariate weighting indicated that both the number of pesticide active constituents applied and their associated costs scaled linearly with the seasonal mean incidence of plants with powdery mildew. While the causes of pesticide use intensity are multifaceted, biological and production factors collectively influence the incidence of powdery mildew, which has a direct exposure-response relationship with the number of pesticide active constituents that growers apply and their costs. Our analyses point to several potential strategies for reducing pesticide use and costs for management of powdery mildew on hop. We also highlight the utility of these methods for causal inference in observational studies.

  • Article type: Journal Article
    Insufficient physical activity (PA) has long been a global health issue, and a number of studies have explored correlates of PA to identify the mechanisms underlying inactive lifestyles. In the literature, dozens of correlates have been identified at different (eg, individual, environmental) levels, but there is little or no direct evidence for the mutual associations of these correlates. This study analysed 44 variables identified as theoretically and empirically relevant for PA to clarify the factors directly and indirectly associated with PA.
    A cross-sectional survey dataset of 19 005 Japanese-speaking adults (mean age=53.50 years, SD=17.40; 9706 women) was analysed. The data encompassed demographic and anthropometric variables; self-reported PA levels; perceived social support and environments (eg, awareness of urban facilities for PA); psychological traits and health-behaviour characteristics (eg, personality, motivation, self-efficacy, decisional balance, process of change strategies); and technology use (eg, mobile health apps).
    Network analyses were performed to select meaningful associations (partial correlations) among variables, which identified nine variables directly positively associated with PA: job/employment status, self-efficacy, perceived social support, intrinsic motivation, stage of change, counter conditioning, self-reevaluation, environment and technology use. Indirect associations (two-step neighbourhood) were identified for 40 (out of 44) variables, implying that most of the known PA correlates are associated with PA, at least indirectly.
    These identified associations echo the importance of the multilevel perspective in understanding how people maintain (in)active lifestyles. Interventions for PA could have mixed-level targets, including intraindividual characteristics, social support and physical and digital environments.
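    The partial correlations that such a network analysis builds on can be sketched as follows. This is a minimal illustration, assuming the standard unregularised estimator (partial correlations read off the inverse of the correlation matrix); the abstract does not say which estimator or edge-selection rule was used. The three toy variables and the function names are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def partial_correlations(columns):
    """Correlation of each pair after controlling for all other variables."""
    precision = invert(correlation_matrix(columns))
    k = len(columns)
    return [[1.0 if i == j
             else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(k)] for i in range(k)]

# Hypothetical chain x -> y -> z: x and z are correlated only through y.
x = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
y = [a + b for a, b in zip(x, e1)]
z = [a + b for a, b in zip(y, e2)]

marginal = correlation_matrix([x, y, z])
pc = partial_correlations([x, y, z])
print("marginal r(x, z):", round(marginal[0][2], 3))
print("partial  r(x, z | y):", round(pc[0][2], 3))
```

    In this toy chain the pair (x, z) has a clear marginal correlation but no direct edge once y is controlled for, which is the distinction between indirect and direct associations that the abstract draws.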

  • Article type: Journal Article
    BACKGROUND: Metadata describe and provide context for other data, playing a pivotal role in enabling findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empower both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse.
    OBJECTIVE: To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications.
    METHODS: The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which involves leveraging the DDI Codebook metadata and documentation of detailed information for data and associated assets, while ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization. In this stage, the data are harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous data sets. The third stage involves the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD), in which machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using the Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya.
    RESULTS: The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD codes are accessible via a GitHub repository and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website.
    CONCLUSIONS: The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance diverse resource visibility, accessibility, and utility, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide.
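    The third stage (Schema.org entities serialised as JSON-LD and embedded in a web page) might look roughly like the sketch below. It is an illustration only: the function names, the chosen Schema.org properties, and every field value are placeholders, not the metadata published in the authors' GitHub repository.

```python
import json

def dataset_jsonld(name, description, countries, variables):
    """Build a Schema.org Dataset description as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "spatialCoverage": [{"@type": "Place", "name": c} for c in countries],
        "variableMeasured": list(variables),
    }

def embed_in_html(metadata):
    """Wrap the metadata in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(metadata, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"

# Hypothetical record; the values are placeholders, not the study's metadata.
metadata = dataset_jsonld(
    name="Example IDSR weekly case counts",
    description="Illustrative description of a disease surveillance data set.",
    countries=["Malawi", "Kenya"],
    variables=["reporting week", "district", "suspected cases"],
)
snippet = embed_in_html(metadata)
print(snippet)
```

    Placing such a script element in a data set's landing page is what lets crawlers read the description without parsing the human-facing HTML.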

  • Article type: Editorial
    No abstract available.

  • Article type: Journal Article
    BACKGROUND: The Veterans Affairs Surgical Quality Improvement Program (VASQIP) trains surgical quality nurses (SQNs) at each Veterans Affairs (VA) hospital to extract or verify 187 variables from the medical record for all cardiac surgical cases. For ten preoperative laboratory values, VASQIP has a semiautomated (SA) system in which local lab values are automatically extracted, verified by SQNs, and lab values recorded at other VA facilities are manually extracted. The objective of this study was to develop and validate a method to automate the extraction of these ten preoperative laboratory values and compare results with the current SA method.
    METHODS: We developed methods to extract ten preoperative laboratory values and measurement dates from the VA Corporate Data Warehouse using Logical Observation Identifiers Names and Codes. Automated (A) versus SA information extraction was compared in terms of agreement, conformance to data definitions, proximity to surgery, and missingness.
    RESULTS: For surgeries with both A and SA lab values, the intraclass correlation coefficients for the ten variables ranged from 0.90 to 0.98. For several variables, the A method resulted in much lower rates of missing data (e.g., 2.4% versus 22.5% missing data for high-density lipoprotein) and eliminated out-of-date-range entries.
    CONCLUSIONS: Although SQN-extracted data are widely considered the gold standard within National Surgical Quality Improvement Programs, there may be advantages to fully automating extraction of lab values, including high congruence with SA SQN-extracted or verified values and lower rates of missingness and out-of-date-range data.

  • Article type: Journal Article
    BACKGROUND: The impact of missing data on individual continuous glucose monitoring (CGM) data is unknown but can influence clinical decision-making for patients.
    OBJECTIVE: We aimed to investigate the consequences of data loss on glucose metrics in individual patient recordings from continuous glucose monitors and assess its implications on clinical decision-making.
    METHODS: The CGM data were collected from patients with type 1 and 2 diabetes using the FreeStyle Libre sensor (Abbott Diabetes Care). We selected 7-28 days of 24 hours of continuous data without any missing values from each individual patient. To mimic real-world data loss, missing data ranging from 5% to 50% were introduced into the data set. From this modified data set, clinical metrics including time below range (TBR), TBR level 2 (TBR2), and other common glucose metrics were calculated in the data sets with and without data loss. Recordings in which glucose metrics deviated relevantly due to data loss, as determined by clinical experts, were defined as expert panel boundary error (εEPB). These errors were expressed as a percentage of the total number of recordings. The errors for the recordings with glucose management indicator <53 mmol/mol were investigated.
    RESULTS: A total of 84 patients contributed to 798 recordings over 28 days. With 5%-50% data loss for 7- to 28-day recordings, the εEPB varied from 0 out of 798 (0.0%) to 147 out of 736 (20.0%) for TBR and 0 out of 612 (0.0%) to 22 out of 408 (5.4%) recordings for TBR2. In the case of 14-day recordings, TBR and TBR2 episodes completely disappeared due to 30% data loss in 2 out of 786 (0.3%) and 32 out of 522 (6.1%) of the cases, respectively. However, the initial values of the disappeared TBR and TBR2 were relatively small (<0.1%). In the recordings with glucose management indicator <53 mmol/mol, the εEPB was 9.6% for 14 days with 30% data loss.
    CONCLUSIONS: With a maximum of 30% data loss in 14-day CGM recordings, there is minimal impact of missing data on the clinical interpretation of various glucose metrics.
    TRIAL REGISTRATION: ClinicalTrials.gov NCT05584293; https://clinicaltrials.gov/study/NCT05584293.
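    The comparison described in the methods (a glucose metric computed on a complete recording and again after data loss) can be sketched as below. Several things here are assumptions rather than details from the abstract: the cutoffs of 3.9 and 3.0 mmol/L for TBR and TBR2, the removal of individual readings uniformly at random, and the synthetic one-day trace. The helper names are hypothetical.

```python
import random

TBR_THRESHOLD = 3.9   # mmol/L; commonly used "below range" cutoff (assumed here)
TBR2_THRESHOLD = 3.0  # mmol/L; commonly used level 2 cutoff (assumed here)

def time_below(readings, threshold):
    """Percentage of available (non-missing) readings strictly below threshold."""
    available = [g for g in readings if g is not None]
    if not available:
        raise ValueError("no readings available")
    return 100.0 * sum(g < threshold for g in available) / len(available)

def drop_readings(readings, fraction, rng):
    """Return a copy with a random `fraction` of readings replaced by None."""
    n_drop = round(len(readings) * fraction)
    dropped = set(rng.sample(range(len(readings)), n_drop))
    return [None if i in dropped else g for i, g in enumerate(readings)]

# Synthetic complete trace (96 readings): mostly in range, a few low values.
complete = [6.0] * 90 + [3.5] * 4 + [2.8] * 2

rng = random.Random(42)  # fixed seed so the sketch is repeatable
degraded = drop_readings(complete, 0.30, rng)

for label, threshold in (("TBR", TBR_THRESHOLD), ("TBR2", TBR2_THRESHOLD)):
    before = time_below(complete, threshold)
    after = time_below(degraded, threshold)
    print(f"{label}: {before:.2f}% complete vs {after:.2f}% with 30% loss")
```

    Repeating the random removal many times per recording, and flagging runs where the metric crosses an expert-defined boundary, would approximate the error rate the abstract reports; the boundaries themselves are not given there.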

  • Article type: Journal Article
    BACKGROUND: Despite the increasing availability of electronic healthcare record (EHR) data and wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has thus far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders.
    METHODS: Observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes, were used after preprocessing. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed.
    RESULTS: Four age clusters of diseases were identified, broadly aligning to ages between 0 and 1; 1 and 5; 5 and 13; and 13 and 18. Diagnoses within the clusters aligned to existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at a population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated.
    CONCLUSIONS: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses which can augment existing decision making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated.
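    A minimal sketch of clustering diagnoses by age profile with K-means follows. It assumes each diagnosis is represented by the share of its patients falling in each age bin; the abstract does not describe the actual feature construction, and the bins, diagnosis names, and numbers below are invented. The number of clusters is fixed at 2 here, whereas the study selected its final model using quantitative metrics and expert review.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical age profiles: share of patients per age bin for each diagnosis.
age_bins = ["0-1", "1-5", "5-13", "13-18"]
profiles = {
    "diagnosis_A": [0.7, 0.2, 0.1, 0.0],
    "diagnosis_B": [0.6, 0.3, 0.1, 0.0],
    "diagnosis_C": [0.0, 0.1, 0.2, 0.7],
    "diagnosis_D": [0.0, 0.1, 0.3, 0.6],
}

labels, centroids = kmeans(list(profiles.values()), k=2)
for code, label in zip(profiles, labels):
    print(code, "-> cluster", label)
```

    Rerunning the clustering under different preprocessing choices and comparing the resulting assignments is one simple way to probe the preprocessing uncertainty the abstract mentions.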
