Explainability

  • Article type: Journal Article
    Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that severely impacts affected persons' speech and motor functions, yet early detection and tracking of disease progression remain challenging. The current gold standard for monitoring ALS progression, the ALS functional rating scale - revised (ALSFRS-R), is based on subjective ratings of symptom severity, and may not capture subtle but clinically meaningful changes due to a lack of granularity. Multimodal speech measures which can be automatically collected from patients in a remote fashion allow us to bridge this gap because they are continuous-valued and therefore, potentially more granular at capturing disease progression. Here we investigate the responsiveness and sensitivity of multimodal speech measures in persons with ALS (pALS) collected via a remote patient monitoring platform in an effort to quantify how long it takes to detect a clinically-meaningful change associated with disease progression. We recorded audio and video from 278 participants and automatically extracted multimodal speech biomarkers (acoustic, orofacial, linguistic) from the data. We find that the timing alignment of pALS speech relative to a canonical elicitation of the same prompt and the number of words used to describe a picture are the most responsive measures at detecting such change in both pALS with bulbar (n = 36) and non-bulbar onset (n = 107). Interestingly, the responsiveness of these measures is stable even at small sample sizes. We further found that certain speech measures are sensitive enough to track bulbar decline even when there is no patient-reported clinical change, i.e. the ALSFRS-R speech score remains unchanged at 3 out of a total possible score of 4. The findings of this study have the potential to facilitate improved, accelerated and cost-effective clinical trials and care.

  • Article type: Journal Article
    BACKGROUND: Pain is a complex subjective experience, strongly impacting health and quality of life. Despite many attempts to find effective solutions, present treatments are generic, often unsuccessful, and present significant side effects. Designing individualized therapies requires understanding of multidimensional pain experience, considering physical and emotional aspects. Current clinical pain assessments, relying on subjective one-dimensional numeric self-reports, fail to capture this complexity.
    METHODS: To this aim, we exploited machine learning to disentangle physiological and psychosocial components shaping the pain experience. Clinical, psychosocial, and physiological data were collected from 118 chronic pain and healthy participants undergoing 40 pain trials (4,697 trials).
    RESULTS: To understand the objective response to nociception, we classified pain from the physiological signals (accuracy >0.87), extracting the most important biomarkers. Then, using multilevel mixed-effects models, we predicted the reported pain, quantifying the mismatch between subjective level and measured physiological response. From these models, we introduced two metrics: TIP (subjective index of pain) and Φ (physiological index). These represent possible added value in the clinical process, capturing psychosocial and physiological pain dimensions, respectively. Patients with high TIP are characterized by frequent sick leave from work and increased clinical depression and anxiety, factors associated with long-term disability and poor recovery, and are indicated for alternative treatments, such as psychological ones. By contrast, patients with high Φ show strong nociceptive pain components and could benefit more from pharmacotherapy.
    CONCLUSIONS: TIP and Φ, explaining the multidimensionality of pain, might provide a new tool potentially leading to targeted treatments, thereby reducing the costs of inefficient generic therapies.
    BACKGROUND: RESC-PainSense, SNSF-MOVE-IT197271.
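The TIP metric is described as capturing the mismatch between subjective pain reports and measured physiological response. A toy way to see this idea: regress reported pain on a physiological response score and inspect the residuals, where a positive residual means pain is reported above what physiology predicts. This uses a one-predictor ordinary least squares fit as a stand-in for the paper's multilevel mixed-effects models; the data and the simplification are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor (a toy stand-in for the
    paper's multilevel mixed-effects models)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

# Hypothetical per-trial physiological response scores and 0-10 pain reports.
phys = [1.0, 2.0, 3.0, 4.0, 5.0]
reported = [2.0, 3.0, 7.0, 5.0, 8.0]
slope, intercept = fit_line(phys, reported)

# Positive residual: pain reported above what physiology predicts --
# loosely the kind of subjective/physiological mismatch TIP captures.
residuals = [r - (slope * p + intercept) for p, r in zip(phys, reported)]
```

In the paper's framing, a patient whose residuals are systematically large and positive would score high on the subjective index and be a candidate for psychological rather than pharmacological treatment.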

  • Article type: Journal Article
    Modeling dynamic interactions among network components is crucial to uncovering the evolution mechanisms of complex networks. Recently, spatio-temporal graph learning methods have achieved noteworthy results in characterizing the dynamic changes of inter-node relations (INRs). However, challenges remain: The spatial neighborhood of an INR is underexploited, and the spatio-temporal dependencies in INRs' dynamic changes are overlooked, ignoring the influence of historical states and local information. In addition, the model's explainability has been understudied. To address these issues, we propose an explainable spatio-temporal graph evolution learning (ESTGEL) model to model the dynamic evolution of INRs. Specifically, an edge attention module is proposed to utilize the spatial neighborhood of an INR at multi-level, i.e., a hierarchy of nested subgraphs derived from decomposing the initial node-relation graph. Subsequently, a dynamic relation learning module is proposed to capture the spatio-temporal dependencies of INRs. The INRs are then used as adjacent information to improve the node representation, resulting in comprehensive delineation of dynamic evolution of the network. Finally, the approach is validated with real data on brain development study. Experimental results on dynamic brain networks analysis reveal that brain functional networks transition from dispersed to more convergent and modular structures throughout development. Significant changes are observed in the dynamic functional connectivity (dFC) associated with functions including emotional control, decision-making, and language processing.

  • Article type: Journal Article
    In an increasing number of industrial and technical processes, machine learning-based systems are being entrusted with supervision tasks. While they have been successfully utilized in many application areas, they frequently are not able to generalize to changes in the observed data, which environmental changes or degrading sensors might cause. These changes, commonly referred to as concept drift can trigger malfunctions in the used solutions which are safety-critical in many cases. Thus, detecting and analyzing concept drift is a crucial step when building reliable and robust machine learning-driven solutions. In this work, we consider the setting of unsupervised data streams which is highly relevant for different monitoring and anomaly detection scenarios. In particular, we focus on the tasks of localizing and explaining concept drift which are crucial to enable human operators to take appropriate action. Next to providing precise mathematical definitions of the problem of concept drift localization, we survey the body of literature on this topic. By performing standardized experiments on parametric artificial datasets we provide a direct comparison of different strategies. Thereby, we can systematically analyze the properties of different schemes and suggest first guidelines for practical applications. Finally, we explore the emerging topic of explaining concept drift.
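The localization task this survey formalizes can be illustrated with a minimal sketch: compare a reference window of the stream with a current window feature by feature, and rank features by a crude drift score (shift of the window mean, in units of reference standard deviations). The statistic and the synthetic data are illustrative assumptions, not a method from the survey.

```python
import random
import statistics

def drift_scores(reference, current):
    """Per-feature drift score: absolute difference of window means,
    scaled by the reference standard deviation (a crude effect size).
    Features with the largest scores localize where drift occurred."""
    scores = []
    for ref_col, cur_col in zip(zip(*reference), zip(*current)):
        mu_ref = statistics.fmean(ref_col)
        sd_ref = statistics.stdev(ref_col) or 1e-9
        scores.append(abs(statistics.fmean(cur_col) - mu_ref) / sd_ref)
    return scores

random.seed(0)
# Two-feature stream; drift injected into feature 1 only (+2 mean shift),
# mimicking a single degrading sensor in an otherwise stable process.
ref = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
cur = [[random.gauss(0, 1), random.gauss(2, 1)] for _ in range(500)]
scores = drift_scores(ref, cur)
```

Real localization schemes surveyed in the paper are more principled (e.g., statistical two-sample tests per feature or model-based approaches), but the ranking-by-discrepancy structure is the same.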

  • Article type: Journal Article
    This article explores Human-Centered Artificial Intelligence (HCAI) in medical cytology, with a focus on enhancing the interaction with AI. It presents a Human-AI interaction paradigm that emphasizes explainability and user control of AI systems. It is an iterative negotiation process based on three interaction strategies aimed to (i) elaborate the system outcomes through iterative steps (Iterative Exploration), (ii) explain the AI system's behavior or decisions (Clarification), and (iii) allow non-expert users to trigger simple retraining of the AI model (Reconfiguration). This interaction paradigm is exploited in the redesign of an existing AI-based tool for microscopic analysis of the nasal mucosa. The resulting tool is tested with rhinocytologists. The article discusses the analysis of the results of the conducted evaluation and outlines lessons learned that are relevant for AI in medicine.

  • Article type: Journal Article
    OBJECTIVE: This study proposes a process for detecting slices with bone marrow edema (BME), a typical finding of axSpA, using MRI scans as the input. This process does not require manual input of ROIs and provides the results of the judgment of the presence or absence of BME on a slice and the location of edema as the rationale for the judgment.
    METHODS: First, the signal intensity of the MRI scans of the sacroiliac joint was normalized to reduce the variation in signal values between scans. Next, slices containing synovial joints were extracted using a slice selection network. Finally, the BME slice detection network determines the presence or absence of the BME in each slice and outputs the location of the BME.
    RESULTS: The proposed method was applied to 86 MRI scans collected from 15 hospitals in Japan. The results showed that the average absolute error of the slice selection process was 1.49 slices for the misalignment between the upper and lower slices of the synovial joint range. The accuracy, sensitivity, and specificity of the BME slice detection network were 0.905, 0.532, and 0.974, respectively.
    CONCLUSIONS: This paper proposes a process to detect the slice with BME and its location as the rationale of the judgment from an MRI scan and shows its effectiveness using 86 MRI scans. In the future, we plan to develop a process for detecting other findings such as bone erosion from MR scans, followed by the development of a diagnostic support system.
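The reported accuracy (0.905), sensitivity (0.532), and specificity (0.974) follow from a standard confusion-matrix computation over slice-level labels. The sketch below shows the definitions on toy labels; the labels are invented for illustration, not taken from the study.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positive slices) and specificity
    from binary labels, as reported for the BME slice detection network."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Toy slice-level labels (1 = BME present on the slice).
m = diagnostic_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

The pattern in the paper (high specificity, modest sensitivity) means the detector rarely flags a healthy slice but misses a fair share of true BME slices, which matters when the tool is used to rule findings in rather than out.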

  • Article type: Journal Article
    Deep learning methods have recently gained success in detecting left ventricular systolic dysfunction (LVSD) from electrocardiogram (ECG) waveforms. Despite their high level of accuracy, they are difficult to interpret and deploy broadly in the clinical setting. In this study, we set out to determine whether simpler models based on standard ECG measurements could detect LVSD with similar accuracy to that of deep learning models.
    Using an observational data set of 40 994 matched 12-lead ECGs and transthoracic echocardiograms, we trained a range of models with increasing complexity to detect LVSD based on ECG waveforms and derived measurements. The training data were acquired from the Stanford University Medical Center. External validation data were acquired from the Columbia Medical Center and the UK Biobank. The Stanford data set consisted of 40 994 matched ECGs and echocardiograms, of which 9.72% had LVSD. A random forest model using 555 discrete, automated measurements achieved an area under the receiver operator characteristic curve (AUC) of 0.92 (0.91-0.93), similar to a deep learning waveform model with an AUC of 0.94 (0.93-0.94). A logistic regression model based on five measurements achieved high performance [AUC of 0.86 (0.85-0.87)], close to a deep learning model and better than N-terminal prohormone brain natriuretic peptide (NT-proBNP). Finally, we found that simpler models were more portable across sites, with experiments at two independent, external sites.
    Our study demonstrates the value of simple electrocardiographic models that perform nearly as well as deep learning models, while being much easier to implement and interpret.
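The AUC values the study compares (0.94 deep learning vs. 0.92 random forest vs. 0.86 five-feature logistic regression) all use the same statistic: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal rank-based implementation, with invented scores (not the study's pipeline or data):

```python
def auc(y_true, scores):
    """Rank-based AUC: probability that a random positive outranks a
    random negative, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = LVSD present, scores = any model's risk output.
a = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because AUC depends only on the ranking of scores, it lets the study compare a deep waveform model, a 555-feature random forest, and a five-measurement logistic regression on equal footing.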

  • Article type: Journal Article
    Iconography studies the visual content of artworks by considering the themes portrayed in them and their representation. Computer Vision has been used to identify iconographic subjects in paintings and Convolutional Neural Networks enabled the effective classification of characters in Christian art paintings. However, it still has to be demonstrated if the classification results obtained by CNNs rely on the same iconographic properties that human experts exploit when studying iconography and if the architecture of a classifier trained on whole artwork images can be exploited to support the much harder task of object detection. A suitable approach for exposing the process of classification by neural models relies on Class Activation Maps, which emphasize the areas of an image contributing the most to the classification. This work compares state-of-the-art algorithms (CAM, Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++) in terms of their capacity of identifying the iconographic attributes that determine the classification of characters in Christian art paintings. Quantitative and qualitative analyses show that Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++ have similar performances while CAM has lower efficacy. Smooth Grad-CAM++ isolates multiple disconnected image regions that identify small iconographic symbols well. Grad-CAM produces wider and more contiguous areas that cover large iconographic symbols better. The salient image areas computed by the CAM algorithms have been used to estimate object-level bounding boxes and a quantitative analysis shows that the boxes estimated with Grad-CAM reach 55% average IoU, 61% GT-known localization and 31% mAP. The obtained results are a step towards the computer-aided study of the variations of iconographic elements positioning and mutual relations in artworks and open the way to the automatic creation of bounding boxes for training detectors of iconographic symbols in Christian art images.
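The paper's object-level evaluation rests on two mechanical steps: turning a class activation map into an estimated bounding box, and scoring that box against ground truth with intersection-over-union (IoU). Both can be sketched in a few lines; the thresholding rule and the toy map below are illustrative assumptions, not the paper's exact procedure.

```python
def cam_to_box(cam, threshold):
    """Estimate a bounding box (x1, y1, x2, y2) from a 2-D activation
    map by taking the extent of cells at or above `threshold`."""
    coords = [(x, y) for y, row in enumerate(cam)
                     for x, v in enumerate(row) if v >= threshold]
    xs, ys = [c[0] for c in coords], [c[1] for c in coords]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Toy 3x3 activation map with a hot region in the lower-right.
cam = [[0.0, 0.0, 0.0],
       [0.0, 0.9, 0.8],
       [0.0, 0.7, 0.0]]
box = cam_to_box(cam, threshold=0.5)
```

The paper's observation that Smooth Grad-CAM++ isolates multiple disconnected regions while Grad-CAM produces wider contiguous ones maps directly onto this step: a single bounding box over disconnected activations can inflate or deflate IoU depending on symbol size.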

  • Article type: Journal Article
    -Business reliance on algorithms is becoming ubiquitous, and companies are increasingly concerned about their algorithms causing major financial or reputational damage. High-profile cases include Google's AI algorithm for photo classification mistakenly labelling a black couple as gorillas in 2015 (Gebru 2020 In The Oxford handbook of ethics of AI, pp. 251-269), Microsoft's AI chatbot Tay that spread racist, sexist and antisemitic speech on Twitter (now X) (Wolf et al. 2017 ACM Sigcas Comput. Soc. 47, 54-64 (doi:10.1145/3144592.3144598)), and Amazon's AI recruiting tool being scrapped after showing bias against women. In response, governments are legislating and imposing bans, regulators fining companies and the judiciary discussing potentially making algorithms artificial 'persons' in law. As with financial audits, governments, business and society will require algorithm audits; formal assurance that algorithms are legal, ethical and safe. A new industry is envisaged: Auditing and Assurance of Algorithms (cf. data privacy), with the remit to professionalize and industrialize AI, ML and associated algorithms. The stakeholders range from those working on policy/regulation to industry practitioners and developers. We also anticipate the nature and scope of the auditing levels and framework presented will inform those interested in systems of governance and compliance with regulation/standards. Our goal in this article is to survey the key areas necessary to perform auditing and assurance and instigate the debate in this novel area of research and practice.

  • Article type: Journal Article
    Hypoglycemia is a common metabolic disorder that occurs in the neonatal period. Early identification of neonates at risk of developing hypoglycemia can optimize therapeutic strategies in neonatal care. This study aims to develop a machine learning model and implement a predictive application to assist clinicians in accurately predicting the risk of neonatal hypoglycemia within four hours after birth. Our retrospective study analyzed data from neonates born ≥35 weeks gestational age and admitted to the well-baby nursery between 1 January 2011 and 31 August 2021. We collected electronic medical records of 2687 neonates from a tertiary medical center in Southern Taiwan. Using 12 clinically relevant features, we evaluated nine machine learning approaches to build the predictive models. We selected the models with the highest area under the receiver operating characteristic curve (AUC) for integration into our hospital information system (HIS). The top three AUC values for the early neonatal hypoglycemia prediction models were 0.739 for Stacking, 0.732 for Random Forest and 0.732 for Voting. Random Forest is considered the best model because it has a relatively high AUC and shows no significant overfitting (accuracy of 0.658, sensitivity of 0.682, specificity of 0.649, F1 score of 0.517 and precision of 0.417). The best model was incorporated in the web-based application integrated into the hospital information system. Shapley Additive Explanation (SHAP) values indicated mode of delivery, gestational age, multiparity, respiratory distress, and birth weight < 2500 gm as the top five predictors of neonatal hypoglycemia. The implementation of our machine learning model provides an effective tool that assists clinicians in accurately identifying at-risk neonates for early neonatal hypoglycemia, thereby allowing timely interventions and treatments.
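The study uses SHAP values to rank predictors of neonatal hypoglycemia. A simpler model-agnostic relative with the same goal is permutation importance: shuffle one feature's column and measure the drop in accuracy. The sketch below uses a hand-written threshold rule in place of the study's random forest, and invented feature values; everything here is an illustrative assumption, not the paper's SHAP pipeline.

```python
import random

def permutation_importance(predict, X, y, feature, n_repeats=10, seed=0):
    """Drop in accuracy when one feature's column is shuffled -- a
    model-agnostic importance score, cruder than SHAP but similar in
    spirit (large drop => the model relies on that feature)."""
    rng = random.Random(seed)
    base = sum(predict(row) == t for row, t in zip(X, y)) / len(y)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature] for row in X]
        rng.shuffle(col)
        Xp = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, col)]
        perm = sum(predict(row) == t for row, t in zip(Xp, y)) / len(y)
        drops.append(base - perm)
    return sum(drops) / n_repeats

# Toy "model": flags risk purely from birth weight (feature 0, grams);
# feature 1 (gestational age, weeks) is ignored by construction.
predict = lambda row: 1 if row[0] < 2500 else 0
X = [[2200, 38], [3100, 40], [2400, 36], [3300, 39]] * 25
y = [predict(row) for row in X]
imp_weight = permutation_importance(predict, X, y, feature=0)
imp_ga = permutation_importance(predict, X, y, feature=1)
```

As expected, shuffling the ignored gestational-age column costs nothing, while shuffling birth weight destroys the toy model's accuracy; SHAP refines the same intuition into per-prediction additive attributions.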