Explainability

  • Article type: Journal Article
    Prenatal alcohol exposure (PAE) refers to the exposure of the developing fetus due to alcohol consumption during pregnancy and can have life-long consequences for learning, behavior, and health. Understanding the impact of PAE on the developing brain presents challenges due to its complex structural and functional attributes, which can be addressed by leveraging machine learning (ML) and deep learning (DL) approaches. While most ML and DL models have been tailored for adult-centric problems, this work focuses on applying DL to detect PAE in the pediatric population. This study integrates the pre-trained simple fully convolutional network (SFCN) as a transfer learning approach for extracting features and a newly trained classifier to distinguish between unexposed and PAE participants based on T1-weighted structural brain magnetic resonance (MR) scans of individuals aged 2-8 years. Across several dataset sizes and augmentation strategies during training, the classifier achieved its highest sensitivity of 88.47%, with 85.04% average accuracy on test data, when trained on a balanced dataset with augmentation for both classes. Moreover, we performed a preliminary explainability analysis using the Grad-CAM method, highlighting brain regions such as the corpus callosum, cerebellum, pons, and white matter as the most important features in the model's decision-making process. Despite the challenges of constructing DL models for pediatric populations due to the brain's rapid development, motion artifacts, and insufficient data, this work highlights the potential of transfer learning in situations where data are limited. Furthermore, this study underscores the importance of preserving a balanced dataset for fair classification and of clarifying the rationale behind the model's predictions using explainability analysis.
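The Grad-CAM analysis described above reduces to a weighted combination of convolutional feature maps. Below is a minimal NumPy sketch of that computation; the toy array shapes and random inputs are illustrative assumptions, not the authors' SFCN or their MR data:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap from one convolutional layer.

    activations: feature maps, shape (C, H, W)
    gradients:   d(class score)/d(activations), same shape
    """
    # Channel weights: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))             # (C,)
    # Weighted combination of feature maps, then ReLU.
    cam = np.tensordot(weights, activations, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)
    # Normalize to [0, 1] for visualization (guard the all-zero case).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 4 channels of 8x8 feature maps.
rng = np.random.default_rng(0)
acts = rng.random((4, 8, 8))
grads = rng.random((4, 8, 8))
heatmap = grad_cam(acts, grads)
```

Upsampling the heatmap and overlaying it on the input scan is what lets an analysis like this highlight regions such as the corpus callosum or cerebellum as drivers of the model's decision.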

  • Article type: Journal Article
    Users of social platforms often perceive these sites as supportive spaces to post about their mental health issues. Those conversations contain important traces of individuals' health risks. Recently, researchers have exploited this online information to construct mental health detection models, which aim to identify users at risk on platforms like Twitter, Reddit, or Facebook. Most of these models focus on achieving good classification results, ignoring the explainability and interpretability of the decisions. Recent research has pointed out the importance of using clinical markers, such as the use of symptoms, to improve health professionals' trust in the computational models. In this paper, we introduce transformer-based architectures designed to detect and explain the appearance of depressive symptom markers in user-generated content from social media. We present two approaches: (i) training one model to classify and a separate model to explain the classifier's decisions, and (ii) unifying the two tasks within a single model. Additionally, for the latter approach, we also investigated the performance of recent conversational Large Language Models (LLMs) using both in-context learning and fine-tuning. Our models provide natural language explanations aligned with validated symptoms, thus enabling clinicians to interpret the decisions more effectively. We evaluate our approaches using recent symptom-focused datasets, applying both offline metrics and expert-in-the-loop evaluations to assess the quality of our models' explanations. Our findings demonstrate that it is possible to achieve good classification results while generating interpretable symptom-based explanations.
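As a toy illustration of the unified classify-and-explain idea (approach ii), one can imagine a system that grounds its risk label in detected symptom markers. The lexicon, phrases, and threshold below are invented for illustration only; they are not the paper's transformer models and not a validated clinical resource:

```python
# Toy symptom lexicon (illustrative only, not clinically validated).
SYMPTOM_MARKERS = {
    "anhedonia": ["lost interest", "nothing is fun"],
    "sleep problems": ["can't sleep", "insomnia"],
    "low mood": ["feel hopeless", "feel empty"],
}

def classify_and_explain(post: str, threshold: int = 1):
    """Return a risk label plus a symptom-grounded natural language explanation."""
    text = post.lower()
    # Collect the marker phrases that actually occur in the post.
    found = {symptom: [p for p in phrases if p in text]
             for symptom, phrases in SYMPTOM_MARKERS.items()}
    found = {s: p for s, p in found.items() if p}
    label = "at-risk" if len(found) >= threshold else "not at-risk"
    if found:
        explanation = "Detected symptom markers: " + "; ".join(
            f"{s} ({', '.join(p)})" for s, p in found.items())
    else:
        explanation = "No symptom markers detected."
    return label, explanation

label, why = classify_and_explain("Lately I can't sleep and I feel hopeless.")
```

A transformer-based unified model would replace the lexicon lookup with learned detection, but the output contract is the same: a decision paired with a symptom-level explanation a clinician can check.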

  • Article type: Journal Article
    Crohn's disease (CD) is a chronic inflammatory bowel disease of unknown origin that can cause significant disability and morbidity as it progresses. Due to the unique nature of CD, many patients require surgery at some point in their lifetime, and the incidence of postoperative complications is high, which can affect patients' prognosis. Therefore, it is essential to identify and manage postoperative complications. Machine learning (ML) has become increasingly important in the medical field, and ML-based models can be used to predict postoperative complications of intestinal resection for CD. Recently, a valuable article titled "Predicting short-term major postoperative complications in intestinal resection for Crohn's disease: A machine learning-based study" was published by Wang et al. We appreciate the authors' creative work, and we are willing to share our views and discuss them with the authors.

  • Article type: Editorial
    No abstract available.

  • Article type: Journal Article
    Autoencoders are dimension-reduction models in machine learning that can be thought of as a neural network counterpart of principal component analysis (PCA). Due to their flexibility and good performance, autoencoders have recently been used to estimate nonlinear factor models in finance. The main weakness of autoencoders is that their results are less explainable than those obtained with PCA. In this paper, we propose adopting the Shapley value to improve the explainability of autoencoders in the context of nonlinear factor models. In particular, we measure the relevance of nonlinear latent factors using a forecast-based Shapley value approach, which measures each latent factor's contribution to out-of-sample accuracy in factor-augmented models. Considering the interesting empirical instance of the commodity market, we identify the most relevant latent factors for each commodity based on their out-of-sample forecasting ability.
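For a small number of latent factors, the forecast-based Shapley value can be computed exactly by enumerating coalitions. In the sketch below the "payoff" of a coalition of factors is taken to be the out-of-sample R² of a plain OLS forecast, which is a simplifying stand-in for the paper's factor-augmented forecasting models; the synthetic data and coefficients are invented:

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(factors_train, y_train, factors_test, y_test):
    """Exact Shapley value of each latent factor, where the payoff of a
    coalition S is the out-of-sample R^2 of an OLS forecast using factors in S."""
    k = factors_train.shape[1]
    n = len(y_test)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)

    def value(subset):
        if not subset:
            return 0.0
        A = np.column_stack([np.ones(len(factors_train)), factors_train[:, subset]])
        beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        pred = np.column_stack([np.ones(n), factors_test[:, subset]]) @ beta
        return 1.0 - np.sum((y_test - pred) ** 2) / ss_tot

    phi = np.zeros(k)
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for r in range(k):
            for S in combinations(others, r):
                # Standard Shapley coalition weight |S|! (k-|S|-1)! / k!
                w = factorial(len(S)) * factorial(k - len(S) - 1) / factorial(k)
                phi[i] += w * (value(list(S) + [i]) - value(list(S)))
    return phi

# Synthetic example: factor 0 matters a lot, factor 1 a little, factor 2 not at all.
rng = np.random.default_rng(0)
F_tr, F_te = rng.normal(size=(200, 3)), rng.normal(size=(100, 3))
y_tr = 2.0 * F_tr[:, 0] + 0.3 * F_tr[:, 1] + 0.5 * rng.normal(size=200)
y_te = 2.0 * F_te[:, 0] + 0.3 * F_te[:, 1] + 0.5 * rng.normal(size=100)
phi = shapley_values(F_tr, y_tr, F_te, y_te)
```

The exact enumeration is exponential in the number of factors, so for larger factor sets a sampled approximation of the coalition sum would be needed.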

  • Article type: Journal Article
    Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that severely impacts affected persons' speech and motor functions, yet early detection and tracking of disease progression remain challenging. The current gold standard for monitoring ALS progression, the ALS Functional Rating Scale-Revised (ALSFRS-R), is based on subjective ratings of symptom severity and may not capture subtle but clinically meaningful changes due to a lack of granularity. Multimodal speech measures, which can be collected automatically and remotely from patients, can bridge this gap because they are continuous-valued and therefore potentially more granular in capturing disease progression. Here we investigate the responsiveness and sensitivity of multimodal speech measures in persons with ALS (pALS) collected via a remote patient monitoring platform, in an effort to quantify how long it takes to detect a clinically meaningful change associated with disease progression. We recorded audio and video from 278 participants and automatically extracted multimodal speech biomarkers (acoustic, orofacial, linguistic) from the data. We find that the timing alignment of pALS speech relative to a canonical elicitation of the same prompt and the number of words used to describe a picture are the most responsive measures for detecting such change in pALS with both bulbar (n = 36) and non-bulbar onset (n = 107). Interestingly, the responsiveness of these measures is stable even at small sample sizes. We further found that certain speech measures are sensitive enough to track bulbar decline even when there is no patient-reported clinical change, i.e., the ALSFRS-R speech score remains unchanged at 3 out of a total possible score of 4. The findings of this study have the potential to facilitate improved, accelerated, and cost-effective clinical trials and care.
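Responsiveness of a longitudinal measure is often summarized with the standardized response mean (SRM): mean within-person change divided by the standard deviation of change. The abstract does not specify the exact statistic used, so the following is a generic sketch on synthetic speaking-measure data, with all numbers invented:

```python
import numpy as np

def standardized_response_mean(baseline, followup):
    """SRM = mean within-person change / SD of change.
    A common responsiveness statistic; the study's exact metric may differ."""
    change = np.asarray(followup) - np.asarray(baseline)
    return change.mean() / change.std(ddof=1)

# Synthetic cohort of 40 participants measured twice.
rng = np.random.default_rng(1)
baseline = rng.normal(5.0, 0.5, size=40)          # e.g., a speech measure
decline = baseline - 0.4 + rng.normal(0.0, 0.2, size=40)  # true decline
stable = baseline + rng.normal(0.0, 0.2, size=40)         # no true change

srm_decline = standardized_response_mean(baseline, decline)
srm_stable = standardized_response_mean(baseline, stable)
```

A measure with a large |SRM| over a given interval needs fewer participants, or less time, to reveal a clinically meaningful change, which is the sense in which the timing-alignment and word-count measures are called "most responsive."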

  • Article type: Journal Article
    In an increasing number of industrial and technical processes, machine learning-based systems are being entrusted with supervision tasks. While they have been successfully utilized in many application areas, they frequently fail to generalize to changes in the observed data, which environmental changes or degrading sensors might cause. These changes, commonly referred to as concept drift, can trigger malfunctions in the deployed solutions, which in many cases are safety-critical. Thus, detecting and analyzing concept drift is a crucial step when building reliable and robust machine learning-driven solutions. In this work, we consider the setting of unsupervised data streams, which is highly relevant for different monitoring and anomaly detection scenarios. In particular, we focus on the tasks of localizing and explaining concept drift, which are crucial to enable human operators to take appropriate action. In addition to providing precise mathematical definitions of the problem of concept drift localization, we survey the body of literature on this topic. By performing standardized experiments on parametric artificial datasets, we provide a direct comparison of different strategies. Thereby, we can systematically analyze the properties of different schemes and suggest first guidelines for practical applications. Finally, we explore the emerging topic of explaining concept drift.
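A minimal way to localize drift in an unsupervised stream is to compare a reference window against a current window feature by feature and flag the features whose distribution has shifted. The two-sample z-statistic scheme below is an illustrative baseline, not one of the specific strategies benchmarked in the survey:

```python
import numpy as np

def localize_drift(reference: np.ndarray, current: np.ndarray,
                   z_thresh: float = 3.0) -> np.ndarray:
    """Return indices of features along which the stream appears to have drifted.

    Compares each feature's mean in the current window against a reference
    window using a two-sample z-statistic on the mean difference.
    """
    n_ref, n_cur = len(reference), len(current)
    mean_diff = current.mean(axis=0) - reference.mean(axis=0)
    pooled_se = np.sqrt(reference.var(axis=0, ddof=1) / n_ref
                        + current.var(axis=0, ddof=1) / n_cur)
    z = np.abs(mean_diff) / pooled_se
    return np.where(z > z_thresh)[0]

# Synthetic stream with 5 features; inject a mean shift into feature 2.
rng = np.random.default_rng(2)
reference = rng.normal(size=(300, 5))
current = rng.normal(size=(300, 5))
current[:, 2] += 2.0
drifted = localize_drift(reference, current)
```

Pinpointing *which* features drifted, rather than only *that* drift occurred, is what gives a human operator something actionable, e.g., a candidate degrading sensor.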

  • Article type: Journal Article
    Deep learning methods have recently gained success in detecting left ventricular systolic dysfunction (LVSD) from electrocardiogram (ECG) waveforms. Despite their high level of accuracy, they are difficult to interpret and deploy broadly in the clinical setting. In this study, we set out to determine whether simpler models based on standard ECG measurements could detect LVSD with accuracy similar to that of deep learning models.
    Using an observational data set of 40 994 matched 12-lead ECGs and transthoracic echocardiograms, we trained a range of models of increasing complexity to detect LVSD based on ECG waveforms and derived measurements. The training data were acquired from the Stanford University Medical Center. External validation data were acquired from the Columbia Medical Center and the UK Biobank. The Stanford data set consisted of 40 994 matched ECGs and echocardiograms, of which 9.72% had LVSD. A random forest model using 555 discrete, automated measurements achieved an area under the receiver operating characteristic curve (AUC) of 0.92 (0.91-0.93), similar to a deep learning waveform model with an AUC of 0.94 (0.93-0.94). A logistic regression model based on five measurements achieved high performance [AUC of 0.86 (0.85-0.87)], close to the deep learning model and better than N-terminal prohormone brain natriuretic peptide (NT-proBNP). Finally, experiments at two independent external sites showed that the simpler models were more portable across sites.
    Our study demonstrates the value of simple electrocardiographic models that perform nearly as well as deep learning models while being much easier to implement and interpret.
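The study's core contrast (a few-measurement logistic regression approaching a complex model) can be sketched with a NumPy-only logistic regression and a rank-based AUC. The five synthetic "measurements" and their coefficients below are invented for illustration and are not the study's actual ECG features:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain gradient-descent logistic regression (NumPy-only sketch)."""
    Xb = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)       # average log-loss gradient
    return w

def predict_proba(X, w):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation (no tie handling)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic data: 5 "measurements", label driven by two of them.
rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

w = fit_logistic(X[:1500], y[:1500])
test_auc = auc(y[1500:], predict_proba(X[1500:], w))
```

With so few inputs, the fitted coefficients themselves are the interpretation, which is the portability and transparency argument the study makes for simple models.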

  • Article type: Journal Article
    Iconography studies the visual content of artworks by considering the themes portrayed in them and their representation. Computer Vision has been used to identify iconographic subjects in paintings, and Convolutional Neural Networks have enabled the effective classification of characters in Christian art paintings. However, it remains to be demonstrated whether the classification results obtained by CNNs rely on the same iconographic properties that human experts exploit when studying iconography, and whether the architecture of a classifier trained on whole artwork images can be exploited to support the much harder task of object detection. A suitable approach for exposing the classification process of neural models relies on Class Activation Maps, which emphasize the areas of an image that contribute the most to the classification. This work compares state-of-the-art algorithms (CAM, Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++) in terms of their capacity to identify the iconographic attributes that determine the classification of characters in Christian art paintings. Quantitative and qualitative analyses show that Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++ perform similarly, while CAM has lower efficacy. Smooth Grad-CAM++ isolates multiple disconnected image regions that identify small iconographic symbols well. Grad-CAM produces wider and more contiguous areas that better cover large iconographic symbols. The salient image areas computed by the CAM algorithms have been used to estimate object-level bounding boxes, and a quantitative analysis shows that the boxes estimated with Grad-CAM reach 55% average IoU, 61% GT-known localization, and 31% mAP. The obtained results are a step towards the computer-aided study of variations in the positioning and mutual relations of iconographic elements in artworks, and open the way to the automatic creation of bounding boxes for training detectors of iconographic symbols in Christian art images.
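The bounding-box estimation step can be illustrated with a common CAM-localization heuristic: threshold the saliency map at a fraction of its maximum, take the extent of the surviving pixels, and score against a ground-truth box with IoU. The 16x16 map, the 0.5 threshold, and the box values are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def cam_to_bbox(cam: np.ndarray, rel_thresh: float = 0.5):
    """Estimate one bounding box from a class activation map by thresholding
    at a fraction of the map's maximum value."""
    mask = cam >= rel_thresh * cam.max()
    ys, xs = np.where(mask)
    return (xs.min(), ys.min(), xs.max(), ys.max())  # x0, y0, x1, y1 inclusive

def box_area(r):
    return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

def iou(a, b):
    """Intersection-over-union of two inclusive pixel boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    return inter / (box_area(a) + box_area(b) - inter)

# Toy saliency map: a hot rectangle on a dim background.
cam = np.full((16, 16), 0.1)
cam[4:10, 5:12] = 1.0
box = cam_to_bbox(cam)
gt = (5, 4, 11, 9)   # hypothetical ground-truth annotation
score = iou(box, gt)
```

The paper's observation that Grad-CAM yields contiguous regions while Smooth Grad-CAM++ yields fragmented ones maps directly onto this heuristic: contiguous maps produce a single tight box, fragmented maps favor small symbols but inflate a single enclosing box.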

  • Article type: Journal Article
    Business reliance on algorithms is becoming ubiquitous, and companies are increasingly concerned about their algorithms causing major financial or reputational damage. High-profile cases include Google's AI algorithm for photo classification mistakenly labelling a black couple as gorillas in 2015 (Gebru 2020 In The Oxford handbook of ethics of AI, pp. 251-269), Microsoft's AI chatbot Tay that spread racist, sexist and antisemitic speech on Twitter (now X) (Wolf et al. 2017 ACM Sigcas Comput. Soc. 47, 54-64 (doi:10.1145/3144592.3144598)), and Amazon's AI recruiting tool being scrapped after showing bias against women. In response, governments are legislating and imposing bans, regulators are fining companies, and the judiciary is discussing potentially making algorithms artificial 'persons' in law. As with financial audits, governments, business and society will require algorithm audits: formal assurance that algorithms are legal, ethical and safe. A new industry is envisaged: Auditing and Assurance of Algorithms (cf. data privacy), with the remit to professionalize and industrialize AI, ML and associated algorithms. The stakeholders range from those working on policy/regulation to industry practitioners and developers. We also anticipate that the nature and scope of the auditing levels and framework presented will inform those interested in systems of governance and compliance with regulation/standards. Our goal in this article is to survey the key areas necessary to perform auditing and assurance and to instigate debate in this novel area of research and practice.
