Multimodal deep learning

  • Article type: Journal Article
    BACKGROUND: Clinical notes contain contextualized information beyond structured data related to patients' past and current health status.
    OBJECTIVE: This study aimed to design a multimodal deep learning approach to improve the evaluation precision of hospital outcomes for heart failure (HF) using admission clinical notes and easily collected tabular data.
    METHODS: Data for the development and validation of the multimodal model were retrospectively derived from 3 open-access US databases, including the Medical Information Mart for Intensive Care III v1.4 (MIMIC-III) and MIMIC-IV v1.0, collected from a teaching hospital from 2001 to 2019, and the eICU Collaborative Research Database v1.2, collected from 208 hospitals from 2014 to 2015. The study cohorts consisted of all patients with critical HF. The clinical notes, including chief complaint, history of present illness, physical examination, medical history, and admission medication, as well as clinical variables recorded in electronic health records, were analyzed. We developed a deep learning mortality prediction model for in-hospital patients, which underwent complete internal, prospective, and external evaluation. The Integrated Gradients and SHapley Additive exPlanations (SHAP) methods were used to analyze the importance of risk factors.
    RESULTS: The study included 9989 (16.4%) patients in the development set, 2497 (14.1%) patients in the internal validation set, 1896 (18.3%) in the prospective validation set, and 7432 (15%) patients in the external validation set. The area under the receiver operating characteristic curve of the models was 0.838 (95% CI 0.827-0.851), 0.849 (95% CI 0.841-0.856), and 0.767 (95% CI 0.762-0.772), for the internal, prospective, and external validation sets, respectively. The area under the receiver operating characteristic curve of the multimodal model outperformed that of the unimodal models in all test sets, and tabular data contributed to higher discrimination. The medical history and physical examination were more useful than other factors in early assessments.
    CONCLUSIONS: The multimodal deep learning model for combining admission notes and clinical tabular data showed promising efficacy as a potentially novel method in evaluating the risk of mortality in patients with HF, providing more accurate and timely decision support.
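    The note-plus-tabular combination this abstract describes can be sketched as a minimal joint-fusion scorer: concatenate a note embedding with tabular admission features and apply a linear read-out. All shapes, feature values, and weights below are illustrative assumptions, not the paper's architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    # Illustrative stand-ins: a 768-d note embedding (as a text encoder
    # might produce) and a few admission tabular features (hypothetical:
    # age, temperature, two binary flags).
    note_emb = rng.normal(size=768)
    tabular = np.array([72.0, 38.5, 1.0, 0.0])

    def joint_fusion_score(note_emb, tabular, w_note, w_tab, bias=0.0):
        """Joint fusion by concatenation, followed by a linear read-out
        squashed to a probability with a sigmoid."""
        fused = np.concatenate([note_emb, tabular])
        weights = np.concatenate([w_note, w_tab])
        logit = fused @ weights + bias
        return 1.0 / (1.0 + np.exp(-logit))

    # Random (untrained) weights, just to exercise the shapes.
    w_note = rng.normal(scale=0.01, size=768)
    w_tab = rng.normal(scale=0.01, size=4)
    print(joint_fusion_score(note_emb, tabular, w_note, w_tab))
    ```

    In practice the read-out and both encoders would be trained end-to-end; the sketch only shows where the two modalities meet.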

  • Article type: Journal Article
    Covert tobacco advertisements often raise regulatory measures. This paper shows that artificial intelligence, particularly deep learning, has great potential for detecting hidden advertising and allows unbiased, reproducible, and fair quantification of tobacco-related media content. We propose an integrated text and image processing model based on deep learning, generative methods, and human reinforcement, which can detect smoking cases in both textual and visual formats, even with little available training data. Our model can achieve 74% accuracy for images and 98% for text. Furthermore, our system integrates the possibility of expert intervention in the form of human reinforcement. Using the pre-trained multimodal, image, and text processing models available through deep learning makes it possible to detect smoking in different media even with little training data.

  • Article type: Journal Article
    The broad adoption of electronic health record (EHR) systems brings us a tremendous amount of clinical data and thus provides opportunities to conduct data-based healthcare research to solve various clinical problems in the medical domain. Machine learning and deep learning methods are widely used in the medical informatics and healthcare domain due to their power to mine insights from raw data. When adapting deep learning models for EHR data, it is essential to consider its heterogeneous nature: EHR contains patient records from various sources including medical tests (e.g. blood test, microbiology test), medical imaging, diagnosis, medications, procedures, clinical notes, etc. Those modalities together provide a holistic view of patient health status and complement each other. Therefore, combining data from multiple modalities that are intrinsically different is challenging but intuitively promising in deep learning for EHR. To assess the expectations of multimodal data, we introduce a comprehensive fusion framework designed to integrate temporal variables, medical images, and clinical notes in EHR for enhanced performance in clinical risk prediction. Early, joint, and late fusion strategies are employed to combine data from various modalities effectively. We test the model with three predictive tasks: in-hospital mortality, long length of stay, and 30-day readmission. Experimental results show that multimodal models outperform uni-modal models in the tasks involved. Additionally, by training models with different input modality combinations, we calculate the Shapley value for each modality to quantify their contribution to multimodal performance. It is shown that temporal variables tend to be more helpful than CXR images and clinical notes in the three explored predictive tasks.
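    The per-modality attribution step described above (train a model on every modality combination, then average each modality's marginal gains) is an exact Shapley-value computation, which for three modalities can be done by brute force over orderings. The AUROC numbers below are invented purely to illustrate the mechanics:

    ```python
    from itertools import permutations
    from math import factorial

    def modality_shapley(players, value):
        """Exact Shapley values. `value` maps a frozenset of modalities to
        the score (e.g. AUROC) of a model trained on that combination."""
        shap = {p: 0.0 for p in players}
        for order in permutations(players):
            coalition = frozenset()
            for p in order:  # marginal gain of adding modality p
                shap[p] += value[coalition | {p}] - value[coalition]
                coalition = coalition | {p}
        n_orders = factorial(len(players))
        return {p: s / n_orders for p, s in shap.items()}

    # Invented AUROCs for every combination of the three modalities.
    scores = {
        frozenset(): 0.50,
        frozenset({"temporal"}): 0.80,
        frozenset({"cxr"}): 0.70,
        frozenset({"notes"}): 0.72,
        frozenset({"temporal", "cxr"}): 0.83,
        frozenset({"temporal", "notes"}): 0.84,
        frozenset({"cxr", "notes"}): 0.75,
        frozenset({"temporal", "cxr", "notes"}): 0.86,
    }
    print(modality_shapley(["temporal", "cxr", "notes"], scores))
    ```

    A useful sanity check: the per-modality values sum to the gain of the full model over the empty baseline (0.86 − 0.50 here).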

  • Article type: Journal Article
    BACKGROUND: The Drug-Target Interaction (DTI) prediction uses a drug molecule and a protein sequence as inputs to predict the binding affinity value. In recent years, deep learning-based models have gotten more attention. These methods have two modules: the feature extraction module and the task prediction module. In most deep learning-based approaches, a simple task prediction loss (i.e., categorical cross entropy for the classification task and mean squared error for the regression task) is used to learn the model. In machine learning, contrastive-based loss functions are developed to learn more discriminative feature space. In a deep learning-based model, extracting more discriminative feature space leads to performance improvement for the task prediction module.
    RESULTS: In this paper, we have used multimodal knowledge as input and proposed an attention-based fusion technique to combine this knowledge. Also, we investigate how utilizing a contrastive loss function alongside the task prediction loss could help the approach learn a more powerful model. Four contrastive loss functions are considered: (1) the max-margin contrastive loss function, (2) the triplet loss function, (3) the multi-class N-pair loss objective, and (4) the NT-Xent loss function. The proposed model is evaluated using four well-known datasets: the Wang et al. dataset, Luo's dataset, and the Davis and KIBA datasets.
    CONCLUSIONS: Accordingly, after reviewing the state-of-the-art methods, we developed a multimodal feature extraction network by combining protein sequences and drug molecules, along with protein-protein interaction networks and drug-drug interaction networks. The results show it performs significantly better than the comparable state-of-the-art approaches.
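    Of the four contrastive objectives named in the results, NT-Xent (the normalized temperature-scaled cross-entropy popularized by SimCLR) is compact enough to sketch in full. This is a generic NumPy rendering of the loss, not the authors' implementation:

    ```python
    import numpy as np

    def nt_xent(z1, z2, tau=0.5):
        """NT-Xent over a batch of positive pairs (z1[i], z2[i]).
        Each row of the 2N x 2N cosine-similarity matrix treats its
        paired row as the positive and every other row as a negative."""
        n = z1.shape[0]
        z = np.concatenate([z1, z2], axis=0).astype(float)
        z /= np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
        sim = z @ z.T / tau                             # scaled similarities
        np.fill_diagonal(sim, -np.inf)                  # exclude self-pairs
        log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
        pos = np.concatenate([np.arange(n) + n, np.arange(n)])
        return float(-log_prob[np.arange(2 * n), pos].mean())
    ```

    A quick sanity check on the sketch: feeding identical views as the positive pairs yields a lower loss than mismatched pairs, since aligned positives maximize the numerator of each softmax term.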

  • Article type: Journal Article
    Multiple Sclerosis (MS) is a chronic disease developed in the human brain and spinal cord, which can cause permanent damage or deterioration of the nerves. The severity of MS disease is monitored by the Expanded Disability Status Scale, composed of several functional sub-scores. Early and accurate classification of MS disease severity is critical for slowing down or preventing disease progression via applying early therapeutic intervention strategies. Recent advances in deep learning and the wide use of Electronic Health Records (EHR) create opportunities to apply data-driven and predictive modeling tools for this goal. Previous studies focusing on using single-modal machine learning and deep learning algorithms were limited in terms of prediction accuracy due to data insufficiency or model simplicity. In this paper, we proposed the idea of using patients' multimodal longitudinal EHR data to predict multiple sclerosis disease severity in the future. Our contribution has two main facets. First, we describe a pioneering effort to integrate structured EHR data, neuroimaging data, and clinical notes to build a multimodal deep learning framework to predict a patient's MS severity. The proposed pipeline demonstrates up to a 19% increase in the Area Under the Receiver Operating Characteristic curve (AUROC) compared to models using single-modal data. Second, the study also provides valuable insights regarding the amount of useful signal embedded in each data modality with respect to MS disease prediction, which may improve data collection processes.

  • Article type: Journal Article
    With the development of multimedia systems in wireless environments, the rising need for artificial intelligence is to design a system that can properly communicate with humans with a comprehensive understanding of various types of information in a human-like manner. Therefore, this paper addresses an audio-visual scene-aware dialog system that can communicate with users about audio-visual scenes. It is essential to understand not only visual and textual information but also audio information in a comprehensive way. Despite the substantial progress in multimodal representation learning with language and visual modalities, there are still two caveats: ineffective use of auditory information and the lack of interpretability of the deep learning systems' reasoning. To address these issues, we propose a novel audio-visual scene-aware dialog system that utilizes a set of explicit information from each modality as a form of natural language, which can be fused into a language model in a natural way. It leverages a transformer-based decoder to generate a coherent and correct response based on multimodal knowledge in a multitask learning setting. In addition, we also address the way of interpreting the model with a response-driven temporal moment localization method to verify how the system generates the response. The system itself provides the user with the evidence referred to in the system response process as a form of the timestamp of the scene. We show the superiority of the proposed model in all quantitative and qualitative measurements compared to the baseline. In particular, the proposed model achieved robust performance even in environments using all three modalities, including audio. We also conducted extensive experiments to investigate the proposed model. In addition, we obtained state-of-the-art performance in the system response reasoning task.

  • Article type: Journal Article
    Introduction: Existing large-scale preclinical cancer drug response databases provide us with a great opportunity to identify and predict potentially effective drugs to combat cancers. Deep learning models built on these databases have been developed and applied to tackle the cancer drug-response prediction task. Their prediction has been demonstrated to significantly outperform traditional machine learning methods. However, due to the "black box" characteristic, biologically faithful explanations are hardly derived from these deep learning models. Interpretable deep learning models that rely on visible neural networks (VNNs) have been proposed to provide biological justification for the predicted outcomes. However, their performance does not meet the expectation to be applied in clinical practice. Methods: In this paper, we develop XMR, an eXplainable Multimodal neural network for drug Response prediction. XMR is a new compact multimodal neural network consisting of two sub-networks: a visible neural network for learning genomic features and a graph neural network (GNN) for learning drugs' structural features. Both sub-networks are integrated into a multimodal fusion layer to model the drug response for the given gene mutations and the drug's molecular structures. Furthermore, a pruning approach is applied to provide better interpretations of the XMR model. We use five pathway hierarchies (cell cycle, DNA repair, diseases, signal transduction, and metabolism), obtained from the Reactome Pathway Database, as the VNN architecture for our XMR model to predict drug responses of triple-negative breast cancer. Results: We find that our model outperforms other state-of-the-art interpretable deep learning models in terms of predictive performance. In addition, our model can provide biological insights into explaining drug responses for triple-negative breast cancer. Discussion: Overall, by combining both VNN and GNN in a multimodal fusion layer, XMR captures key genomic and molecular features and offers reasonable interpretability in biology, thereby better predicting drug responses in cancer patients. Our model would also benefit personalized cancer therapy in the future.

  • Article type: Journal Article
    A multimodal deep-learning (MDL) framework is presented for predicting physical properties of a ten-dimensional acrylic polymer composite material by merging physical attributes and chemical data. The MDL model comprises four modules, including three generative deep-learning models for material structure characterization and a fourth model for property prediction. The approach handles an 18-dimensional complexity, with ten compositional inputs and eight property outputs, successfully predicting 913 680 property data points across 114 210 composition conditions. This level of complexity is unprecedented in computational materials science, particularly for materials with undefined structures. A framework is proposed to analyze the high-dimensional information space for inverse material design, demonstrating flexibility and adaptability to various materials and scales, provided sufficient data are available. This study advances future research on different materials and the development of more sophisticated models, drawing the authors closer to the ultimate goal of predicting all properties of all materials.

  • Article type: Journal Article
    Water plays a very important role in the growth of tomato (Solanum lycopersicum L.), and how to detect the water status of tomato is the key to precise irrigation. The objective of this study is to detect the water status of tomato by fusing RGB, NIR, and depth image information through deep learning. Five irrigation levels were set to cultivate tomatoes in different water states, with irrigation amounts of 150%, 125%, 100%, 75%, and 50% of the reference evapotranspiration calculated by a modified Penman-Monteith equation, respectively. The water status of tomatoes was divided into five categories: severe irrigation deficit, slight irrigation deficit, moderately irrigated, slightly over-irrigated, and severely over-irrigated. RGB images, depth images, and NIR images of the upper part of the tomato plant were taken as data sets. The data sets were used to train and test the tomato water status detection models built with single-modal and multimodal deep learning networks, respectively. In the single-modal deep learning network, two CNNs, VGG-16 and ResNet-50, were trained on a single RGB image, a depth image, or a NIR image, for a total of six cases. In the multimodal deep learning network, two or more of the RGB, depth, and NIR images were trained with VGG-16 or ResNet-50, respectively, for a total of 20 combinations. Results showed that the accuracy of tomato water status detection based on single-modal deep learning ranged from 88.97% to 93.09%, while the accuracy based on multimodal deep learning ranged from 93.09% to 99.18%. The multimodal deep learning significantly outperformed the single-modal deep learning. The tomato water status detection model built using a multimodal deep learning network with ResNet-50 for RGB images and VGG-16 for depth and NIR images was optimal. This study provides a novel method for non-destructive detection of the water status of tomato and gives a reference for precise irrigation management.

  • Article type: Preprint
    The broad adoption of electronic health records (EHRs) provides great opportunities to conduct healthcare research and solve various clinical problems in medicine. With recent advances and success, methods based on machine learning and deep learning have become increasingly popular in medical informatics. Combining data from multiple modalities may help in predictive tasks. To assess the expectations of multimodal data, we introduce a comprehensive fusion framework designed to integrate temporal variables, medical images, and clinical notes in the Electronic Health Record (EHR) for enhanced performance in downstream predictive tasks. Early, joint, and late fusion strategies were employed to effectively combine data from the various modalities. Model performance and contribution scores show that multimodal models outperform uni-modal models in various tasks. Additionally, temporal variables contain more information than CXR images and clinical notes in the three explored predictive tasks. Therefore, models integrating different data modalities can work better in predictive tasks.
