Multimodal deep learning

  • Article type: Journal Article
    BACKGROUND: Clinical notes contain contextualized information beyond structured data related to patients' past and current health status.
    OBJECTIVE: This study aimed to design a multimodal deep learning approach to improve the evaluation precision of hospital outcomes for heart failure (HF) using admission clinical notes and easily collected tabular data.
    METHODS: Data for the development and validation of the multimodal model were retrospectively derived from 3 open-access US databases, including the Medical Information Mart for Intensive Care III v1.4 (MIMIC-III) and MIMIC-IV v1.0, collected from a teaching hospital from 2001 to 2019, and the eICU Collaborative Research Database v1.2, collected from 208 hospitals from 2014 to 2015. The study cohorts consisted of all patients with critical HF. The clinical notes, including chief complaint, history of present illness, physical examination, medical history, and admission medication, as well as clinical variables recorded in electronic health records, were analyzed. We developed a deep learning mortality prediction model for in-hospital patients, which underwent complete internal, prospective, and external evaluation. The Integrated Gradients and SHapley Additive exPlanations (SHAP) methods were used to analyze the importance of risk factors.
    RESULTS: The study included 9989 (16.4%) patients in the development set, 2497 (14.1%) patients in the internal validation set, 1896 (18.3%) patients in the prospective validation set, and 7432 (15%) patients in the external validation set. The area under the receiver operating characteristic curve of the models was 0.838 (95% CI 0.827-0.851), 0.849 (95% CI 0.841-0.856), and 0.767 (95% CI 0.762-0.772) for the internal, prospective, and external validation sets, respectively. The area under the receiver operating characteristic curve of the multimodal model outperformed that of the unimodal models in all test sets, and the tabular data contributed to the higher discrimination. The medical history and physical examination were more useful than other factors in early assessments.
    CONCLUSIONS: The multimodal deep learning model combining admission notes and clinical tabular data showed promising efficacy as a potentially novel method for evaluating the risk of mortality in patients with HF, providing more accurate and timely decision support.
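    The abstract does not give the architecture in detail; as a rough illustration of the kind of notes-plus-tabular fusion it describes, below is a minimal PyTorch sketch. The class name `NotesTabularFusion` and all dimensions are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class NotesTabularFusion(nn.Module):
    """Hypothetical two-branch model: admission-note tokens + tabular variables."""

    def __init__(self, vocab_size, n_tabular, emb_dim=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.note_encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.tabular_mlp = nn.Sequential(
            nn.Linear(n_tabular, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.head = nn.Linear(2 * hidden, 1)  # logit for in-hospital mortality

    def forward(self, note_tokens, tabular):
        _, h = self.note_encoder(self.embed(note_tokens))   # h: (1, B, hidden)
        fused = torch.cat([h.squeeze(0), self.tabular_mlp(tabular)], dim=-1)
        return self.head(fused)

model = NotesTabularFusion(vocab_size=30000, n_tabular=40)
logits = model(torch.randint(1, 30000, (8, 256)), torch.randn(8, 40))  # batch of 8
```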

  • Article type: Journal Article
    Breast cancer, a highly formidable and diverse malignancy predominantly affecting women globally, poses a significant threat due to its intricate genetic variability, rendering it challenging to diagnose accurately. Various therapies such as immunotherapy, radiotherapy, and diverse chemotherapy approaches like drug repurposing and combination therapy are widely used depending on cancer subtype and metastasis severity. Our study revolves around an innovative drug discovery strategy targeting potential drug candidates specific to RTK signalling, a prominently targeted receptor class in cancer. To accomplish this, we have developed a multimodal deep neural network (MM-DNN) based QSAR model integrating omics datasets to elucidate genomic, proteomic expression data, and drug responses, validated rigorously. The results showcase an R2 value of 0.917 and an RMSE value of 0.312, affirming the model's commendable predictive capabilities. Structural analogs of drug molecules specific to RTK signalling were sourced from the PubChem database, followed by meticulous screening to eliminate dissimilar compounds. Leveraging the MM-DNN-based QSAR model, we predicted the biological activity of these molecules, subsequently clustering them into three distinct groups. Feature importance analysis was performed. Consequently, we successfully identified prime drug candidates tailored for each potential downstream regulatory protein within the RTK signalling pathway. This method makes the early stages of drug development faster by removing inactive compounds, providing a hopeful path in combating breast cancer.
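    The reported R2 and RMSE are standard regression metrics; a minimal sketch of how such values are computed on a held-out set, using scikit-learn and made-up predictions (the arrays below are illustrative, not the study's data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative held-out predictions from a QSAR regressor (made-up values).
y_true = np.array([5.2, 6.8, 4.9, 7.1, 6.0])
y_pred = np.array([5.0, 6.9, 5.1, 6.8, 6.2])

r2 = r2_score(y_true, y_pred)                     # the paper reports R2 = 0.917
rmse = mean_squared_error(y_true, y_pred) ** 0.5  # the paper reports RMSE = 0.312
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```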

  • Article type: Journal Article
    Bipolar disorder (BD) is characterized by recurrent episodes of depression and mild mania. In this paper, to address the common issue of insufficient accuracy in existing methods and to meet the requirements of clinical diagnosis, we propose a framework called the Spatio-temporal Feature Fusion Transformer (STF2Former). It improves on our previous work, MFFormer, by introducing a Spatio-temporal Feature Aggregation Module (STFAM) to learn the temporal and spatial features of rs-fMRI data, promoting intra-modality attention and information fusion across different modalities. Specifically, the method decouples the temporal and spatial dimensions and designs two feature extraction modules that extract temporal and spatial information separately. Extensive experiments demonstrate the effectiveness of the proposed STFAM in extracting features from rs-fMRI and show that STF2Former significantly outperforms MFFormer and achieves better results than other state-of-the-art methods.
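    STF2Former's exact design is not spelled out in the abstract; the PyTorch sketch below only illustrates the stated idea of decoupling temporal and spatial attention over rs-fMRI data. `DecoupledSTBlock` and all shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DecoupledSTBlock(nn.Module):
    """Attend over time per brain region, then over regions per time step."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                # x: (B, T, R, dim)
        B, T, R, D = x.shape
        t = x.permute(0, 2, 1, 3).reshape(B * R, T, D)   # time axis per ROI
        t, _ = self.temporal(t, t, t)
        x = t.reshape(B, R, T, D).permute(0, 2, 1, 3)
        s = x.reshape(B * T, R, D)                       # ROI axis per step
        s, _ = self.spatial(s, s, s)
        return s.reshape(B, T, R, D)

block = DecoupledSTBlock()
out = block(torch.randn(2, 100, 90, 64))  # 100 frames, 90 brain ROIs
```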

  • Article type: Journal Article
    Covert tobacco advertising often prompts regulatory measures. This paper shows that artificial intelligence, particularly deep learning, has great potential for detecting hidden advertising and allows unbiased, reproducible, and fair quantification of tobacco-related media content. We propose an integrated text and image processing model based on deep learning, generative methods, and human reinforcement that can detect smoking cases in both textual and visual formats, even with little available training data. Our model achieves 74% accuracy for images and 98% for text. Furthermore, our system integrates the possibility of expert intervention in the form of human reinforcement. Using the pre-trained multimodal, image, and text processing models available through deep learning makes it possible to detect smoking in different media even with little training data.
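    One common way to combine separately trained text and image detectors is decision-level (late) fusion; a minimal sketch follows. The weighting and threshold are illustrative assumptions, not the paper's method.

```python
def fuse_smoking_scores(p_image, p_text, w_text=0.7, threshold=0.5):
    """Late (decision-level) fusion of two classifier probabilities.

    The text model is weighted higher here, mirroring the reported gap
    (98% text vs. 74% image accuracy); the weights are illustrative only.
    """
    p = w_text * p_text + (1 - w_text) * p_image
    return p >= threshold, p

flag, score = fuse_smoking_scores(p_image=0.55, p_text=0.91)
print(flag, round(score, 2))  # True 0.8
```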

  • Article type: Journal Article
    Alzheimer's Disease is the most common cause of dementia, whose progression spans different stages, from very mild cognitive impairment to mild and severe conditions. In clinical trials, Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) are mostly used for the early diagnosis of neurodegenerative disorders, since they provide volumetric and metabolic function information of the brain, respectively. In recent years, Deep Learning (DL) has been employed in medical imaging with promising results. Moreover, the use of deep neural networks, especially Convolutional Neural Networks (CNNs), has also enabled the development of DL-based solutions in domains characterized by the need to leverage information from multiple data sources, giving rise to Multimodal Deep Learning (MDL). In this paper, we conduct a systematic analysis of MDL approaches for dementia severity assessment exploiting MRI and PET scans. We propose a Multi-Input Multi-Output 3D CNN whose training iterations change according to the characteristics of the input, as it is able to handle incomplete acquisitions in which one image modality is missing. Experiments performed on the OASIS-3 dataset show the satisfactory results of the implemented network, which outperforms approaches exploiting a single image modality as well as different MDL fusion techniques.
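    A network that tolerates a missing modality can, for instance, substitute a neutral feature vector for the absent branch. The sketch below (hypothetical `MissingAwareCNN`, with made-up layer sizes) illustrates that idea in PyTorch; it is not the paper's Multi-Input Multi-Output architecture.

```python
import torch
import torch.nn as nn

def conv3d_branch():
    return nn.Sequential(
        nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    )

class MissingAwareCNN(nn.Module):
    """Two 3D-CNN branches; a zero vector stands in for a missing scan."""

    def __init__(self, n_classes=3):
        super().__init__()
        self.mri, self.pet = conv3d_branch(), conv3d_branch()
        self.head = nn.Linear(32, n_classes)

    def forward(self, mri=None, pet=None):
        batch = (mri if mri is not None else pet).size(0)
        feats = [
            torch.zeros(batch, 16) if scan is None else branch(scan)
            for scan, branch in ((mri, self.mri), (pet, self.pet))
        ]
        return self.head(torch.cat(feats, dim=-1))

model = MissingAwareCNN()
out = model(mri=torch.randn(1, 1, 32, 32, 32))  # PET acquisition missing
```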

  • Article type: Journal Article
    The broad adoption of electronic health record (EHR) systems brings us a tremendous amount of clinical data and thus provides opportunities to conduct data-based healthcare research to solve various clinical problems in the medical domain. Machine learning and deep learning methods are widely used in the medical informatics and healthcare domain due to their power to mine insights from raw data. When adapting deep learning models for EHR data, it is essential to consider their heterogeneous nature: EHRs contain patient records from various sources, including medical tests (e.g., blood tests, microbiology tests), medical imaging, diagnoses, medications, procedures, clinical notes, etc. These modalities together provide a holistic view of patient health status and complement each other. Therefore, combining data from multiple modalities that are intrinsically different is challenging but intuitively promising in deep learning for EHR. To assess the potential of multimodal data, we introduce a comprehensive fusion framework designed to integrate temporal variables, medical images, and clinical notes in EHRs for enhanced performance in clinical risk prediction. Early, joint, and late fusion strategies are employed to combine data from the various modalities effectively. We test the model with three predictive tasks: in-hospital mortality, long length of stay, and 30-day readmission. Experimental results show that multimodal models outperform unimodal models on the tasks involved. Additionally, by training models with different input modality combinations, we calculate the Shapley value for each modality to quantify its contribution to multimodal performance. The results show that temporal variables tend to be more helpful than CXR images and clinical notes in the three explored predictive tasks.
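    Computing a per-modality Shapley value from subset-trained models follows the standard formula: average each modality's marginal performance gain over all subsets of the other modalities. A sketch with hypothetical AUROC numbers (the dictionary `v` below is made up; 0.5 stands in for the no-information baseline):

```python
from itertools import combinations
from math import factorial

MODALITIES = ("temporal", "cxr", "notes")

# Made-up validation AUROCs for models trained on each modality subset.
v = {
    frozenset(): 0.50,
    frozenset({"temporal"}): 0.82, frozenset({"cxr"}): 0.70,
    frozenset({"notes"}): 0.74, frozenset({"temporal", "cxr"}): 0.84,
    frozenset({"temporal", "notes"}): 0.85, frozenset({"cxr", "notes"}): 0.76,
    frozenset(MODALITIES): 0.86,
}

def shapley(modality):
    """Average marginal AUROC gain of `modality` over subsets of the others."""
    n = len(MODALITIES)
    others = [m for m in MODALITIES if m != modality]
    total = 0.0
    for k in range(n):
        for subset in combinations(others, k):
            s = frozenset(subset)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v[s | {modality}] - v[s])
    return total

for m in MODALITIES:
    print(m, round(shapley(m), 3))
```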

  • Article type: Journal Article
    BACKGROUND: Drug-Target Interaction (DTI) prediction uses a drug molecule and a protein sequence as inputs to predict a binding affinity value. In recent years, deep learning-based models have received increasing attention. These methods have two modules: a feature extraction module and a task prediction module. In most deep learning-based approaches, a simple task prediction loss (i.e., categorical cross-entropy for the classification task and mean squared error for the regression task) is used to train the model. In machine learning, contrastive loss functions have been developed to learn a more discriminative feature space. In a deep learning-based model, extracting a more discriminative feature space improves the performance of the task prediction module.
    RESULTS: In this paper, we use multimodal knowledge as input and propose an attention-based fusion technique to combine this knowledge. We also investigate how utilizing a contrastive loss function alongside the task prediction loss helps the approach learn a more powerful model. Four contrastive loss functions are considered: (1) the max-margin contrastive loss, (2) the triplet loss, (3) the multi-class N-pair loss objective, and (4) the NT-Xent loss. The proposed model is evaluated using four well-known datasets: the Wang et al. dataset, Luo's dataset, and the Davis and KIBA datasets.
    CONCLUSIONS: After reviewing the state-of-the-art methods, we developed a multimodal feature extraction network that combines protein sequences and drug molecules with protein-protein interaction and drug-drug interaction networks. The results show it performs significantly better than comparable state-of-the-art approaches.
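    Of the four losses, NT-Xent is compact enough to show in full; a standard PyTorch implementation over paired embeddings (e.g., two views of a drug-target pair) might look as follows. The temperature value is an illustrative default.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent loss over a batch of paired embeddings (positives at i <-> i+B)."""
    z = torch.cat([F.normalize(z1, dim=1), F.normalize(z2, dim=1)])
    sim = z @ z.t() / tau                      # (2B, 2B) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))          # exclude self-similarity
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(16, 64), torch.randn(16, 64))
```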

  • Article type: Journal Article
    To simulate the functions of olfaction, gustation, vision, and oral touch, intelligent sensory technologies have been developed. Headspace solid-phase microextraction gas chromatography-mass spectrometry (HS-SPME-GC/MS) together with electronic noses (E-noses), electronic tongues (E-tongues), computer vision (CV), and texture analyzers (TAs) was applied for the sensory characterization of lamb shashliks (LSs) prepared with various roasting methods. A total of 56 VOCs in lamb shashliks prepared with five roasting methods were identified by HS-SPME/GC-MS, and 21 VOCs were identified as key compounds based on OAV (>1). A cross-channel sensory Transformer (CCST) was also proposed and used to predict 19 sensory attributes and the corresponding lamb shashlik scores under the different roasting methods. The model achieved satisfactory results on the prediction set (R2 = 0.964). This study shows that a multimodal deep learning model can be used to simulate assessors and that it is feasible to use it to guide and correct sensory evaluation.
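    The CCST itself is not described in the abstract beyond its cross-channel nature; as a hedged sketch, one way to fuse per-instrument readings is to project each channel into a shared token space and run a transformer encoder over the tokens. All channel dimensions below except the 56 VOCs are invented for illustration.

```python
import torch
import torch.nn as nn

class SensoryFusion(nn.Module):
    """Project each instrument's reading to a token; fuse with a shared encoder."""

    def __init__(self, dims, d_model=64, n_attr=19):
        super().__init__()
        self.proj = nn.ModuleDict({k: nn.Linear(d, d_model) for k, d in dims.items()})
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_attr)  # 19 sensory attribute scores

    def forward(self, inputs):  # inputs: dict of (batch, dim) sensor vectors
        tokens = torch.stack([self.proj[k](v) for k, v in inputs.items()], dim=1)
        return self.head(self.encoder(tokens).mean(dim=1))

dims = {"gcms": 56, "enose": 10, "etongue": 7, "cv": 3, "ta": 5}  # channel sizes
model = SensoryFusion(dims)
scores = model({k: torch.randn(4, d) for k, d in dims.items()})   # (4, 19)
```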

  • Article type: Journal Article
    Multiple Sclerosis (MS) is a chronic disease that develops in the human brain and spinal cord and can cause permanent damage to or deterioration of the nerves. The severity of MS is monitored by the Expanded Disability Status Scale, composed of several functional sub-scores. Early and accurate classification of MS disease severity is critical for slowing down or preventing disease progression by applying early therapeutic intervention strategies. Recent advances in deep learning and the wide use of Electronic Health Records (EHR) create opportunities to apply data-driven and predictive modeling tools for this goal. Previous studies focusing on single-modal machine learning and deep learning algorithms were limited in prediction accuracy due to data insufficiency or model simplicity. In this paper, we propose using patients' multimodal longitudinal EHR data to predict future multiple sclerosis disease severity. Our contribution has two main facets. First, we describe a pioneering effort to integrate structured EHR data, neuroimaging data, and clinical notes to build a multimodal deep learning framework to predict patients' MS severity. The proposed pipeline demonstrates up to a 19% increase in the Area Under the Receiver Operating Characteristic curve (AUROC) compared to models using single-modal data. Second, the study also provides valuable insights regarding the amount of useful signal embedded in each data modality with respect to MS disease prediction, which may improve data collection processes.
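    The reported gain is a relative AUROC improvement; the following sketch shows how such a comparison is computed with scikit-learn on synthetic scores (the data and the resulting percentage are illustrative, not the study's).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)  # synthetic severity labels

# Synthetic scores: the multimodal model tracks the label more closely.
p_single = np.clip(0.5 * y + rng.normal(0.25, 0.30, 500), 0, 1)
p_multi = np.clip(0.7 * y + rng.normal(0.15, 0.20, 500), 0, 1)

auroc_single = roc_auc_score(y, p_single)
auroc_multi = roc_auc_score(y, p_multi)
print(f"relative AUROC gain: {(auroc_multi - auroc_single) / auroc_single:.1%}")
```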

  • Article type: Journal Article
    With the development of multimedia systems in wireless environments, a rising need in artificial intelligence is to design a system that can properly communicate with humans, with a comprehensive understanding of various types of information, in a human-like manner. This paper therefore addresses an audio-visual scene-aware dialog system that can communicate with users about audio-visual scenes. It is essential to understand not only visual and textual information but also audio information in a comprehensive way. Despite the substantial progress in multimodal representation learning with language and visual modalities, two caveats remain: ineffective use of auditory information and the lack of interpretability of deep learning systems' reasoning. To address these issues, we propose a novel audio-visual scene-aware dialog system that utilizes a set of explicit information from each modality in the form of natural language, which can be fused into a language model in a natural way. It leverages a transformer-based decoder to generate a coherent and correct response based on multimodal knowledge in a multitask learning setting. In addition, we also address a way of interpreting the model with a response-driven temporal moment localization method, to verify how the system generates the response. The system itself provides the user with the evidence referred to during the response process in the form of scene timestamps. We show the superiority of the proposed model over the baseline in all quantitative and qualitative measurements. In particular, the proposed model achieves robust performance even in environments using all three modalities, including audio. We also conducted extensive experiments to investigate the proposed model, and we obtained state-of-the-art performance on the system response reasoning task.
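    The core idea, explicit per-modality information rendered as natural language and fused into a language model, can be illustrated with a simple prompt builder; the function below is a hypothetical sketch, not the authors' implementation.

```python
def build_dialog_prompt(visual_caption, audio_caption, history, question):
    """Fuse per-modality evidence, expressed as text, into one LM prompt."""
    lines = [
        f"Video: {visual_caption}",
        f"Audio: {audio_caption}",
        *[f"{speaker}: {text}" for speaker, text in history],
        f"User: {question}",
        "System:",
    ]
    return "\n".join(lines)

prompt = build_dialog_prompt(
    visual_caption="a man opens a door and walks into the kitchen",
    audio_caption="a door creaks, then footsteps",
    history=[("User", "What happens first?"), ("System", "A door opens.")],
    question="What sound can be heard?",
)
print(prompt)  # any causal LM decoder can consume this plain-text prompt
```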