Multimodal representation

  • Article Type: Journal Article
    BACKGROUND: The use of machine learning in medical diagnosis and treatment has grown significantly in recent years with the development of computer-aided diagnosis systems, often based on annotated medical radiology images. However, the lack of large annotated image datasets remains a major obstacle, as the annotation process is time-consuming and costly. This study aims to overcome this challenge by proposing an automated method for annotating a large database of medical radiology images based on their semantic similarity.
    RESULTS: An automated, unsupervised approach is used to create a large annotated dataset of medical radiology images originating from the Clinical Hospital Centre Rijeka, Croatia. The pipeline is built by data-mining three different types of medical data: images, DICOM metadata, and narrative diagnoses. The optimal feature extractors are integrated into a multimodal representation, which is then clustered to label a precursor dataset of 1,337,926 medical images into 50 clusters of visually similar images. The quality of the clusters is assessed by examining their homogeneity and mutual information, taking into account the anatomical region and modality representation.
    CONCLUSIONS: The results indicate that fusing the embeddings of all three data sources together provides the best results for the task of unsupervised clustering of large-scale medical data and leads to the most concise clusters. Hence, this work marks the initial step towards building a much larger and more fine-grained annotated dataset of medical radiology images.
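
    A minimal sketch of the clustering step described above, assuming pre-computed embeddings for the three sources (images, DICOM metadata, narrative diagnoses): the embeddings are standardized, concatenated into one multimodal representation, clustered with k-means into 50 groups, and scored with homogeneity and mutual information against reference labels. The embedding dimensions, the concatenation-based fusion, and the placeholder labels are illustrative assumptions, not the paper's exact pipeline.

    # Sketch: fuse per-modality embeddings, cluster them, and score the clusters.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import homogeneity_score, normalized_mutual_info_score
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n = 1000                                  # stand-in for the 1,337,926 images
    img_emb = rng.normal(size=(n, 256))       # hypothetical image embeddings
    meta_emb = rng.normal(size=(n, 64))       # hypothetical DICOM-metadata embeddings
    text_emb = rng.normal(size=(n, 128))      # hypothetical narrative-diagnosis embeddings

    # Fuse the three sources into a single multimodal representation.
    fused = np.hstack([StandardScaler().fit_transform(e)
                       for e in (img_emb, meta_emb, text_emb)])

    # Cluster into 50 groups of (ideally) visually similar images.
    labels = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(fused)

    # Evaluate against reference annotations, e.g. anatomical region or modality.
    reference = rng.integers(0, 10, size=n)   # placeholder ground-truth labels
    print("homogeneity:", homogeneity_score(reference, labels))
    print("NMI:", normalized_mutual_info_score(reference, labels))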

  • Article Type: Journal Article
    Proteins are complex biomolecules essential for numerous biological processes, making them crucial targets for advancements in molecular biology, medical research, and drug design. Understanding their intricate, hierarchical structures and functions is vital for progress in these fields. To capture this complexity, we introduce Multimodal Protein Representation Learning (MPRL), a novel symmetry-preserving multimodal pretraining framework that learns unified, unsupervised protein representations by integrating primary and tertiary structures. MPRL employs Evolutionary Scale Modeling (ESM-2) for sequence analysis, Variational Graph Auto-Encoders (VGAE) for residue-level graphs, and a PointNet Autoencoder (PAE) for 3D point clouds of atoms, each designed to capture the spatial and evolutionary intricacies of proteins while preserving critical symmetries. By leveraging Auto-Fusion to synthesize joint representations from these pretrained models, MPRL ensures robust and comprehensive protein representations. Our extensive evaluation demonstrates that MPRL significantly enhances performance in various tasks such as protein-ligand binding affinity prediction, protein fold classification, enzyme activity identification, and mutation stability prediction. This framework advances the understanding of protein dynamics and facilitates future research in the field. Our source code is publicly available at https://github.com/HySonLab/Protein_Pretrain.
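
    A minimal PyTorch sketch of the Auto-Fusion idea described above, assuming the per-protein embeddings from ESM-2, the VGAE, and the PAE are already computed: the embeddings are concatenated, compressed into a joint latent representation, and trained with a reconstruction objective. The dimensions, layer sizes, and MSE loss are illustrative assumptions rather than the released implementation.

    # Sketch of an Auto-Fusion-style module over three precomputed embeddings.
    import torch
    import torch.nn as nn

    class AutoFusion(nn.Module):
        def __init__(self, dims=(1280, 256, 512), latent_dim=512):
            super().__init__()
            in_dim = sum(dims)
            self.compress = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
            self.reconstruct = nn.Linear(latent_dim, in_dim)

        def forward(self, seq_emb, graph_emb, point_emb):
            x = torch.cat([seq_emb, graph_emb, point_emb], dim=-1)
            z = self.compress(x)                        # joint multimodal representation
            recon_loss = nn.functional.mse_loss(self.reconstruct(z), x)
            return z, recon_loss

    # Usage with random stand-ins for a batch of per-protein embeddings.
    fusion = AutoFusion()
    z, loss = fusion(torch.randn(8, 1280), torch.randn(8, 256), torch.randn(8, 512))
    loss.backward()                                     # train the fusion module end to end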

  • Article Type: Journal Article
    The integration of information from multiple modalities is a highly active area of research. Previous techniques have predominantly focused on fusing shallow features or high-level representations generated by deep unimodal networks, which only capture a subset of the hierarchical relationships across modalities. Moreover, previous methods are often limited in their ability to exploit the fine-grained statistical features inherent in multimodal data. This paper proposes an approach that densely integrates representations by computing the means and standard deviations of image features. These global feature statistics afford a holistic perspective, capturing the overarching distribution and trends inherent in the data and thereby facilitating enhanced comprehension and characterization of multimodal data. We also leverage a Transformer-based fusion encoder to effectively capture global variations in multimodal features. To further enhance the learning process, we incorporate a contrastive loss function that encourages the discovery of shared information across different modalities. To validate the effectiveness of our approach, we conduct experiments on three widely used multimodal sentiment analysis datasets. The results demonstrate the efficacy of our proposed method, achieving significant performance improvements compared to existing approaches.
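
    A minimal sketch of the ingredients named above, assuming stand-in feature maps: per-channel means and standard deviations serve as a global-statistics token, a Transformer encoder fuses the modality tokens, and an InfoNCE-style contrastive loss pulls paired modalities together. All dimensions, the projection layers, and the specific loss form are assumptions, not the paper's exact architecture.

    # Sketch: global feature statistics, a Transformer fusion encoder, contrastive loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d_model = 128
    fusion_encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
        num_layers=2)
    proj_img = nn.Linear(2 * 64, d_model)     # mean and std of 64 image channels
    proj_txt = nn.Linear(300, d_model)        # hypothetical text feature size

    def statistics_token(feat_maps):
        # (batch, channels, H, W) -> concatenated per-channel mean and std
        return torch.cat([feat_maps.mean(dim=(2, 3)), feat_maps.std(dim=(2, 3))], dim=-1)

    def info_nce(a, b, temperature=0.07):
        # Matching pairs in the batch are positives; all other pairings act as negatives.
        logits = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).t() / temperature
        return F.cross_entropy(logits, torch.arange(a.size(0)))

    img_feats = torch.randn(16, 64, 7, 7)     # stand-in CNN feature maps
    txt_feats = torch.randn(16, 300)          # stand-in text features
    tokens = torch.stack([proj_img(statistics_token(img_feats)),
                          proj_txt(txt_feats)], dim=1)
    fused = fusion_encoder(tokens)            # (batch, 2 modality tokens, d_model)
    loss = info_nce(fused[:, 0], fused[:, 1]) # encourage shared cross-modal information
    loss.backward()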

  • Article Type: Journal Article
    Action recognition is an important component of human-computer interaction, and multimodal feature representation and learning methods can be used to improve recognition performance due to the interrelation and complementarity between different modalities. However, due to the lack of large-scale labeled samples, the performance of existing ConvNets-based methods is severely constrained. In this paper, a novel and effective multimodal feature representation and contrastive self-supervised learning framework is proposed to improve the action recognition performance of models and their generalization ability across application scenarios. The proposed recognition framework employs weight sharing between two branches and does not require negative samples, so it can effectively learn useful feature representations from multimodal unlabeled data, e.g., skeleton sequences and inertial measurement unit (IMU) signals. Extensive experiments are conducted on two benchmarks, UTD-MHAD and MMAct, and the results show that our proposed recognition framework outperforms both unimodal and multimodal baselines in action retrieval, semi-supervised learning, and zero-shot learning scenarios.
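
    A minimal sketch of a negative-sample-free objective between a skeleton branch and an IMU branch, assuming simple placeholder encoders and a shared projection head (the weight sharing mentioned above): each branch predicts the other's projection with a stop-gradient on the target, in the style of SimSiam/BYOL. The encoder architectures, dimensions, and loss are illustrative assumptions, not the paper's exact model.

    # Sketch: weight-shared projector, no negative samples, stop-gradient targets.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    skeleton_encoder = nn.Sequential(nn.Flatten(), nn.Linear(50 * 25 * 3, 256))  # 50 frames, 25 joints
    imu_encoder = nn.Sequential(nn.Flatten(), nn.Linear(50 * 6, 256))            # 50 frames, 6 channels
    projector = nn.Linear(256, 128)    # shared by both branches (weight sharing)
    predictor = nn.Linear(128, 128)

    def branch_loss(p, z):
        # Negative cosine similarity; stop-gradient on the target branch.
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

    skel = torch.randn(32, 50, 25, 3)          # unlabeled skeleton sequences
    imu = torch.randn(32, 50, 6)               # unlabeled IMU signals
    z_skel = projector(skeleton_encoder(skel))
    z_imu = projector(imu_encoder(imu))
    loss = 0.5 * (branch_loss(predictor(z_skel), z_imu)
                  + branch_loss(predictor(z_imu), z_skel))
    loss.backward()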

  • Article Type: Journal Article
    Data processing in robotics is currently challenged by the effective building of multimodal and common representations. Tremendous volumes of raw data are available, and their smart management is the core concept of multimodal learning in a new paradigm for data fusion. Although several techniques for building multimodal representations have been proven successful, they have not yet been analyzed and compared in a given production setting. This paper explored three of the most common techniques, (1) late fusion, (2) early fusion, and (3) the sketch, and compared them on classification tasks. Our paper explored different types of data (modalities) that could be gathered by sensors serving a wide range of applications. Our experiments were conducted on the Amazon Reviews, MovieLens25M, and MovieLens1M datasets. Their outcomes allowed us to confirm that the choice of fusion technique for building a multimodal representation is crucial for obtaining the highest possible model performance from the proper modality combination. Consequently, we designed criteria for choosing the optimal data fusion technique.
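
    A minimal sketch contrasting two of the techniques named above on synthetic two-modality data, assuming scikit-learn logistic-regression classifiers: early fusion concatenates the modalities before a single classifier, while late fusion trains one classifier per modality and averages their predicted probabilities. The data, features, and classifiers are illustrative assumptions, not the paper's experimental setup.

    # Sketch: early fusion vs. late fusion on synthetic two-modality data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=2000)
    mod_a = rng.normal(size=(2000, 20)) + y[:, None]        # e.g. text-derived features
    mod_b = rng.normal(size=(2000, 10)) + 0.5 * y[:, None]  # e.g. metadata features
    a_tr, a_te, b_tr, b_te, y_tr, y_te = train_test_split(mod_a, mod_b, y, random_state=0)

    # Early fusion: concatenate the modalities before a single classifier.
    early = LogisticRegression(max_iter=1000).fit(np.hstack([a_tr, b_tr]), y_tr)
    early_acc = accuracy_score(y_te, early.predict(np.hstack([a_te, b_te])))

    # Late fusion: one classifier per modality, then average the probabilities.
    clf_a = LogisticRegression(max_iter=1000).fit(a_tr, y_tr)
    clf_b = LogisticRegression(max_iter=1000).fit(b_tr, y_tr)
    late_prob = (clf_a.predict_proba(a_te) + clf_b.predict_proba(b_te)) / 2
    late_acc = accuracy_score(y_te, late_prob.argmax(axis=1))

    print(f"early fusion accuracy: {early_acc:.3f}, late fusion accuracy: {late_acc:.3f}")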

  • Article Type: Journal Article
    How are words connected to the thoughts they help to express? Recent brain imaging studies suggest that word representations are embodied in different neural systems through which the words are experienced. Building on this idea, embodied approaches such as the Concept Attribute Representations (CAR) theory represent concepts as a set of semantic features (attributes) mapped to different brain systems. An intriguing challenge to this theory is that people weigh concept attributes differently based on context, i.e., they construct meaning dynamically according to the combination of concepts that occur in a sentence. This research addresses the challenge through the Context-dEpendent meaning REpresentations in the BRAin (CEREBRA) neural network model. Based on changes in the brain images, CEREBRA quantifies the effect of sentence context on word meanings. Computational experiments demonstrated that words in different contexts have different representations, that the changes observed in the concept attributes reveal unique conceptual combinations, and that the new representations are more similar to the other words in the sentence than to the original representations. Behavioral analysis further confirmed that the changes produced by CEREBRA are actionable knowledge that can be used to predict human responses. These experiments constitute a comprehensive evaluation of CEREBRA's context-based representations, showing that CARs can be dynamic and change based on context. Thus, CEREBRA is a useful tool for understanding how word meanings are represented in the brain, providing a framework for future interdisciplinary research on the mental lexicon.
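
    A minimal sketch of the similarity claim above, assuming random placeholder vectors rather than CAR attribute vectors derived from brain images: a word vector nudged toward its sentence context should, on average, be closer (in cosine similarity) to the other words in the sentence than the original vector is. The mixing weights and vector sizes are illustrative assumptions.

    # Sketch: compare a context-adjusted word vector with its original representation.
    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    rng = np.random.default_rng(0)
    original = rng.normal(size=66)                 # attribute vector of the target word
    sentence_words = rng.normal(size=(3, 66))      # vectors of the other words in the sentence

    # A context-dependent representation nudged toward the sentence context.
    context = sentence_words.mean(axis=0)
    adjusted = 0.7 * original + 0.3 * context

    sim_original = np.mean([cosine(original, w) for w in sentence_words])
    sim_adjusted = np.mean([cosine(adjusted, w) for w in sentence_words])
    print(f"mean similarity to sentence words: original {sim_original:.3f}, "
          f"context-adjusted {sim_adjusted:.3f}")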

  • Article Type: Journal Article
    Our thoughts can influence sleep, but the underlying mechanisms are unknown. According to the theory of "embodied cognition," the semantic content of cognitive processes is represented by multimodal networks in the brain, which include body-related functions. Such multimodal representations could offer a mechanism which explains mutual influences between cognition and sleep. Here we tested whether sleep-related words are represented in multimodal networks by examining the effect of congruent versus incongruent body positions on word processing during wakefulness. We experimentally manipulated the body position of 66 subjects (19-40 years old) between standing upright and lying down. Sleep- and activity-related words were presented around the individual speech recognition threshold. Our results show that word processing was facilitated in congruent body positions (sleep words: lying down and activity words: standing upright) compared with incongruent body positions, as indicated by a reduced N400 in the congruent condition with the lowest volume. In addition, early sensory components of the ERP (N180 and P280) were enhanced, suggesting that words were also acoustically better understood in a congruent body position. However, the difference in ERPs did not translate to differences on a behavioral level. Our results support the prediction of embodied processing of sleep- and activity-related words. Body position potentially induces a pre-activation of multimodal networks, thereby enhancing access to the semantic concepts of words related to the current body position. The link between semantic meaning and body-related function could be a key element in explaining the influences of cognitive processing on sleep.