Vision Transformer

  • Article type: Journal Article
    Cancer is one of the leading causes of mortality worldwide, and among all cancers, lung and colon cancers are two of the most common causes of death and morbidity. The aim of this study was to develop an automated lung and colon cancer classification system using histopathological images. An automated lung and colon cancer classification system was developed using histopathological images from the LC25000 dataset. Algorithm development included data splitting, deep neural network model selection, on-the-fly image augmentation, training, and validation. The core of the algorithm was a Swin Transformer V2 model, and 5-fold cross-validation was used to evaluate model performance. Model performance was evaluated using accuracy, kappa, the confusion matrix, precision, recall, and F1-score. Extensive experiments were conducted to compare the performance of different neural networks, including both mainstream convolutional neural networks and vision transformers. The Swin Transformer V2 model achieved a score of 1 (100%) on all metrics, making it the first single model to obtain perfect results on this dataset. The Swin Transformer V2 model therefore has the potential to assist pathologists in classifying lung and colon cancers from histopathology images.
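    As a rough illustration of the pipeline described above (not the authors' code), the sketch below fine-tunes a pretrained SwinV2 backbone from the timm library with 5-fold cross-validation; the model variant and training-loop details are assumptions.

```python
# Hypothetical sketch: SwinV2 classifier with 5-fold cross-validation (not the paper's code).
import timm
import torch
from sklearn.model_selection import StratifiedKFold

def build_model(num_classes: int = 5) -> torch.nn.Module:
    # LC25000 has 5 classes; the exact SwinV2 variant used in the paper is assumed here.
    return timm.create_model("swinv2_tiny_window8_256", pretrained=True, num_classes=num_classes)

def cross_validate(image_paths, labels, n_splits: int = 5):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    for fold, (train_idx, val_idx) in enumerate(skf.split(image_paths, labels)):
        model = build_model()
        # ... train on train_idx with on-the-fly augmentation, evaluate on val_idx,
        # and record accuracy, kappa, precision, recall, and F1 per fold.
        print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val samples")
```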

  • Article type: Journal Article
    Accurate lung tumor segmentation from computed tomography (CT) scans is crucial for lung cancer diagnosis. Since 2D methods lack the volumetric information of lung CT images, 3D convolution-based and Transformer-based methods have recently been applied to lung tumor segmentation from CT imaging. However, most existing 3D methods cannot effectively combine the local patterns learned by convolutions with the global dependencies captured by Transformers, and they largely ignore the important boundary information of lung tumors. To tackle these problems, we propose a 3D boundary-guided hybrid network using convolutions and Transformers for lung tumor segmentation, named BGHNet. In BGHNet, we first propose the Hybrid Local-Global Context Aggregation (HLGCA) module with parallel convolution and Transformer branches in the encoding phase. To aggregate local and global contexts in each branch of the HLGCA module, we not only design the Volumetric Cross-Stripe Window Transformer (VCSwin-Transformer) to build the Transformer branch with local inductive biases and large receptive fields, but also design the Volumetric Pyramid Convolution with transformer-based extensions (VPConvNeXt) to build the convolution branch with multi-scale global information. Then, we present a Boundary-Guided Feature Refinement (BGFR) module in the decoding phase, which explicitly leverages boundary information to refine multi-stage decoding features for better performance. Extensive experiments were conducted on two lung tumor segmentation datasets: a private dataset (HUST-Lung) and a public benchmark dataset (MSD-Lung). Results show that BGHNet outperforms other state-of-the-art 2D and 3D methods in our experiments and exhibits superior generalization on both non-contrast and contrast-enhanced CT scans.
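    The parallel local-global idea behind the HLGCA module can be pictured with the simplified sketch below; it is an assumption-laden stand-in (plain 3D convolution plus full self-attention) for the actual VCSwin-Transformer and VPConvNeXt branches, with illustrative channel sizes.

```python
# Hypothetical sketch: one parallel local/global block, loosely inspired by the HLGCA idea.
import torch
import torch.nn as nn

class ParallelLocalGlobalBlock(nn.Module):
    def __init__(self, channels: int = 32, num_heads: int = 4):
        super().__init__()
        self.conv_branch = nn.Sequential(            # local patterns via 3D convolution
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.GELU(),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)  # global dependencies
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)              # merge both branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, D, H, W)
        local = self.conv_branch(x)
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)             # (B, D*H*W, C) token sequence
        glob, _ = self.attn(tokens, tokens, tokens)       # volumetric self-attention
        glob = glob.transpose(1, 2).reshape(b, c, d, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))
```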

  • Article type: Journal Article
    BACKGROUND: Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, due to the indistinct boundaries between tumor sub-regions and the heterogeneous appearance of gliomas in volumetric MR scans, designing a reliable and automated glioma segmentation method remains challenging. Although existing 3D Transformer-based or convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they largely lack the capability for hierarchical interactions between different modalities and cannot effectively learn comprehensive feature representations related to all glioma sub-regions.
    OBJECTIVE: To overcome these problems, in this paper, we propose a 3D hierarchical cross-modality interaction network (HCMINet) using Transformers and convolutions for accurate multi-modal glioma segmentation, which leverages an effective hierarchical cross-modality interaction strategy to sufficiently learn modality-specific and modality-shared knowledge correlated to glioma sub-region segmentation from multi-parametric MR images.
    METHODS: In HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder to hierarchically encode and fuse heterogeneous multi-modal features via Transformer-based intra-modal embeddings and inter-modal interactions across multiple encoding stages, which effectively captures complex cross-modality correlations while modeling global contexts. Then, we pair the HCMITrans encoder with a modality-shared convolutional encoder to construct a dual-encoder architecture in the encoding stage, which can learn abundant contextual information from global and local perspectives. Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder to progressively fuse the local and global features extracted by the dual-encoder architecture, which utilizes a local-global context fusion (LGCF) module to efficiently alleviate the contextual discrepancy among decoding features.
    RESULTS: Extensive experiments are conducted on two public and competitive glioma benchmark datasets, including the BraTS2020 dataset with 494 patients and the BraTS2021 dataset with 1251 patients. Results show that our proposed method outperforms existing Transformer-based and CNN-based methods using other multi-modal fusion strategies in our experiments. Specifically, the proposed HCMINet achieves state-of-the-art mean DSC values of 85.33% and 91.09% on the BraTS2020 online validation dataset and the BraTS2021 local testing dataset, respectively.
    CONCLUSIONS: Our proposed method can accurately and automatically segment glioma regions from multi-parametric MR images, which is beneficial for the quantitative analysis of brain gliomas and helpful for reducing the annotation burden of neuroradiologists.
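    One way to picture the inter-modal interaction step is the hedged sketch below, in which tokens from one MR modality attend to tokens from another via cross-attention; the dimensions, single-layer form, and residual design are assumptions, not the HCMITrans encoder itself.

```python
# Hypothetical sketch: one cross-modality interaction step between two MR modalities.
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens_a: torch.Tensor, tokens_b: torch.Tensor) -> torch.Tensor:
        # queries come from modality A, keys/values from modality B
        fused, _ = self.cross_attn(query=tokens_a, key=tokens_b, value=tokens_b)
        return self.norm(tokens_a + fused)  # residual keeps modality-specific information

# usage: interact = CrossModalInteraction(); out = interact(t1_tokens, flair_tokens)  # both (B, N, 64)
```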

  • Article type: Journal Article
    Visual object tracking, pivotal for applications like earth observation and environmental monitoring, encounters challenges under adverse conditions such as low light and complex backgrounds. Traditional tracking technologies often falter, especially when tracking dynamic objects like aircraft amidst rapid movements and environmental disturbances. This study introduces an innovative adaptive multimodal image object-tracking model that harnesses the capabilities of multispectral image sensors, combining infrared and visible light imagery to significantly enhance tracking accuracy and robustness. By employing the advanced vision transformer architecture and integrating token spatial filtering (TSF) and crossmodal compensation (CMC), our model dynamically adjusts to diverse tracking scenarios. Comprehensive experiments conducted on a private dataset and various public datasets demonstrate the model's superior performance under extreme conditions, affirming its adaptability to rapid environmental changes and sensor limitations. This research not only advances visual tracking technology but also offers extensive insights into multisource image fusion and adaptive tracking strategies, establishing a robust foundation for future enhancements in sensor-based tracking systems.
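    A hedged reading of the two mechanisms named in the abstract is sketched below: token spatial filtering is approximated as keeping the top-k most salient tokens, and cross-modal compensation as a weighted blend of visible and infrared tokens; the saliency scores, k, and blending weight are assumptions rather than the paper's design.

```python
# Hypothetical sketch: token filtering and cross-modal blending for a two-stream tracker.
import torch

def token_spatial_filter(tokens: torch.Tensor, scores: torch.Tensor, k: int) -> torch.Tensor:
    # tokens: (B, N, C), scores: (B, N) per-token saliency; keep the k most salient tokens
    idx = scores.topk(k, dim=1).indices
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))

def cross_modal_compensate(rgb_tokens: torch.Tensor, ir_tokens: torch.Tensor,
                           rgb_weight: float = 0.5) -> torch.Tensor:
    # blend visible-light and infrared tokens so one stream compensates for the other
    return rgb_weight * rgb_tokens + (1.0 - rgb_weight) * ir_tokens
```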

  • Article type: Journal Article
    This article presents a method that uses the dispersive behavior of ultrasonic guided waves and neural networks to determine the isotropic elastic constants of plate-like structures from dispersion images. Two different architectures are compared: one using convolutions and transfer learning based on EfficientNetB7, and a Vision Transformer-like approach. To accomplish this, simulated and measured dispersion images are generated, where the former are used to design, train, and validate the neural networks and the latter to test them. During training, distinct data augmentation layers are employed to introduce artifacts that appear in measurement data into the simulated data; these layers allow the neural networks to extrapolate from simulated to measured data. The trained neural networks are assessed using dispersion images from seven known material samples, and multiple variations of the measured dispersion images are tested to verify prediction stability. The study demonstrates that neural networks, trained and validated only on simulated dispersion images, can learn to predict the isotropic elastic constants from measured dispersion images without an initial guess or manual feature extraction, independent of the measurement setup. Furthermore, the suitability of the different architectures for extracting information from dispersion images in general, as well as an image-to-regression visualisation technique, is discussed.
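    For the convolutional transfer-learning branch, an image-to-regression setup can look roughly like the sketch below, using a pretrained EfficientNet-B7 from timm with a two-value regression head; the number of outputs, input resolution, and loss choice are assumptions rather than the authors' configuration.

```python
# Hypothetical sketch: EfficientNet-B7 backbone regressing two isotropic elastic constants.
import timm
import torch

model = timm.create_model("tf_efficientnet_b7", pretrained=True, num_classes=2)  # 2 regression outputs
dispersion_image = torch.randn(1, 3, 600, 600)   # a dispersion image rendered as a 3-channel map
elastic_constants = model(dispersion_image)      # shape (1, 2), e.g. Young's modulus and Poisson's ratio
# training would minimize a regression loss such as torch.nn.MSELoss() against reference constants
```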

  • Article type: Journal Article
    BACKGROUND: Manual contouring of the prostate region in planning computed tomography (CT) images is a challenging task due to factors such as low soft-tissue contrast, inter- and intra-observer variability, and variations in organ size and shape. Consequently, automated contouring methods can offer significant advantages. In this study, we aimed to investigate automated male pelvic multi-organ contouring in multi-center planning CT images using a hybrid convolutional neural network-vision transformer (CNN-ViT) that combines convolutional and ViT techniques.
    METHODS: We used retrospective data from 104 localized prostate cancer patients, with delineations of the clinical target volume (CTV) and critical organs at risk (OAR) for external beam radiotherapy. We introduced a novel attention-based fusion module that merges detailed features extracted through convolution with the global features obtained through the ViT.
    RESULTS: The average Dice similarity coefficients (DSCs) achieved by VGG16-UNet-ViT for the prostate, bladder, rectum, right femoral head (RFH), and left femoral head (LFH) were 91.75%, 95.32%, 87.00%, 96.30%, and 96.34%, respectively. Experiments conducted on multi-center planning CT images indicate that combining the ViT structure with the CNN network resulted in superior performance for all organs compared with pure CNN and pure Transformer architectures. Furthermore, the proposed method achieves more precise contours than state-of-the-art techniques.
    CONCLUSIONS: Results demonstrate that integrating ViT into CNN architectures significantly improves segmentation performance. These results show promise as a reliable and efficient tool to facilitate prostate radiotherapy treatment planning.
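    The attention-based fusion of convolutional and ViT features could, in a simplified form, look like the sketch below (channel-attention gating over the concatenated feature maps); the actual fusion module's design and channel counts are not reproduced here and are assumptions.

```python
# Hypothetical sketch: attention-gated fusion of CNN and ViT feature maps.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Sequential(                       # channel-attention gate from pooled features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, cnn_feat: torch.Tensor, vit_feat: torch.Tensor) -> torch.Tensor:
        both = torch.cat([cnn_feat, vit_feat], dim=1)    # (B, 2C, H, W)
        weights = self.gate(both)                        # (B, C, 1, 1) per-channel weights
        return self.project(both) * weights              # re-weighted fused features
```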

  • Article type: Journal Article
    This study investigated the use of a Vision Transformer (ViT) for reconstructing GABA-edited Magnetic Resonance Spectroscopy (MRS) data from a reduced number of transients. Transients refer to the samples collected during an MRS acquisition by repeating the experiment to generate a signal of sufficient quality. Specifically, 80 transients were used instead of the typical 320 transients, aiming to reduce scan time. The 80 transients were pre-processed and converted into a spectrogram image representation using the Short-Time Fourier Transform (STFT). A pre-trained ViT, named Spectro-ViT, was fine-tuned and then tested using in-vivo GABA-edited MEGA-PRESS data. Its performance was compared against other pipelines in the literature using quantitative quality metrics and estimated metabolite concentration values, with the typical 320-transient scans serving as the reference for comparison. The Spectro-ViT model exhibited the best overall quality metrics among all pipelines against which it was compared. The metabolite concentrations from Spectro-ViT's reconstructions for GABA+ achieved the best average R2 value of 0.67 and the best average Mean Absolute Percentage Error (MAPE) value of 9.68%, with no statistically significant differences found compared to the 320-transient reference. The code to reproduce this research is available at https://github.com/MICLab-Unicamp/Spectro-ViT.
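    The spectrogram step described above can be approximated as in the sketch below, which averages the transients and converts the result into an STFT magnitude image; the averaging, window length, and hop size are assumptions, not the Spectro-ViT preprocessing code (see the repository above for the actual implementation).

```python
# Hypothetical sketch: turning MRS transients into a spectrogram image for a ViT.
import numpy as np
from scipy.signal import stft

def transients_to_spectrogram(transients: np.ndarray, fs: float) -> np.ndarray:
    # transients: (n_transients, n_points) complex FIDs; average them, then take the STFT magnitude
    fid = transients.mean(axis=0)
    _, _, Z = stft(fid, fs=fs, nperseg=256, noverlap=192, return_onesided=False)
    return np.abs(Z)  # time-frequency magnitude image fed to the vision transformer

# e.g. transients_to_spectrogram(np.zeros((80, 2048), dtype=complex), fs=2000.0)
```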

  • Article type: Journal Article
    Renal tumors are among the common diseases in urology, and precise segmentation of these tumors plays a crucial role in helping physicians improve diagnostic accuracy and treatment effectiveness. Nevertheless, owing to inherent challenges associated with renal tumors, such as indistinct boundaries, morphological variations, and uncertainties in size and location, segmenting renal tumors accurately remains a significant challenge in the field of medical image segmentation. With the development of deep learning, substantial achievements have been made in medical image segmentation. However, existing models lack specificity in extracting renal tumor features across different network hierarchies, which results in insufficient feature extraction and subsequently affects segmentation accuracy. To address this issue, we propose the Selective Kernel, Vision Transformer, and Coordinate Attention Enhanced U-Net (STC-UNet). This model aims to enhance feature extraction, adapting to the distinctive characteristics of renal tumors across various network levels. Specifically, Selective Kernel modules are introduced in the shallow layers of the U-Net, where detailed features are more abundant. By selectively employing convolutional kernels of different scales, the model enhances its capability to extract detailed features of renal tumors at multiple scales. Subsequently, in the deeper layers of the network, where feature maps are smaller yet contain rich semantic information, Vision Transformer modules are integrated in a non-patch manner. These assist the model in capturing long-range contextual information globally, and their non-patch implementation facilitates the capture of fine-grained features, thereby achieving collaborative enhancement of global-local information and ultimately strengthening the model's extraction of semantic features of renal tumors. Finally, in the decoder segment, Coordinate Attention modules that embed positional information are introduced, aiming to enhance the model's feature recovery and tumor-region localization capabilities. Our model is validated on the KiTS19 dataset, and experimental results indicate that, compared to the baseline model, STC-UNet shows improvements of 1.60%, 2.02%, 2.27%, 1.18%, 1.52%, and 1.35% in IoU, Dice, accuracy, precision, recall, and F1-score, respectively. Furthermore, the experimental results demonstrate that the proposed STC-UNet surpasses other advanced algorithms in both visual quality and objective evaluation metrics.
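    The scale-selection idea behind the Selective Kernel modules can be illustrated with the reduced sketch below (two parallel kernel sizes mixed by learned softmax weights); the actual STC-UNet blocks, including the ViT and Coordinate Attention parts, are more involved, and the channel sizes here are assumptions.

```python
# Hypothetical sketch: a reduced selective-kernel block mixing two receptive-field sizes.
import torch
import torch.nn as nn

class SelectiveKernel2D(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.select = nn.Sequential(                     # per-channel weights for each branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 2 * channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u3, u5 = self.conv3(x), self.conv5(x)
        w = self.select(u3 + u5).view(x.size(0), 2, x.size(1), 1, 1).softmax(dim=1)
        return w[:, 0] * u3 + w[:, 1] * u5               # scale-selective mixture
```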

  • Article type: Journal Article
    Smart healthcare has advanced the medical industry through the integration of data-driven approaches. Artificial intelligence and machine learning have brought remarkable progress, but such applications lack transparency and interpretability. To overcome these limitations, explainable AI (EXAI) offers a promising path. This paper applies EXAI to disease diagnosis for the advancement of smart healthcare. It combines transfer learning, vision transformers, and explainable AI, and designs an ensemble approach for predicting a disease and its severity. The results are evaluated on an Alzheimer's disease dataset. The analysis compares the performance of transfer learning models with that of an ensemble of transfer learning models and a vision transformer. For training, the InceptionV3, VGG19, ResNet50, and DenseNet121 transfer learning models were selected for ensembling with the vision transformer. The results compare the performance of two models on the ADNI dataset: a transfer learning (TL) model and an ensemble transfer learning (Ensemble TL) model combined with a vision transformer (ViT). For the TL model, accuracy is 58%, precision is 52%, recall is 42%, and the F1-score is 44%. In contrast, the Ensemble TL model with ViT shows significantly improved performance: 96% accuracy, 94% precision, 90% recall, and 92% F1-score on the ADNI dataset. This demonstrates the efficacy of the ensemble model over the transfer learning models.
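    A simple way to realize such an ensemble is soft voting over the listed backbones plus a ViT, as in the hedged sketch below; the timm model variants, the four-class head, and the averaging scheme are assumptions and may differ from the paper's ensembling strategy.

```python
# Hypothetical sketch: soft-voting ensemble of transfer-learning backbones and a ViT.
import timm
import torch

backbone_names = ["inception_v3", "vgg19", "resnet50", "densenet121", "vit_base_patch16_224"]
models = [timm.create_model(name, pretrained=True, num_classes=4) for name in backbone_names]

def ensemble_predict(image_batch: torch.Tensor) -> torch.Tensor:
    # image_batch assumed resized to 224x224; average class probabilities across all members
    probs = [m(image_batch).softmax(dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)
```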

  • Article type: Journal Article
    Human-object interaction (HOI) detection identifies a "set of interactions" in an image, involving the recognition of interacting instances and the classification of interaction categories. The complexity and variety of image content make this task challenging. Recently, the Transformer has been applied in computer vision and has received attention in the HOI detection task. Therefore, this paper proposes a novel Part Refinement Tandem Transformer (PRTT) for HOI detection. Unlike previous Transformer-based HOI methods, PRTT utilizes multiple decoders to split and process the rich elements of HOI prediction and introduces a new part state feature extraction (PSFE) module to help improve the final interaction category classification. We adopt a novel prior feature integrated cross-attention (PFIC) mechanism that uses the fine-grained part-state semantic and appearance features produced by the PSFE module to guide the queries. We validate our method on two public datasets, V-COCO and HICO-DET. Compared to state-of-the-art models, PRTT significantly improves the performance of human-object interaction detection.
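    The "part-state features guide the queries" idea can be pictured with the generic sketch below, where interaction queries cross-attend to part-state tokens before classification; the dimensions, number of verb classes, and head design are assumptions, not the PFIC/PSFE modules themselves.

```python
# Hypothetical sketch: interaction queries guided by part-state features via cross-attention.
import torch
import torch.nn as nn

class QueryGuidance(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8, num_verbs: int = 29):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_verbs)

    def forward(self, queries: torch.Tensor, part_features: torch.Tensor) -> torch.Tensor:
        # queries: (B, Q, dim) interaction queries; part_features: (B, P, dim) part-state tokens
        guided, _ = self.cross_attn(query=queries, key=part_features, value=part_features)
        return self.classifier(queries + guided)         # interaction-category logits per query
```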
