Attention mechanisms

  • Article type: Journal Article
    Numerous studies have demonstrated that microRNAs (miRNAs) are critically important for the prediction, diagnosis, and characterization of diseases. However, identifying miRNA-disease associations through traditional biological experiments is both costly and time-consuming. To further explore these associations, we propose a model based on hybrid high-order moments combined with element-level attention mechanisms (HHOMR). This model innovatively fuses hybrid higher-order statistical information along with structural and community information. Specifically, we first construct a heterogeneous graph based on existing associations between miRNAs and diseases. HHOMR employs a structural fusion layer to capture structure-level embeddings and leverages a hybrid high-order moments encoder layer to enhance features. Element-level attention mechanisms are then used to adaptively integrate the features of these hybrid moments. Finally, a multi-layer perceptron is used to calculate the association scores between miRNAs and diseases. Through five-fold cross-validation on HMDD v2.0, we achieved a mean AUC of 93.28%. Compared with four state-of-the-art models, HHOMR exhibited superior performance. Additionally, case studies were conducted on three diseases (esophageal neoplasms, lymphoma, and prostate neoplasms). Among the top 50 miRNAs ranked by disease association score, 46, 47, and 45, respectively, were confirmed as associated with these diseases by the dbDEMC and miR2Disease databases. Our results demonstrate that HHOMR not only outperforms existing models but also shows significant potential in predicting miRNA-disease associations.
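    The abstract is prose-only; a minimal numpy sketch of the element-level attention idea it describes (a separate softmax over the K moment-feature sources for each embedding dimension) is shown below. The scoring parameter `W` and the fusion form are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def element_level_attention(moment_feats, W):
    """Fuse K moment-feature vectors (K, d) into one (d,) embedding,
    using a separate softmax over the K sources for each dimension.
    W (d,) is a hypothetical learned scoring parameter."""
    scores = moment_feats * W                                   # (K, d) unnormalized scores
    scores = scores - scores.max(axis=0, keepdims=True)         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    return (weights * moment_feats).sum(axis=0)                 # (d,) fused embedding
```

    Each output dimension is a convex combination of that dimension across the K moment features, so the fusion adapts per element rather than per vector.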

  • Article type: Journal Article
    Facial expression recognition (FER) plays a crucial role in affective computing, enhancing human-computer interaction by enabling machines to understand and respond to human emotions. Despite advancements in deep learning, current FER systems often struggle with challenges such as occlusions, head pose variations, and motion blur in natural environments. These challenges highlight the need for more robust FER solutions. To address these issues, we propose the Attention-Enhanced Multi-Layer Transformer (AEMT) model, which integrates a dual-branch Convolutional Neural Network (CNN), an Attentional Selective Fusion (ASF) module, and a Multi-Layer Transformer Encoder (MTE) with transfer learning. The dual-branch CNN captures detailed texture and color information by processing RGB and Local Binary Pattern (LBP) features separately. The ASF module selectively enhances relevant features by applying global and local attention mechanisms to the extracted features. The MTE captures long-range dependencies and models the complex relationships between features, collectively improving feature representation and classification accuracy. Our model was evaluated on the RAF-DB and AffectNet datasets. Experimental results demonstrate that the AEMT model achieved an accuracy of 81.45% on RAF-DB and 71.23% on AffectNet, significantly outperforming existing state-of-the-art methods. These results indicate that our model effectively addresses the challenges of FER in natural environments, providing a more robust and accurate solution. The AEMT model significantly advances the field of FER by improving the robustness and accuracy of emotion recognition in complex real-world scenarios. This work not only enhances the capabilities of affective computing systems but also opens new avenues for future research in improving model efficiency and expanding multimodal data integration.
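    A toy numpy sketch of the selective-fusion idea behind the ASF module: a gate built from a global (channel-pooled) term plus a local (per-element) term blends the RGB-branch and LBP-branch features. The scalars `w_global` and `w_local` stand in for learned layers; this is an assumption-laden illustration, not the AEMT implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attentional_selective_fusion(rgb_feat, lbp_feat, w_global=1.0, w_local=1.0):
    """Blend two branch feature maps (C, H, W) with a gate that combines
    global (channel-pooled) and local (per-element) attention terms."""
    s = rgb_feat + lbp_feat
    g_global = s.mean(axis=(1, 2), keepdims=True) * w_global  # (C, 1, 1) global context
    g_local = s * w_local                                     # (C, H, W) local context
    gate = sigmoid(g_global + g_local)                        # values in (0, 1)
    return gate * rgb_feat + (1.0 - gate) * lbp_feat          # convex blend per element
```

    Because the gate lies in (0, 1), every output element is a convex combination of the two branches, so neither modality is ever fully discarded.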

  • Article type: Journal Article
    Tomato disease image recognition plays a crucial role in agricultural production. Today, while machine vision methods based on deep learning have achieved some success in disease recognition, they still face several challenges. These include issues such as imbalanced datasets, unclear disease features, small inter-class differences, and large intra-class variations. To address these challenges, this paper proposes a method for classifying and recognizing tomato leaf diseases based on machine vision. First, to enhance the disease feature details in images, a piecewise linear transformation method is used for image enhancement, and oversampling is employed to expand the dataset, compensating for the imbalanced dataset. Next, this paper introduces a convolutional block with a dual attention mechanism called DAC Block, which is used to construct a lightweight model named LDAMNet. The DAC Block innovatively uses Hybrid Channel Attention (HCA) and Coordinate Attention (CSA) to process channel information and spatial information of input images respectively, enhancing the model's feature extraction capabilities. Additionally, this paper proposes a Robust Cross-Entropy (RCE) loss function that is robust to noisy labels, aimed at reducing the impact of noisy labels on the LDAMNet model during training. Experimental results show that this method achieves an average recognition accuracy of 98.71% on the tomato disease dataset, effectively retaining disease information in images and capturing disease areas. Furthermore, the method also demonstrates strong recognition capabilities on rice crop disease datasets, indicating good generalization performance and the ability to function effectively in disease recognition across different crops. The research findings of this paper provide new ideas and methods for the field of crop disease recognition. However, future research needs to further optimize the model's structure and computational efficiency, and validate its application effects in more practical scenarios.
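    The piecewise linear transformation mentioned for image enhancement is a standard three-segment contrast stretch; a minimal sketch (breakpoints `r1`, `s1`, `r2`, `s2` are user-chosen, and the paper's exact parameters are not given):

```python
import numpy as np

def piecewise_linear_enhance(img, r1, s1, r2, s2):
    """Three-segment contrast stretch for an 8-bit image:
    [0, r1] -> [0, s1], (r1, r2] -> (s1, s2], (r2, 255] -> (s2, 255]."""
    x = img.astype(np.float64)
    out = np.empty_like(x)
    lo = x <= r1
    mid = (x > r1) & (x <= r2)
    hi = x > r2
    out[lo] = x[lo] * (s1 / r1)
    out[mid] = s1 + (x[mid] - r1) * (s2 - s1) / (r2 - r1)
    out[hi] = s2 + (x[hi] - r2) * (255 - s2) / (255 - r2)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

    Choosing s1 < r1 and s2 > r2 steepens the middle segment, expanding contrast in the mid-gray range where subtle lesion textures typically live.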

  • Article type: Journal Article
    This study aims to explore methods for classifying and describing volleyball training videos using deep learning techniques. By developing an innovative model that integrates Bi-directional Long Short-Term Memory (BiLSTM) and attention mechanisms, referred to as BiLSTM-Multimodal Attention Fusion Temporal Classification (BiLSTM-MAFTC), the study enhances the accuracy and efficiency of volleyball video content analysis. Initially, the model encodes features from various modalities into feature vectors, capturing different types of information such as positional and modal data. The BiLSTM network is then used to model multi-modal temporal information, while spatial and channel attention mechanisms are incorporated to form a dual-attention module. This module establishes correlations between different modality features, extracting valuable information from each modality and uncovering complementary information across modalities. Extensive experiments validate the method's effectiveness and state-of-the-art performance. Compared to conventional recurrent neural network algorithms, the model achieves recognition accuracies exceeding 95% under the Top-1 and Top-5 metrics for action recognition, with a recognition speed of 0.04 s per video. The study demonstrates that the model can effectively process and analyze multimodal temporal information, including athlete movements, positional relationships on the court, and ball trajectories. Consequently, precise classification and description of volleyball training videos are achieved. This advancement significantly enhances the efficiency of coaches and athletes in volleyball training and provides valuable insights for broader sports video analysis research.
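    The Top-1 and Top-5 metrics reported above can be computed with a small generic helper (unrelated to the authors' code): a prediction counts as correct at Top-k if the true label is among the k highest-scoring classes.

```python
import numpy as np

def topk_accuracy(scores, labels, k):
    """scores: (N, num_classes) class scores; labels: (N,) true class ids.
    Returns the fraction of samples whose label is in the top-k classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]          # indices of the k largest scores
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return sum(hits) / len(hits)
```

    Top-1 is ordinary accuracy; Top-5 is more forgiving and is the second number the abstract's ">95%" claim refers to.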

  • Article type: Journal Article
    Lemon, as an important cash crop with rich nutritional value, holds significant cultivation importance and market demand worldwide. However, lemon diseases seriously impact the quality and yield of lemons, necessitating their early detection for effective control. This paper addresses this need by collecting a dataset of lemon diseases, consisting of 726 images captured under varying light levels, growth stages, shooting distances, and disease conditions. By cropping the high-resolution images, the dataset is expanded to 2,022 images, comprising 4,441 healthy lemons and 718 diseased lemons, with approximately 1-6 targets per image. Then, we propose a novel model, Lemon Surface Disease YOLO (LSD-YOLO), which integrates Switchable Atrous Convolution (SAConv) and the Convolutional Block Attention Module (CBAM), along with the design of C2f-SAC and the addition of a small-target detection layer to enhance the extraction of key features and the fusion of features at different scales. The experimental results demonstrate that the proposed LSD-YOLO achieves an accuracy of 90.62% on the collected dataset, with mAP@50-95 reaching 80.84%. Compared with the original YOLOv8n model, both the mAP@50 and mAP@50-95 metrics are improved. Therefore, the LSD-YOLO model proposed in this study provides more accurate recognition of healthy and diseased lemons, contributing effectively to solving the lemon disease detection problem.
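    For readers unfamiliar with the mAP@50 and mAP@50-95 metrics cited above: a detection counts as a true positive when its box overlaps the ground truth with IoU above the threshold (0.5, or averaged over 0.5 to 0.95). A minimal IoU helper, independent of any YOLO code:

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

    mAP@50-95 is the stricter figure because it averages precision over ten IoU thresholds from 0.5 to 0.95 in steps of 0.05.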

  • Article type: Journal Article
    In both plant breeding and crop management, interpretability plays a crucial role in instilling trust in AI-driven approaches and enabling the provision of actionable insights. The primary objective of this research is to explore and evaluate the potential contributions of deep learning network architectures that employ stacked LSTMs for end-of-season maize grain yield prediction. A secondary aim is to expand the capabilities of these networks by adapting them to better accommodate and leverage the multi-modality properties of remote sensing data. In this study, a multi-modal deep learning architecture that assimilates inputs from heterogeneous data streams, including high-resolution hyperspectral imagery, LiDAR point clouds, and environmental data, is proposed to forecast maize crop yields. The architecture includes attention mechanisms that assign varying levels of importance to different modalities and temporal features, reflecting the dynamics of plant growth and environmental interactions. The interpretability of the attention weights is investigated in multi-modal networks that seek to both improve predictions and attribute crop yield outcomes to genetic and environmental variables. This approach also contributes to increased interpretability of the model's predictions. The temporal attention weight distributions highlighted relevant factors and critical growth stages that contribute to the predictions. The results of this study affirm that the attention weights are consistent with recognized biological growth stages, thereby substantiating the network's capability to learn biologically interpretable features. Accuracies of the model's yield predictions ranged from 0.82 to 0.93 (R²) in this genetics-focused study, further highlighting the potential of attention-based models. Further, this research facilitates understanding of how multi-modality remote sensing aligns with the physiological stages of maize. The proposed architecture shows promise in improving predictions and offering interpretable insights into the factors affecting maize crop yields, while demonstrating the impact of data collection by different modalities through the growing season. By identifying relevant factors and critical growth stages, the model's attention weights provide valuable information that can be used in both plant breeding and crop management. The consistency of attention weights with biological growth stages reinforces the potential of deep learning networks in agricultural applications, particularly in leveraging remote sensing data for yield prediction. To the best of our knowledge, this is the first study that investigates the use of hyperspectral and LiDAR UAV time series data for explaining/interpreting plant growth stages within deep learning networks and forecasting plot-level maize grain yield using late fusion modalities with attention mechanisms.
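    The temporal attention weights discussed above reduce, in their simplest form, to a softmax over per-date relevance scores that both pools the sequence and exposes which dates mattered. A bare numpy sketch (the dot-product scoring and query vector `q` are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def temporal_attention_pool(seq, q):
    """seq: (T, d) per-date feature embeddings; q: (d,) learned query.
    Returns (weights over the T dates, weighted summary vector)."""
    logits = seq @ q                          # (T,) relevance score per date
    w = np.exp(logits - logits.max())         # stable softmax
    w = w / w.sum()
    return w, w @ seq                         # weights are the interpretable part
```

    Plotting `w` against the calendar is exactly the kind of diagnostic the study uses to check that high-weight dates line up with recognized growth stages.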

  • Article type: Journal Article
    OBJECTIVE: To enhance medical image classification using a Dual-attention ResNet model and investigate the impact of attention mechanisms on model performance in a clinical setting.
    METHODS: We utilized a dataset of medical images and implemented a Dual-attention ResNet model, integrating self-attention and spatial attention mechanisms. The model was trained and evaluated using binary and five-level quality classification tasks, leveraging standard evaluation metrics.
    RESULTS: Our findings demonstrated substantial performance improvements with the Dual-attention ResNet model in both classification tasks. In the binary classification task, the model achieved an accuracy of 0.940, outperforming the conventional ResNet model. Similarly, in the five-level quality classification task, the Dual-attention ResNet model attained an accuracy of 0.757, highlighting its efficacy in capturing nuanced distinctions in image quality.
    CONCLUSIONS: The integration of attention mechanisms within the ResNet model resulted in significant performance enhancements, showcasing its potential for improving medical image classification tasks. These results underscore the promising role of attention mechanisms in facilitating more accurate and discriminative analysis of medical images, thus holding substantial promise for clinical applications in radiology and diagnostics.

  • Article type: Journal Article
    BACKGROUND: Transcranial sonography (TCS) plays a crucial role in diagnosing Parkinson's disease. However, the intricate nature of TCS pathological features, the lack of consistent diagnostic criteria, and the dependence on physicians' expertise can hinder accurate diagnosis. Current TCS-based diagnostic methods, which rely on machine learning, often involve complex feature engineering and may struggle to capture deep image features. While deep learning offers advantages in image processing, it has not been tailored to address specific TCS and movement disorder considerations. Consequently, there is a scarcity of research on deep learning algorithms for TCS-based PD diagnosis.
    METHODS: This study introduces a deep learning residual network model, augmented with attention mechanisms and multi-scale feature extraction, termed AMSNet, to assist in accurate diagnosis. Initially, a multi-scale feature extraction module is implemented to robustly handle the irregular morphological features and significant area information present in TCS images. This module effectively mitigates the effects of artifacts and noise. When combined with a convolutional attention module, it enhances the model's ability to learn features of lesion areas. Subsequently, a residual network architecture, integrated with channel attention, is utilized to capture hierarchical and detailed textures within the images, further enhancing the model's feature representation capabilities.
    RESULTS: The study compiled TCS images and personal data from 1109 participants. Experiments conducted on this dataset demonstrated that AMSNet achieved remarkable classification accuracy (92.79%), precision (95.42%), and specificity (93.1%). It surpassed the performance of previously employed machine learning algorithms in this domain, as well as current general-purpose deep learning models.
    CONCLUSIONS: The AMSNet proposed in this study deviates from traditional machine learning approaches that necessitate intricate feature engineering. It is capable of automatically extracting and learning deep pathological features, and has the capacity to comprehend and articulate complex data. This underscores the substantial potential of deep learning methods in the application of TCS images for the diagnosis of movement disorders.
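    The channel attention integrated into AMSNet's residual blocks is described only at a high level; a squeeze-and-excitation-style sketch of the general mechanism (the bottleneck weights `w1`, `w2` are hypothetical stand-ins for learned layers, not the paper's parameters):

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """SE-style channel attention: global-average-pool each channel to a
    descriptor, pass it through a two-layer bottleneck, and rescale the
    channels. fmap: (C, H, W); w1: (C, r); w2: (r, C)."""
    z = fmap.mean(axis=(1, 2))                  # squeeze: (C,)
    h = np.maximum(z @ w1, 0.0)                 # excitation hidden layer, ReLU
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))         # per-channel gates in (0, 1)
    return fmap * s[:, None, None]              # reweight channels
```

    The reduction ratio r trades expressiveness against parameter count; the gate vector `s` is what lets the network emphasize channels that respond to lesion-area texture.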

  • Article type: Journal Article
    Background: Diagnosing lung diseases accurately is crucial for proper treatment. Convolutional neural networks (CNNs) have advanced medical image processing, but challenges remain in their accurate explainability and reliability. This study combines U-Net with attention and Vision Transformers (ViTs) to enhance lung disease segmentation and classification. We hypothesize that Attention U-Net will enhance segmentation accuracy and that ViTs will improve classification performance. The explainability methodologies will shed light on model decision-making processes, aiding in clinical acceptance. Methodology: A comparative approach was used to evaluate deep learning models for segmenting and classifying lung illnesses using chest X-rays. The Attention U-Net model is used for segmentation, and architectures consisting of four CNNs and four ViTs were investigated for classification. Methods like Gradient-weighted Class Activation Mapping plus plus (Grad-CAM++) and Layer-wise Relevance Propagation (LRP) provide explainability by identifying crucial areas influencing model decisions. Results: The results support the conclusion that ViTs are outstanding in identifying lung disorders. Attention U-Net obtained a Dice Coefficient of 98.54% and a Jaccard Index of 97.12%. ViTs outperformed CNNs in classification tasks by 9.26%, reaching an accuracy of 98.52% with MobileViT. An 8.3% increase in accuracy was seen while moving from raw data classification to segmented image classification. Techniques like Grad-CAM++ and LRP provided insights into the decision-making processes of the models. Conclusions: This study highlights the benefits of integrating Attention U-Net and ViTs for analyzing lung diseases, demonstrating their importance in clinical settings. Emphasizing explainability clarifies deep learning processes, enhancing confidence in AI solutions and perhaps enhancing clinical acceptance for improved healthcare results.
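    The Dice Coefficient and Jaccard Index reported for Attention U-Net are standard mask-overlap metrics; a small helper makes the relationship between the two explicit:

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Overlap metrics for binary masks:
    Dice = 2|A∩B| / (|A| + |B|);  Jaccard = |A∩B| / |A∪B|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jacc = inter / np.logical_or(pred, gt).sum()
    return dice, jacc
```

    The two are monotonically related (Dice = 2J / (1 + J)), which is why the paper's 98.54% Dice and 97.12% Jaccard move together.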

  • Article type: Journal Article
    Color-changing melon is an ornamental and edible fruit. Aiming at the problems of slow detection speed and high deployment cost for Color-changing melon detection in intelligent agriculture equipment, this study proposes a lightweight detection model, YOLOv8-CML. First, a lightweight Faster-Block is introduced to reduce the number of memory accesses while reducing redundant computation, yielding a lighter C2f structure. Then, a lightweight C2f module fused with the EMA module is constructed in the Backbone to collect multi-scale spatial information more efficiently and reduce the interference of complex backgrounds on the recognition effect. Next, the idea of shared parameters is used to redesign the detection head, simplifying the model further. Finally, the α-IoU loss function is adopted to better measure the overlap between predicted and ground-truth boxes via the α hyperparameter, improving recognition accuracy. The experimental results show that, compared to the YOLOv8n model, the parameter count and computation of the improved YOLOv8-CML model decreased by 42.9% and 51.8%, respectively. In addition, the model size is only 3.7 MB, and the inference speed is improved by 6.9%, while mAP@0.5, accuracy, and FPS are also improved. Our proposed model provides a useful reference for deploying Color-changing melon picking robots.
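    In its basic form, the α-IoU loss mentioned above is a power generalization of the IoU loss, L = 1 − IoU^α; α > 1 up-weights the gradient contribution of high-IoU boxes. A minimal sketch (the published α-IoU family also includes penalty-term variants not shown here):

```python
def alpha_iou_loss(box_a, box_b, alpha=3.0):
    """Basic alpha-IoU loss, L = 1 - IoU**alpha, for two axis-aligned
    boxes given as (x1, y1, x2, y2); alpha=3 is a common default."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    iou = inter / union if union > 0 else 0.0
    return 1.0 - iou ** alpha
```

    At α = 1 this reduces to the ordinary IoU loss; raising α sharpens the loss near IoU = 1, pushing the regressor to refine already-good boxes.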
