attention mechanisms

  • Article type: Journal Article
    This study explores methods for classifying and describing volleyball training videos using deep learning. By developing an innovative model that integrates Bi-directional Long Short-Term Memory (BiLSTM) and attention mechanisms, referred to as BiLSTM-Multimodal Attention Fusion Temporal Classification (BiLSTM-MAFTC), the study enhances the accuracy and efficiency of volleyball video content analysis. Initially, the model encodes features from various modalities into feature vectors, capturing different types of information such as positional and modal data. A BiLSTM network then models the multimodal temporal information, while spatial and channel attention mechanisms are incorporated to form a dual-attention module. This module establishes correlations between different modality features, extracts valuable information from each modality, and uncovers complementary information across modalities. Extensive experiments validate the method's effectiveness and state-of-the-art performance. Compared to conventional recurrent neural network algorithms, the model achieves recognition accuracies exceeding 95% under Top-1 and Top-5 metrics for action recognition, with a recognition speed of 0.04 s per video. The study demonstrates that the model can effectively process and analyze multimodal temporal information, including athlete movements, positional relationships on the court, and ball trajectories, achieving precise classification and description of volleyball training videos. This advancement significantly improves the efficiency of coaches and athletes in volleyball training and provides valuable insights for broader sports video analysis research.
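    The abstract does not spell out the exact formulation of the dual-attention module, so the following is only a minimal numpy sketch of the general pattern it describes (channel weights followed by temporal weights, each derived from softmax-normalized pooled activations); all function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(feats):
    """Apply channel attention then temporal (spatial) attention to a (T, C) map.

    feats: (T, C) array -- T time steps (e.g. BiLSTM outputs),
    C channels (concatenated modality features).
    """
    # Channel attention: weight each channel by its pooled activation.
    channel_w = softmax(feats.mean(axis=0))        # (C,), sums to 1
    feats = feats * channel_w                      # broadcast over time

    # Temporal attention: weight each time step the same way.
    spatial_w = softmax(feats.mean(axis=1))        # (T,), sums to 1
    return feats * spatial_w[:, None], channel_w, spatial_w

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                       # 8 steps, 16 channels
out, cw, sw = dual_attention(x)
print(out.shape)
```

In a trained model the pooled-activation logits would be replaced by small learned layers; the sketch only shows how the two weightings compose.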

  • Article type: Journal Article
    Lemon, as an important cash crop with rich nutritional value, holds significant cultivation importance and market demand worldwide. However, lemon diseases seriously impact the quality and yield of lemons, necessitating early detection for effective control. This paper addresses this need by collecting a lemon disease dataset consisting of 726 images captured under varying light levels, growth stages, shooting distances, and disease conditions. By cropping the high-resolution images, the dataset is expanded to 2022 images, comprising 4441 healthy lemons and 718 diseased lemons, with approximately 1-6 targets per image. We then propose a novel model, Lemon Surface Disease YOLO (LSD-YOLO), which integrates Switchable Atrous Convolution (SAConv) and the Convolutional Block Attention Module (CBAM), along with the design of C2f-SAC and the addition of a small-target detection layer, to enhance the extraction of key features and the fusion of features at different scales. The experimental results demonstrate that the proposed LSD-YOLO achieves an accuracy of 90.62% on the collected dataset, with mAP@50-95 reaching 80.84%. Compared with the original YOLOv8n model, both the mAP@50 and mAP@50-95 metrics are improved. The LSD-YOLO model therefore provides more accurate recognition of healthy and diseased lemons, contributing effectively to solving the lemon disease detection problem.
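    The mAP@50 and mAP@50-95 metrics quoted above both rest on matching predicted boxes to ground truth by intersection-over-union at one or more thresholds. A self-contained sketch of that underlying box-IoU computation (corner-format boxes assumed; this is standard detection bookkeeping, not code from the paper):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Half-overlapping boxes: intersection 50, union 150 -> IoU 1/3,
# which would NOT count as a match at the mAP@50 threshold of 0.5.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

mAP@50 averages precision over recall with matches at IoU >= 0.5; mAP@50-95 repeats this at thresholds 0.5 to 0.95 in steps of 0.05 and averages the results.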

  • Article type: Journal Article
    In both plant breeding and crop management, interpretability plays a crucial role in instilling trust in AI-driven approaches and enabling actionable insights. The primary objective of this research is to explore and evaluate the potential contributions of deep learning network architectures that employ stacked LSTMs for end-of-season maize grain yield prediction. A secondary aim is to expand the capabilities of these networks by adapting them to better accommodate and leverage the multi-modality properties of remote sensing data. In this study, a multi-modal deep learning architecture that assimilates inputs from heterogeneous data streams, including high-resolution hyperspectral imagery, LiDAR point clouds, and environmental data, is proposed to forecast maize crop yields. The architecture includes attention mechanisms that assign varying levels of importance to different modalities and temporal features, reflecting the dynamics of plant growth and environmental interactions. The interpretability of the attention weights is investigated in multi-modal networks that seek both to improve predictions and to attribute crop yield outcomes to genetic and environmental variables. This approach also increases the interpretability of the model's predictions. The temporal attention weight distributions highlight relevant factors and critical growth stages that contribute to the predictions. The results of this study affirm that the attention weights are consistent with recognized biological growth stages, substantiating the network's capability to learn biologically interpretable features. The accuracy of the model's yield predictions ranged from 0.82 to 0.93 (R2) in this genetics-focused study, further highlighting the potential of attention-based models. Further, this research facilitates understanding of how multi-modality remote sensing aligns with the physiological stages of maize.
The proposed architecture shows promise in improving predictions and offering interpretable insights into the factors affecting maize crop yields, while demonstrating the impact of data collected by different modalities through the growing season. By identifying relevant factors and critical growth stages, the model's attention weights provide valuable information that can be used in both plant breeding and crop management. The consistency of attention weights with biological growth stages reinforces the potential of deep learning networks in agricultural applications, particularly in leveraging remote sensing data for yield prediction. To the best of our knowledge, this is the first study to investigate the use of hyperspectral and LiDAR UAV time-series data for interpreting plant growth stages within deep learning networks and forecasting plot-level maize grain yield using late-fusion modalities with attention mechanisms.
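    The temporal attention described above assigns each acquisition date a weight before the sequence is collapsed into a yield prediction; inspecting those weights is what links the model back to growth stages. A minimal sketch of that step, with a random vector standing in for the trained scoring layer (all names illustrative):

```python
import numpy as np

def temporal_attention(h, w_score):
    """Collapse a (T, D) sequence into one vector via attention over time.

    h: (T, D) per-time-step features (e.g. stacked-LSTM outputs);
    w_score: (D,) scoring vector standing in for a trained layer.
    Returns the attended summary and the per-step weights.
    """
    scores = h @ w_score                     # (T,) one score per date
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                      # softmax over time steps
    return alpha @ h, alpha                  # (D,) summary, (T,) weights

rng = np.random.default_rng(1)
T, D = 12, 6                                 # e.g. 12 UAV flight dates
h = rng.normal(size=(T, D))
summary, alpha = temporal_attention(h, rng.normal(size=D))
print(summary.shape, alpha.shape)
```

In the interpretability analysis, `alpha` is the quantity compared against known biological growth stages.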

  • Article type: Journal Article
    OBJECTIVE: To enhance medical image classification using a Dual-attention ResNet model and investigate the impact of attention mechanisms on model performance in a clinical setting.
    METHODS: We utilized a dataset of medical images and implemented a Dual-attention ResNet model, integrating self-attention and spatial attention mechanisms. The model was trained and evaluated using binary and five-level quality classification tasks, leveraging standard evaluation metrics.
    RESULTS: Our findings demonstrated substantial performance improvements with the Dual-attention ResNet model in both classification tasks. In the binary classification task, the model achieved an accuracy of 0.940, outperforming the conventional ResNet model. Similarly, in the five-level quality classification task, the Dual-attention ResNet model attained an accuracy of 0.757, highlighting its efficacy in capturing nuanced distinctions in image quality.
    CONCLUSIONS: The integration of attention mechanisms within the ResNet model resulted in significant performance enhancements, showcasing its potential for improving medical image classification tasks. These results underscore the promising role of attention mechanisms in facilitating more accurate and discriminative analysis of medical images, thus holding substantial promise for clinical applications in radiology and diagnostics.

  • Article type: Journal Article
    Survival analysis is employed to scrutinize time-to-event data, with an emphasis on understanding the duration until a specific event occurs. In this article, we introduce two novel survival prediction models: CosAttnSurv and CosAttnSurv + DyACT. The CosAttnSurv model leverages a transformer-based architecture and a softmax-free kernel attention mechanism for survival prediction. Our second model, CosAttnSurv + DyACT, enhances CosAttnSurv with Dynamic Adaptive Computation Time (DyACT) control, optimizing computational efficiency. The proposed models are validated using two public clinical datasets of heart disease patients. Compared to other state-of-the-art models, our models demonstrate enhanced discrimination and calibration performance. Furthermore, compared to other transformer-based models, they achieve comparable performance while requiring significantly less time and memory. Overall, our models offer significant advances in survival analysis and emphasize the importance of computationally efficient time-based predictions, with promising implications for medical decision-making and patient care.
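    The abstract does not give CosAttnSurv's exact kernel, so the following is only a generic sketch of softmax-free cosine-kernel attention: similarities are clipped cosine values normalized by their row sums rather than by a softmax, which is what allows the matrix products to be reordered in linear-time variants.

```python
import numpy as np

def l2norm(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cos_kernel_attention(q, k, v):
    """Softmax-free attention using a non-negative cosine-similarity kernel.

    q: (n, d) queries, k: (m, d) keys, v: (m, dv) values.
    """
    sim = np.maximum(l2norm(q) @ l2norm(k).T, 0.0)   # (n, m), values in [0, 1]
    denom = sim.sum(axis=1, keepdims=True) + 1e-8    # row normalization, no softmax
    return (sim / denom) @ v                         # (n, dv)

rng = np.random.default_rng(2)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 3))
print(cos_kernel_attention(q, k, v).shape)
```

Because no exponential is involved, the row normalization can be folded into the matmul order, trading the O(nm) attention map for linear complexity in sequence length.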

  • Article type: Journal Article
    BACKGROUND: Transcranial sonography (TCS) plays a crucial role in diagnosing Parkinson's disease. However, the intricate nature of TCS pathological features, the lack of consistent diagnostic criteria, and the dependence on physicians' expertise can hinder accurate diagnosis. Current TCS-based diagnostic methods, which rely on machine learning, often involve complex feature engineering and may struggle to capture deep image features. While deep learning offers advantages in image processing, it has not been tailored to address specific TCS and movement disorder considerations. Consequently, there is a scarcity of research on deep learning algorithms for TCS-based PD diagnosis.
    METHODS: This study introduces a deep learning residual network model, augmented with attention mechanisms and multi-scale feature extraction, termed AMSNet, to assist in accurate diagnosis. Initially, a multi-scale feature extraction module is implemented to robustly handle the irregular morphological features and significant area information present in TCS images. This module effectively mitigates the effects of artifacts and noise. When combined with a convolutional attention module, it enhances the model's ability to learn features of lesion areas. Subsequently, a residual network architecture, integrated with channel attention, is utilized to capture hierarchical and detailed textures within the images, further enhancing the model's feature representation capabilities.
    RESULTS: The study compiled TCS images and personal data from 1109 participants. Experiments conducted on this dataset demonstrated that AMSNet achieved remarkable classification accuracy (92.79%), precision (95.42%), and specificity (93.1%). It surpassed the performance of previously employed machine learning algorithms in this domain, as well as current general-purpose deep learning models.
    CONCLUSIONS: The AMSNet proposed in this study deviates from traditional machine learning approaches that necessitate intricate feature engineering. It is capable of automatically extracting and learning deep pathological features, and has the capacity to comprehend and articulate complex data. This underscores the substantial potential of deep learning methods in the application of TCS images for the diagnosis of movement disorders.

  • Article type: Journal Article
    Background: Diagnosing lung diseases accurately is crucial for proper treatment. Convolutional neural networks (CNNs) have advanced medical image processing, but challenges remain in their explainability and reliability. This study combines U-Net with attention and Vision Transformers (ViTs) to enhance lung disease segmentation and classification. We hypothesize that Attention U-Net will enhance segmentation accuracy and that ViTs will improve classification performance. Explainability methodologies will shed light on model decision-making processes, aiding clinical acceptance.
    Methodology: A comparative approach was used to evaluate deep learning models for segmenting and classifying lung illnesses using chest X-rays. The Attention U-Net model is used for segmentation, and architectures comprising four CNNs and four ViTs were investigated for classification. Methods such as Gradient-weighted Class Activation Mapping++ (Grad-CAM++) and Layer-wise Relevance Propagation (LRP) provide explainability by identifying the areas that most influence model decisions.
    Results: The results support the conclusion that ViTs excel at identifying lung disorders. Attention U-Net obtained a Dice Coefficient of 98.54% and a Jaccard Index of 97.12%. ViTs outperformed CNNs in classification tasks by 9.26%, reaching an accuracy of 98.52% with MobileViT. An 8.3% increase in accuracy was seen when moving from raw-data classification to segmented-image classification. Techniques such as Grad-CAM++ and LRP provided insights into the models' decision-making processes.
    Conclusions: This study highlights the benefits of integrating Attention U-Net and ViTs for analyzing lung diseases, demonstrating their importance in clinical settings. Emphasizing explainability clarifies deep learning processes, increasing confidence in AI solutions and potentially improving clinical acceptance for better healthcare outcomes.
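    The Dice Coefficient and Jaccard Index reported for the segmentation stage are standard overlap measures between a predicted mask and the ground truth; for binary masks they reduce to a few lines (empty-mask handling omitted for brevity):

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Dice coefficient and Jaccard index for binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return dice, jaccard

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 1, 0], [0, 0, 1]])
# intersection 2, |pred|+|gt| = 6 -> Dice 2/3; union 4 -> Jaccard 1/2
print(dice_and_jaccard(pred, gt))
```

The two metrics are monotonically related (J = D / (2 - D)), which is why papers often report both from the same predictions.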

  • Article type: Journal Article
    BACKGROUND: In the realm of brain-computer interfaces (BCI), identifying emotions from electroencephalogram (EEG) data is a difficult endeavor because of the volume of data, the intricacy of the signals, and the several channels that make up the signals.
    METHODS: Using dual-stream structure scaling and multiple attention mechanisms (LDMGEEG), a lightweight network is provided to maximize the accuracy and performance of EEG-based emotion identification. Reducing the number of computational parameters while maintaining the current level of classification accuracy is the aim. This network employs a symmetric dual-stream architecture to assess separately time-domain and frequency-domain spatio-temporal maps constructed using differential entropy features of EEG signals as inputs.
    RESULTS: The experimental results show that, after significantly lowering the number of parameters, the model achieves the best performance in the field, with 95.18% accuracy on the SEED dataset. Moreover, it reduces the number of parameters by 98% compared to existing models.
    CONCLUSIONS: The proposed method's distinct channel-time/frequency-space multiple-attention and post-attention mechanisms enhance the model's ability to aggregate features and result in a lightweight model.
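    The differential-entropy (DE) features mentioned in the methods are the standard EEG features used with SEED: under a Gaussian assumption the DE of a band-filtered segment is h = 0.5 * ln(2*pi*e*sigma^2). A minimal sketch (the band filtering itself is omitted; the segment here is synthetic):

```python
import numpy as np

def differential_entropy(segment):
    """DE of a signal segment under a Gaussian assumption:
    h = 0.5 * ln(2 * pi * e * sigma^2), the usual EEG DE feature."""
    var = np.var(segment)
    return 0.5 * np.log(2 * np.pi * np.e * var)

rng = np.random.default_rng(3)
seg = rng.normal(scale=2.0, size=4000)   # stand-in for one band-filtered channel
print(round(differential_entropy(seg), 3))
```

In practice one DE value is computed per channel and per frequency band, and those values form the spatio-temporal maps fed to the network.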

  • Article type: Journal Article
    BACKGROUND: Automatic segmentation of vertebrae in spinal x-ray images is crucial for clinical diagnosis, case analysis, and surgical planning of spinal lesions.
    OBJECTIVE: However, due to the inherent characteristics of x-ray images, including low contrast, high noise, and uneven grey scale, it remains a critical and challenging problem in computer-aided spine image analysis and disease diagnosis applications.
    METHODS: In this paper, a Multiscale Feature Enhancement Network (MFENet) is proposed for segmenting whole spinal x-ray images, to aid doctors in diagnosing spine-related diseases. To enhance feature extraction, the network incorporates a Dual-branch Feature Extraction Module (DFEM) and a Semantic Aggregation Module (SAM). The DFEM has a parallel dual-branch structure. The upper branch uses multiscale convolutional kernels to extract features from images; employing kernels of different sizes helps capture details and structural information at different scales. The lower branch incorporates attention mechanisms to further optimize the feature representation. By modeling the feature maps spatially and across channels, the network focuses on key feature regions and suppresses task-irrelevant information. The SAM leverages contextual semantic information to compensate for details lost during pooling and convolution operations, and integrates high-level feature information from different scales to reduce discontinuity in the segmentation results. In addition, a hybrid loss function is employed to enhance the network's feature extraction capability.
    RESULTS: In this study, we conducted extensive experiments utilizing a dataset provided by the Spine Surgery Department of Henan Provincial People's Hospital. The experimental results indicate that our proposed MFENet demonstrates superior spinal segmentation performance on x-ray images compared to other advanced methods, achieving 92.61 ± 0.431 for MIoU, 92.42 ± 0.329 for DSC, and 99.51 ± 0.037 for Global_accuracy.
    CONCLUSIONS: Our model is able to more effectively learn and extract global contextual semantic information, significantly improving spinal segmentation performance, further aiding doctors in analyzing patient conditions.
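    The MIoU figure reported above is the mean intersection-over-union across segmentation classes. A small sketch of how it is typically computed from label maps (skipping classes absent from both prediction and ground truth is a common convention, assumed here, not taken from the paper):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes for integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent in both; skip rather than count as 1
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1, 2, 2])
gt   = np.array([0, 0, 1, 2, 2, 2])
# per-class IoU: 1.0, 0.5, 2/3 -> mean 13/18
print(round(mean_iou(pred, gt, 3), 4))
```

DSC (Dice) and global accuracy are computed from the same confusion-matrix counts, which is why the three metrics are usually reported together.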

  • Article type: Journal Article
    Breast cancer is the most common cancer in women. Ultrasound is one of the most widely used diagnostic techniques, but an expert in the field is needed to interpret the test. Computer-aided diagnosis (CAD) systems aim to help physicians during this process. Experts use the Breast Imaging-Reporting and Data System (BI-RADS) to describe tumors in a common language according to several features (shape, margin, orientation...) and to estimate their malignancy. To aid in tumor diagnosis with BI-RADS explanations, this paper presents a deep neural network for tumor detection, description, and classification. An expert radiologist described 749 nodules taken from public datasets using BI-RADS terms. The YOLO detection algorithm is used to obtain Regions of Interest (ROIs), and a model based on a multi-class classification architecture then receives each ROI as input and outputs the BI-RADS descriptors, the BI-RADS classification (with 6 categories), and a Boolean classification of malignancy. Six hundred of the nodules were used for 10-fold cross-validation (CV) and 149 for testing. The accuracy of this model was compared with state-of-the-art CNNs for the same task. This model outperforms plain classifiers in agreement with the expert (Cohen's kappa), with a mean over the descriptors of 0.58 in CV and 0.64 in testing, while the second-best model yielded kappas of 0.55 and 0.59, respectively. Adding YOLO to the model significantly enhances performance (by 0.16 in CV and 0.09 in testing). More importantly, training the model with BI-RADS descriptors enables explainability of the Boolean malignancy classification without reducing accuracy.
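    Cohen's kappa, used here to measure model-expert agreement on each descriptor, corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A small self-contained sketch with hypothetical BI-RADS shape labels (the label values are illustrative only):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two raters' labels for the same items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)       # chance agreement
    return (p_o - p_e) / (1 - p_e)

model  = ["oval", "oval", "round", "irregular", "oval"]
expert = ["oval", "round", "round", "irregular", "oval"]
# p_o = 0.8, p_e = 9/25 = 0.36 -> kappa = 0.44 / 0.64 = 0.6875
print(round(cohens_kappa(model, expert), 4))
```

Kappa is preferred over raw accuracy for descriptor agreement because imbalanced descriptor categories make chance agreement high.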