medical image

  • Article type: Journal Article
    Colorectal cancer (CRC) is a common malignant tumor that seriously threatens human health. Accurate identification of CRC is challenging because its boundaries are indistinct. With the widespread adoption of convolutional neural networks (CNNs) in image processing, leveraging CNNs for automatic classification and segmentation holds immense potential for enhancing the efficiency of colorectal cancer recognition and reducing treatment costs. This paper examines the need for applying CNNs in the clinical diagnosis of CRC. It provides a detailed overview of research advances in CNNs and their improved models for CRC classification and segmentation. Furthermore, this work summarizes the ideas and common methods for optimizing network performance and discusses the challenges faced by CNNs as well as future development trends in their application to CRC classification and segmentation, thereby promoting their use in clinical diagnosis.






  • Article type: Journal Article
    This paper introduces the efficient medical-images-aimed segment anything model (EMedSAM), addressing the high computational demands and limited adaptability of using SAM for medical image segmentation tasks. We present a novel, compact image encoder, DD-TinyViT, designed to enhance segmentation efficiency through an innovative parameter tuning method called med-adapter. The lightweight DD-TinyViT encoder is derived from the well-known ViT-H using a decoupled distillation approach. The segmentation and recognition capabilities of EMedSAM for specific structures are improved by med-adapter, which dynamically adjusts the model parameters specifically for medical imaging. We conducted extensive testing on EMedSAM using the public FLARE 2022 dataset and datasets from the First Hospital of Zhejiang University School of Medicine. The results demonstrate that our model outperforms existing state-of-the-art models in both multi-organ and lung segmentation tasks.






  • Article type: Journal Article
    In clinical practice, the anatomical classification of pulmonary veins plays a crucial role in the preoperative assessment of atrial fibrillation radiofrequency ablation surgery. Accurate classification of pulmonary vein anatomy assists physicians in selecting appropriate mapping electrodes and avoids causing pulmonary arterial hypertension. Due to the diverse and subtly different anatomical classifications of pulmonary veins, as well as the imbalance in data distribution, deep learning models often exhibit poor expression capability in extracting deep features, leading to misjudgments and affecting classification accuracy. Therefore, in order to solve the problem of unbalanced classification of left atrial pulmonary veins, this paper proposes a network integrating multi-scale feature-enhanced attention and dual-feature extraction classifiers, called DECNet. The multi-scale feature-enhanced attention utilizes multi-scale information to guide the reinforcement of deep features, generating channel weights and spatial weights to enhance the expression capability of deep features. The dual-feature extraction classifier assigns a fixed number of channels to each category, equally evaluating all categories, thus alleviating the learning bias and overfitting caused by data imbalance. By combining the two, the expression capability of deep features is strengthened, achieving accurate classification of left atrial pulmonary vein morphology and providing support for subsequent clinical treatment. The proposed method is evaluated on datasets provided by the People's Hospital of Liaoning Province and the publicly available DermaMNIST dataset, achieving average accuracies of 78.81% and 83.44%, respectively, demonstrating the effectiveness of the proposed approach.






  • Article type: Journal Article
    The application of magnetic resonance imaging (MRI) in the classification of brain tumors is constrained by the complex and time-consuming nature of traditional diagnostic procedures, mainly because of the need for a thorough assessment across several regions. Nevertheless, advancements in deep learning (DL) have facilitated the development of automated systems that improve the identification and assessment of medical images, effectively addressing these difficulties. Convolutional neural networks (CNNs) have emerged as reliable tools for image classification and visual perception. This study introduces an approach that combines CNNs with a hybrid attention mechanism to classify primary brain tumors, including glioma, meningioma, pituitary, and no-tumor cases. The proposed algorithm was tested with benchmark data from well-documented sources in the literature. It was evaluated alongside established pre-trained models such as Xception, ResNet50V2, DenseNet201, ResNet101V2, and DenseNet169. The proposed method achieved a classification accuracy of 98.33%, precision and recall of 98.30%, and an F1-score of 98.20%. The experimental findings highlight the superior performance of the new approach in identifying the most frequent types of brain tumors. Furthermore, the method shows excellent generalization capabilities, making it a valuable tool for healthcare in diagnosing brain conditions accurately and efficiently.






  • Article type: Journal Article
    Soft tissue sarcomas, similar in incidence to cervical and esophageal cancers, arise from various soft tissues like smooth muscle, fat, and fibrous tissue. Effective segmentation of sarcomas in imaging is crucial for accurate diagnosis.
    This study collected multi-modal MRI images from 45 patients with thigh soft tissue sarcoma, totaling 8,640 images. These images were annotated by clinicians to delineate the sarcoma regions, creating a comprehensive dataset. We developed a novel segmentation model based on the UNet framework, enhanced with residual networks and attention mechanisms for improved modality-specific information extraction. Additionally, self-supervised learning strategies were employed to optimize feature extraction capabilities of the encoders.
    The new model demonstrated superior segmentation performance when using multi-modal MRI images compared to single-modal inputs. The effectiveness of the model in utilizing the created dataset was validated through various experimental setups, confirming the enhanced ability to characterize tumor regions across different modalities.
    The integration of multi-modal MRI images and advanced machine learning techniques in our model significantly improves the segmentation of soft tissue sarcomas in thigh imaging. This advancement aids clinicians in better diagnosing and understanding the patient's condition, leveraging the strengths of different imaging modalities. Further studies could explore the application of these techniques to other types of soft tissue sarcomas and additional anatomical sites.






  • Article type: English Abstract
    Objective: To build a VGG-based computer-aided diagnostic model for chronic sinusitis and evaluate its efficacy.
    Methods: ①A total of 5,000 frames of diagnosed sinus CT images were collected. The normal group consisted of 1,000 frames (250 frames each of the maxillary, frontal, ethmoid, and sphenoid sinuses), while the abnormal group consisted of 4,000 frames (1,000 frames each of maxillary, frontal, ethmoid, and sphenoid sinusitis). The images were size-normalized and segmented as preprocessing. ②The model was trained and run in simulation experiments to obtain five classification models, for the normal group, the sphenoid sinusitis group, the frontal sinusitis group, the ethmoid sinusitis group, and the maxillary sinusitis group, respectively. The classification efficacy of the model was evaluated objectively in six dimensions: accuracy, precision, sensitivity, specificity, interpretation time, and area under the ROC curve (AUC). ③Two hundred randomly selected images were read by the model and by three groups of physicians (low, middle, and high seniority) to form a comparative experiment. The efficacy of the model was objectively evaluated using the same evaluation indexes in conjunction with clinical analysis.
    Results: ①Simulation experiment: The overall recognition accuracy of the model was 83.94%, with a precision of 89.52%, a sensitivity of 83.94%, and a specificity of 95.99%; the average interpretation time per frame was 0.20 s. The AUC was 0.865 (95% CI 0.849-0.881) for sphenoid sinusitis, 0.924 (0.911-0.936) for frontal sinusitis, 0.895 (0.880-0.909) for ethmoid sinusitis, and 0.974 (0.967-0.982) for maxillary sinusitis. ②Comparison experiment: Recognition accuracy was 84.52% for the model, 78.50% for the low-seniority physician group, 80.50% for the middle-seniority group, and 83.50% for the high-seniority group. Recognition precision was 85.67% for the model, 79.72% for the low-seniority group, 82.67% for the middle-seniority group, and 83.66% for the high-seniority group. Recognition sensitivity was 84.52% for the model, 78.50% for the low-seniority group, 80.50% for the middle-seniority group, and 83.50% for the high-seniority group. Recognition specificity was 96.58% for the model, 94.63% for the low-seniority group, 95.13% for the middle-seniority group, and 95.88% for the high-seniority group. The average reading time per frame was 0.20 s for the model, 2.35 s for the low-seniority group, 1.98 s for the middle-seniority group, and 2.19 s for the high-seniority group.
    Conclusion: This study demonstrates the potential of a deep learning-based artificial intelligence diagnostic model to classify and diagnose chronic sinusitis; the model has good classification performance and high diagnostic efficacy.
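    The accuracy, precision, sensitivity, and specificity named in the abstract above can all be derived from a multi-class confusion matrix. The following is a minimal sketch, assuming one-vs-rest counts per class and macro averaging; the abstract does not state which averaging scheme the authors used, and the function name and matrix layout are hypothetical.

```python
def multiclass_metrics(cm):
    """Compute overall accuracy and macro-averaged one-vs-rest metrics.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (this layout is an assumption of the sketch).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, sensitivities, specificities = [], [], []
    for k in range(n):
        tp = cm[k][k]                               # class k predicted as k
        fn = sum(cm[k]) - tp                        # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp                   # everything else
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        sensitivities.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)

    def mean(values):
        return sum(values) / len(values)

    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "precision": mean(precisions),
        "sensitivity": mean(sensitivities),
        "specificity": mean(specificities),
    }


if __name__ == "__main__":
    # Toy 3-class matrix with 10 samples per class (made-up numbers).
    toy = [[8, 1, 1], [2, 6, 2], [0, 1, 9]]
    for name, value in multiclass_metrics(toy).items():
        print(f"{name}: {value:.4f}")
```

    When every class contains the same number of samples, macro-averaged sensitivity is algebraically equal to overall accuracy. That is one possible reading of why the abstract reports 83.94% for both, although the abstract itself does not say so.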






  • Article type: Journal Article
    Currently, brain tumors are extremely harmful and prevalent. Deep learning technologies, including CNNs, UNet, and Transformer, have been applied in brain tumor segmentation for many years and have achieved some success. However, traditional CNNs and UNet capture insufficient global information, and Transformer cannot provide sufficient local information. Fusing the global information from Transformer with the local information of convolutions is an important step toward improving brain tumor segmentation. We propose the Group Normalization Shuffle and Enhanced Channel Self-Attention Network (GETNet), a network combining the pure Transformer structure with convolution operations based on VT-UNet, which considers both global and local information. The network includes the proposed group normalization shuffle block (GNS) and enhanced channel self-attention block (ECSA). The GNS is used after the VT Encoder Block and before the downsampling block to improve information extraction. An ECSA module is added to the bottleneck layer to utilize the characteristics of the detailed features in the bottom layer effectively. We also conducted experiments on the BraTS2021 dataset to demonstrate the performance of our network. The Dice coefficient (Dice) score results show that the values for the regions of the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) were 91.77, 86.03, and 83.64, respectively. The results show that the proposed model achieves state-of-the-art performance compared with more than eleven benchmarks.
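    For readers unfamiliar with the Dice scores reported for the WT, TC, and ET regions, the metric can be sketched as follows. This is a minimal illustration on flattened label maps, not the authors' evaluation code; the label values (1 = necrotic core, 2 = edema, 4 = enhancing tumor) and the region groupings are assumptions based on the common BraTS convention, and all function names are hypothetical.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |P and T| / (|P| + |T|) for two flat binary masks (0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the ratio defined (and equal to 1) when both masks are empty
    return (2.0 * intersection + eps) / (total + eps)


def region_mask(labels, region_labels):
    """Binary mask that is 1 wherever the voxel label belongs to the region."""
    return [1 if value in region_labels else 0 for value in labels]


# Assumed BraTS-style nested regions (not stated in the abstract).
REGIONS = {"WT": {1, 2, 4}, "TC": {1, 4}, "ET": {4}}


def brats_dice(pred_labels, true_labels):
    """Dice score per region for flattened predicted and reference label maps."""
    return {
        name: dice_coefficient(region_mask(pred_labels, labels),
                               region_mask(true_labels, labels))
        for name, labels in REGIONS.items()
    }


if __name__ == "__main__":
    predicted = [0, 1, 2, 4, 4, 0]
    reference = [0, 1, 2, 2, 4, 0]
    print(brats_dice(predicted, reference))
```

    The abstract reports scores on a 0 to 100 scale, which corresponds to multiplying these values by 100.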






  • Article type: Journal Article
    The automatic segmentation of cardiac computed tomography (CT) and magnetic resonance imaging (MRI) plays a pivotal role in the prevention and treatment of cardiovascular diseases. In this study, we propose an efficient network based on the multi-scale, multi-head self-attention (MSMHSA) mechanism. The incorporation of this mechanism enables us to achieve larger receptive fields, facilitating the accurate segmentation of whole heart structures in both CT and MRI images. Within this network, features extracted from the shallow feature extraction network undergo an MHSA mechanism that closely aligns with human vision, resulting in the extraction of contextual semantic information more comprehensively and accurately. To improve the precision of cardiac substructure segmentation across varying sizes, our proposed method introduces three MHSA networks at distinct scales. This approach allows for fine-tuning the accuracy of micro-object segmentation by adapting the size of the segmented images. The efficacy of our method is rigorously validated on the Multi-Modality Whole Heart Segmentation (MM-WHS) Challenge 2017 dataset, demonstrating competitive results and the accurate segmentation of seven cardiac substructures in both cardiac CT and MRI images. Through comparative experiments with advanced transformer-based models, our study provides compelling evidence that despite the remarkable achievements of transformer-based models, the fusion of CNN models and self-attention remains a simple yet highly effective approach for dual-modality whole heart segmentation.
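    The multi-head self-attention building block mentioned above can be illustrated with a small standard-library sketch. It shows generic scaled dot-product attention at a single scale, with identity query/key/value projections in place of learned weights; both simplifications are assumptions made for brevity, and this is not the MSMHSA network from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    norm = sum(exps)
    return [e / norm for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # similarity of this query to every key, scaled by sqrt(dim)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # weighted sum of the value vectors, channel by channel
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


def multi_head_self_attention(tokens, num_heads):
    """Split channels into heads, attend within each head, then concatenate.

    Learned projection matrices are omitted (identity projections), which
    is a simplification of this sketch.
    """
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("channel count must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        heads.append(attention(part, part, part))
    # concatenate the per-head outputs for each token
    return [sum((head[i] for head in heads), []) for i in range(len(tokens))]


if __name__ == "__main__":
    feature_tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    print(multi_head_self_attention(feature_tokens, num_heads=2))
```

    A multi-scale variant would apply the same operation to feature maps at several resolutions; how the paper combines the three scales is not described in the abstract, so it is left out here.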






  • Article type: Journal Article
    Medical imaging is an important tool for clinical diagnosis. Nevertheless, it is very time-consuming and error-prone for physicians to prepare imaging diagnosis reports. Therefore, it is necessary to develop methods to generate medical imaging reports automatically. Currently, the task of medical imaging report generation is challenging in at least two aspects: (1) medical images are very similar to each other, and the differences between normal and abnormal images and between different abnormal images are usually subtle; (2) unrelated or incorrect keywords describing abnormal findings in the generated reports lead to miscommunication. In this paper, we propose a medical image report generation framework composed of four modules, including a Transformer encoder, a MIX-MLP multi-label classification network, a co-attention mechanism (CAM) based semantic and visual feature fusion, and a hierarchical LSTM decoder. The Transformer encoder can be used to learn long-range dependencies between images and labels, effectively extract visual and semantic features of images, and establish long-term dependent relationships between visual and semantic information to accurately extract abnormal features from images. The MIX-MLP multi-label classification network, the co-attention mechanism and the hierarchical LSTM network can better identify abnormalities, achieving visual and text alignment fusion and multi-label diagnostic classification to better facilitate report generation. The results of the experiments performed on two widely used radiology report datasets, IU X-RAY and MIMIC-CXR, show that our proposed framework outperforms current report generation models in terms of both natural language generation metrics and clinical efficacy assessment metrics. The code of this work is available online at






  • Article type: Journal Article
    Medical imaging serves as a crucial tool in current cancer diagnosis. However, the quality of medical images is often compromised to minimize the potential risks associated with patient image acquisition. Computer-aided diagnosis systems have made significant advancements in recent years. These systems utilize computer algorithms to identify abnormal features in medical images, assisting radiologists in improving diagnostic accuracy and achieving consistency in image and disease interpretation. Importantly, the quality of medical images, as the target data, determines the achievable level of performance by artificial intelligence algorithms. However, the pixel value range of medical images differs from that of the digital images typically processed via artificial intelligence algorithms, and blindly incorporating such data for training can result in suboptimal algorithm performance. In this study, we propose a medical image-enhancement scheme that integrates generic digital image processing and medical image processing modules. This scheme aims to enhance medical image data by endowing them with high-contrast and smooth characteristics. We conducted experimental testing to demonstrate the effectiveness of this scheme in improving the performance of a medical image segmentation algorithm.
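    The pixel-range mismatch described above is often handled with intensity windowing before training. The sketch below illustrates that common preprocessing step only; it is not the enhancement scheme proposed in the paper, and the function name and the example window (center 40, width 400) are assumptions chosen for illustration.

```python
def window_intensity(pixels, center, width, out_max=255):
    """Clip raw intensities to a window and rescale them to 0..out_max.

    pixels  : flat list of raw intensity values (for example CT numbers)
    center  : middle of the intensity window
    width   : full width of the window, must be positive
    """
    if width <= 0:
        raise ValueError("width must be positive")
    low = center - width / 2.0
    high = center + width / 2.0
    scaled = []
    for value in pixels:
        clipped = min(max(value, low), high)   # discard values outside the window
        scaled.append(round((clipped - low) / (high - low) * out_max))
    return scaled


if __name__ == "__main__":
    raw = [-1000, -160, 140, 240, 3000]
    print(window_intensity(raw, center=40, width=400))
```

    Narrowing the window raises contrast within the intensity range of interest at the cost of saturating everything outside it, so the window is normally chosen per modality and per anatomical target.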





