medical image

  • Article type: Journal Article
    BACKGROUND: Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from nonexpert participants has been used for numerous medical applications, including image labeling and segmentation tasks.
    OBJECTIVE: This study aimed to assess the ability of crowds of nonexpert raters (individuals without any prior training in identifying or marking cGVHD) to demarcate photos of cGVHD-affected skin. We also studied the effect of training and feedback on crowd performance.
    METHODS: Using a Canfield Vectra H1 3D camera, 360 photographs of the skin of 36 patients with cGVHD were taken. Ground truth demarcations were provided in 3D by a trained expert and reviewed by a board-certified dermatologist. In total, 3000 2D images (projections from various angles) were created for crowd demarcation through the DiagnosUs mobile app. Raters were split into high and low feedback groups. The performances of 4 different crowds of nonexperts were analyzed, including 17 raters per image for the low and high feedback groups, 32-35 raters per image for the low feedback group, and the top 5 performers for each image from the low feedback group.
    RESULTS: Across 8 demarcation competitions, 130 raters were recruited to the high feedback group and 161 to the low feedback group. This resulted in a total of 54,887 individual demarcations from the high feedback group and 78,967 from the low feedback group. The nonexpert crowds achieved good overall performance for segmenting cGVHD-affected skin with minimal training, achieving a median surface area error of less than 12% of skin pixels for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly worse than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters from the low feedback group for each image recovered a performance similar to that of the high feedback crowd. Higher variability between raters for a given image was not found to correlate with lower performance of the crowd consensus demarcation and therefore cannot be used as a measure of reliability. No significant learning was observed during the task as more photos and feedback were seen.
    CONCLUSIONS: Crowds of nonexpert raters can demarcate cGVHD images with good overall performance. Tracking the top 5 most reliable raters provided optimal results, obtaining the best performance with the lowest number of expert demarcations required for adequate training. However, the agreement amongst individual nonexperts does not help predict whether the crowd has provided an accurate result. Future work should explore the performance of crowdsourcing in standard clinical photos and further methods to estimate the reliability of consensus demarcations.
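
    The crowd consensus demarcation and the surface area error mentioned above can be illustrated with a minimal sketch. The abstract does not define either quantity precisely, so the pixelwise majority vote, the error formula (absolute difference in marked area divided by the number of skin pixels), and all function names below are assumptions for illustration only.

```python
def consensus_mask(masks, threshold=0.5):
    """Pixelwise vote: mark a pixel when more than `threshold` of raters marked it.

    Assumption: the study's actual consensus rule is not given in the abstract.
    """
    n_raters = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[r][c] for m in masks) / n_raters > threshold else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def marked_area(mask, skin):
    """Count marked pixels that lie on skin."""
    return sum(p for mask_row, skin_row in zip(mask, skin)
               for p, s in zip(mask_row, skin_row) if s)


def surface_area_error(pred, truth, skin):
    """Absolute area difference as a fraction of skin pixels (assumed definition)."""
    skin_pixels = sum(sum(row) for row in skin)
    return abs(marked_area(pred, skin) - marked_area(truth, skin)) / skin_pixels


# Three hypothetical rater masks for a 2x2 image (1 = affected skin).
raters = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [1, 0]],
]
expert = [[1, 0], [0, 0]]   # hypothetical ground truth
skin = [[1, 1], [1, 1]]     # every pixel is skin in this toy example

crowd = consensus_mask(raters)
error = surface_area_error(crowd, expert, skin)
print(crowd, error)
```

    A per-image error computed this way could then be summarized by its median across images, which is the kind of statistic the results report.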






  • Article type: Journal Article
    BACKGROUND: Total hip replacement (THR) is considered the gold standard of treatment for refractory degenerative hip disorders. Identifying patients who should receive THR in the short term is important. Some conservative treatments, such as intra-articular injection administered a few months before THR, may result in higher odds of arthroplasty infection. Delayed THR after functional deterioration may result in poorer outcomes and longer waiting times for those who have been flagged as needing THR. Deep learning (DL) in medical imaging applications has recently obtained significant breakthroughs. However, the use of DL in practical wayfinding, such as short-term THR prediction, is still lacking.
    OBJECTIVE: In this study, we propose a DL-based assistant system that identifies, from pelvic radiographs, patients who need THR within 3 months.
    METHODS: We developed a convolutional neural network-based DL algorithm to analyze pelvic radiographs, predict the hip region of interest (ROI), and determine whether or not THR is required. The data set was collected from August 2008 to December 2017. The images included 3013 surgical hip ROIs that had undergone THR and 1630 nonsurgical hip ROIs. The images were split, using split-sample validation, into training (n=3903, 80%), validation (n=476, 10%), and testing (n=475, 10%) sets to evaluate the algorithm performance.
    RESULTS: The algorithm, called SurgHipNet, yielded an area under the receiver operating characteristic curve of 0.994 (95% CI 0.990-0.998). The accuracy, sensitivity, specificity, and F1-score of the model were 0.977, 0.920, 0.932, and 0.944, respectively.
    CONCLUSIONS: The proposed approach has demonstrated that SurgHipNet shows the ability and potential to provide efficient support in clinical decision-making; it can assist physicians in promptly determining the optimal timing for THR.
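
    For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and F1-score follow from a binary confusion matrix. The counts are invented for illustration and are not the study's data; the function name is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive class = THR needed)."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
print(metrics)
```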






  • Article type: Journal Article
    BACKGROUND: Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts for identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsourcing has been demonstrated as a cost- and time-efficient method for the annotation of medical images.
    OBJECTIVE: The aim of this study is to demonstrate that crowdsourcing can be used to label basic dermoscopic structures from images of pigmented lesions with similar reliability to a group of experts.
    METHODS: First, we obtained labels of 248 images of melanocytic lesions with 31 dermoscopic "subfeatures" labeled by 20 dermoscopy experts. These were then collapsed into 6 dermoscopic "superfeatures" based on structural similarity, due to low interrater reliability (IRR): dots, globules, lines, network structures, regression structures, and vessels. These images were then used as the gold standard for the crowd study. The commercial platform DiagnosUs was used to obtain annotations from a nonexpert crowd for the presence or absence of the 6 superfeatures in each of the 248 images. We replicated this methodology with a group of 7 dermatologists to allow direct comparison with the nonexpert crowd. The Cohen κ value was used to measure agreement across raters.
    RESULTS: In total, we obtained 139,731 ratings of the 6 dermoscopic superfeatures from the crowd. There was relatively lower agreement for the identification of dots and globules (the median κ values were 0.526 and 0.395, respectively), whereas network structures and vessels showed the highest agreement (the median κ values were 0.581 and 0.798, respectively). This pattern was also seen among the expert raters, who had median κ values of 0.483 and 0.517 for dots and globules, respectively, and 0.758 and 0.790 for network structures and vessels. The median κ values between nonexperts and thresholded average-expert readers were 0.709 for dots, 0.719 for globules, 0.714 for lines, 0.838 for network structures, 0.818 for regression structures, and 0.728 for vessels.
    CONCLUSIONS: This study confirmed that IRR for different dermoscopic features varied among a group of experts; a similar pattern was observed in a nonexpert crowd. There was good or excellent agreement for each of the 6 superfeatures between the crowd and the experts, highlighting the similar reliability of the crowd for labeling dermoscopic images. This confirms the feasibility and dependability of using crowdsourcing as a scalable solution to annotate large sets of dermoscopic images, with several potential clinical and educational applications, including the development of novel, explainable ML tools.
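
    The Cohen κ statistic used above corrects raw agreement for the agreement expected by chance. A minimal sketch for two raters follows; the ratings are made up, and the function name is an assumption rather than anything from the study.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the chance agreement implied by each rater's label frequencies.
    Returns NaN when p_e == 1 (both raters always use the same single label).
    """
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in labels)
    if p_e == 1:
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical presence (1) / absence (0) ratings of one superfeature on 6 images.
a = [1, 1, 0, 0, 1, 0]
b = [1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(a, b)
print(round(kappa, 3))
```

    With many raters, pairwise κ values like this one can be summarized by their median, which is how the results above are reported.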






  • Article type: Journal Article
    Interpretation of medical images with a computer-aided diagnosis (CAD) system is arduous because of the complex structure of cancerous lesions in different imaging modalities, the high degree of inter-class resemblance, the presence of dissimilar characteristics within classes, the scarcity of medical data, and the presence of artifacts and noise. In this study, these challenges are addressed by developing a shallow convolutional neural network (CNN) model whose configuration is optimized through an ablation study that alters the layer structure and hyper-parameters, combined with a suitable augmentation technique. Eight medical datasets with different modalities are investigated, and the proposed model, named MNet-10, yields optimal performance across all datasets with low computational complexity. The impact of photometric and geometric augmentation techniques on different datasets is also evaluated. We selected the mammogram dataset for the ablation study because mammography is one of the most challenging imaging modalities. Before generating the model, the dataset is augmented using the two approaches. A base CNN model is constructed first and applied to both the augmented and non-augmented mammogram datasets, where the highest accuracy is obtained with the photometrically augmented dataset. Therefore, the architecture and hyper-parameters of the model are determined by performing an ablation study on the base model using the mammogram photometric dataset. Afterward, the robustness of the network and the impact of different augmentation techniques are assessed by training the model with the remaining seven datasets. We obtain a test accuracy of 97.34% on the mammogram, 98.43% on the skin cancer, 99.54% on the brain tumor magnetic resonance imaging (MRI), 97.29% on the COVID chest X-ray, 96.31% on the tympanic membrane, 99.82% on the chest computed tomography (CT) scan, and 98.75% on the breast cancer ultrasound datasets by photometric augmentation and 96.76% on the breast cancer microscopic biopsy dataset by geometric augmentation. Moreover, some elastic deformation augmentation methods are explored with the proposed model using all the datasets to evaluate their effectiveness. Finally, VGG16, InceptionV3, and ResNet50 were trained on the best-performing augmented datasets, and their performance consistency was compared with that of the MNet-10 model. The findings may aid future researchers in medical data analysis involving ablation studies and augmentation techniques.
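
    The distinction between photometric and geometric augmentation can be made concrete with a small sketch. The abstract does not say which transforms were used, so the brightness scaling and horizontal flip below are assumed examples of each family, and the function names are hypothetical.

```python
def adjust_brightness(image, factor):
    """Photometric augmentation: rescale intensities, leaving pixel positions intact.

    `image` is a list of rows of 8-bit gray values; results are clamped to 0..255.
    """
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def horizontal_flip(image):
    """Geometric augmentation: move pixels, leaving intensity values intact."""
    return [list(reversed(row)) for row in image]


# Hypothetical 2x2 grayscale patch.
patch = [[0, 100],
         [200, 250]]

brighter = adjust_brightness(patch, 1.2)
flipped = horizontal_flip(patch)
print(brighter)
print(flipped)
```

    The photometric transform changes the histogram but not the anatomy's layout, while the geometric one does the reverse, which is one plausible reason the two families can behave differently across modalities.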






  • Article type: Journal Article
    BACKGROUND: To date, safe and accurate execution of surgeries mainly relies on preoperative plans generated from preoperative imaging. Frequent intraoperative interaction with such patient images is needed during the intervention, which is currently a cumbersome process because the images are generally displayed on peripheral two-dimensional (2D) monitors and controlled through interface devices that are outside the sterile field. This study proposes a new medical image control concept based on a brain-computer interface (BCI) that allows for hands-free and direct image manipulation without relying on gesture recognition methods or voice commands.
    METHODS: A software environment was designed for displaying three-dimensional (3D) patient images on external monitors, with the functionality of hands-free image manipulation based on the user's brain signals detected by the BCI device (i.e., visually evoked signals). In a user study, ten orthopedic surgeons completed a series of standardized image manipulation tasks to navigate and locate predefined 3D points in a computed tomography (CT) image using the developed interface. Accuracy was assessed as the mean error between the predefined locations (ground truth) and the locations navigated to by the surgeons. All surgeons rated the performance and potential intraoperative usability in a standardized survey using a five-point Likert scale (1 = strongly disagree to 5 = strongly agree).
    RESULTS: When using the developed interface, the mean image control error was 15.51 mm (SD: 9.57). User acceptance was rated with a Likert score of 4.07 (SD: 0.96), while the overall impression of the interface was rated as 3.77 (SD: 1.02) by the users. We observed a significant correlation between the users' overall impression and the calibration score they achieved.
    CONCLUSIONS: The use of the developed BCI, which allowed for purely brain-guided medical image control, yielded promising results and showed its potential for future intraoperative applications. The major limitation to overcome was noted as the interaction delay.
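
    The accuracy measure described in the methods, a mean error between predefined and navigated 3D locations, might be computed as in the sketch below. Euclidean distance is an assumption (the abstract does not name the distance), and the coordinates are invented.

```python
import math


def mean_localization_error(targets, navigated):
    """Mean Euclidean distance (same unit as the inputs, e.g. mm) between point pairs."""
    distances = [math.dist(t, n) for t, n in zip(targets, navigated)]
    return sum(distances) / len(distances)


# Hypothetical predefined points and the locations a user navigated to, in mm.
targets = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
navigated = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]

mean_error = mean_localization_error(targets, navigated)
print(mean_error)
```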






  • Article type: Journal Article
    Idiopathic pulmonary fibrosis (IPF) is a fatal interstitial lung disease characterized by an unpredictable decline in lung function. Predicting IPF progression from early changes in lung function tests is known to be a challenge due to acute exacerbation. Although progression is unpredictable, the neighboring regions of fibrotic reticulation increase during IPF's progression. With this clinical information, quantitative characteristics of high-resolution computed tomography (HRCT), and a statistical learning paradigm, the aim is to build a model to predict IPF progression.
    A paired set of 193 anonymized HRCT images from IPF subjects with 6-12 month intervals was collected retrospectively. The study was conducted in two parts: (1) Part A collects the ground truth in small regions of interest (ROIs) with labels of "expected to progress" or "expected to be stable" at baseline HRCT and develops a statistical learning model to classify voxels in the ROIs. (2) Part B uses the voxel-level classifier from Part A to produce whole-lung level scores of a single-scan total probability's (STP) baseline.
    Using annotated ROIs from 71 subjects' HRCT scans in Part A, we applied Quantum Particle Swarm Optimization-Random Forest (QPSO-RF) to build the classifier. Then, 122 subjects' HRCT scans were used to test the prediction. Using Spearman rank correlations and survival analyses, we ascertained STP associations with 6-12 month changes in quantitative lung fibrosis and forced vital capacity.
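
    As a reminder of what a Spearman rank correlation measures, here is a minimal sketch: both variables are converted to ranks (ties share the average rank) and the Pearson correlation of the ranks is taken. The data are invented and the helper names are hypothetical; nothing here reproduces the study's analysis.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical baseline scores and follow-up changes for five subjects.
scores = [1, 2, 3, 4, 5]
changes = [5, 6, 7, 8, 7]
rho = spearman(scores, changes)
print(round(rho, 4))
```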
    This study can serve as a reference for collecting ground truth, and developing statistical learning techniques to predict progression in medical imaging.







  • Article type: Journal Article
    Bone metastasis is among the most frequent diseases in patients suffering from metastatic cancer, such as breast or prostate cancer. A popular diagnostic method is bone scintigraphy, in which the whole body of the patient is scanned. However, hot spots that appear in the scanned image can be misleading, making the accurate and reliable diagnosis of bone metastasis a challenge. Artificial intelligence can play a crucial role as a decision support tool to alleviate the burden of generating manual annotations on images and therefore prevent oversights by medical experts. So far, several state-of-the-art convolutional neural networks (CNNs) have been employed to address bone metastasis diagnosis as a binary or multiclass classification problem, achieving adequate accuracy (higher than 90%). However, due to their increased complexity (number of layers and free parameters), these networks are severely dependent on the number of available training images, which is typically limited within the medical domain. Our study was dedicated to the use of a new deep learning architecture that overcomes the computational burden by using a convolutional neural network with a significantly lower number of floating-point operations (FLOPs) and free parameters. The proposed lightweight look-behind fully convolutional neural network was implemented and compared with several well-known powerful CNNs, such as ResNet50, VGG16, Inception V3, Xception, and MobileNet, on an imaging dataset of moderate size (778 images from male subjects with prostate cancer). The results prove the superiority of the proposed methodology over the current state-of-the-art in identifying bone metastasis. The proposed methodology demonstrates a unique potential to revolutionize image-based diagnostics, enabling new possibilities for enhanced cancer metastasis monitoring and treatment.
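
    To make the notions of free parameters and floating-point operations concrete, the sketch below counts both for a single standard 2D convolution layer. It is a generic illustration, not the proposed architecture; the function name is hypothetical, and the cost is reported as multiply-accumulate operations (MACs), which some authors double to obtain FLOPs.

```python
def conv2d_cost(kernel, in_channels, out_channels, out_height, out_width, bias=True):
    """Parameter count and MACs for one standard (non-grouped) 2D convolution."""
    weights = kernel * kernel * in_channels * out_channels
    params = weights + (out_channels if bias else 0)
    macs = weights * out_height * out_width   # each weight is used once per output position
    return params, macs


# Example: a 3x3 convolution from 3 to 64 channels producing a 224x224 output,
# the shape of the first layer in VGG16-style networks.
params, macs = conv2d_cost(kernel=3, in_channels=3, out_channels=64,
                           out_height=224, out_width=224)
print(params, macs)
```

    Summing such counts over all layers gives the totals that lightweight architectures aim to reduce.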






