inter-rater variability

  • Article type: Journal Article
    BACKGROUND: Lung ultrasonography (LUS) is a non-invasive imaging method used to diagnose and monitor conditions such as pulmonary edema, pneumonia, and pneumothorax. It is valuable where access to other imaging techniques such as CT or chest X-ray is limited, especially in low- and middle-income countries with reduced resources. Furthermore, LUS reduces radiation exposure and its related blood cancer adverse events, which is particularly relevant in children and young subjects. The score obtained with LUS allows semi-quantification of regional loss of aeration, and it can provide a valuable and reliable assessment of the severity of most respiratory diseases. However, inter-observer reliability of the score has never been systematically assessed. This study aims to assess experienced LUS operators' agreement on a sample of video clips showing predefined findings.
    METHODS: Twenty-five anonymized video clips comprehensively depicting the different values of the LUS score were shown, using an online form, to renowned LUS experts blinded to patients' clinical data and the study's aims. Clips were acquired from five different ultrasound machines. Fleiss-Cohen weighted kappa was used to evaluate experts' agreement.
    RESULTS: Over a period of 3 months, 20 experienced operators completed the assessment. Most worked in the ICU (10), ED (6), HDU (2), cardiology ward (1), or obstetric/gynecology department (1). The proportional LUS score mean was 15.3 (SD 1.6). Inter-rater agreement varied: 6 clips had full agreement, 3 had 19 out of 20 raters agreeing, and 3 had 18 agreeing, while the remaining 13 had 17 or fewer raters agreeing on the assigned score. Scores 0 and 3 were more reproducible than scores 1 and 2. Fleiss' kappa for overall answers was 0.87 (95% CI 0.815-0.931, p < 0.001).
    CONCLUSIONS: The inter-rater agreement between experienced LUS operators is very high, although not perfect. The strong agreement and the small variance enable us to say that a 20% tolerance around a measured value of a LUS score is a reliable estimate of the patient's true LUS score, resulting in reduced variability in score interpretation and greater confidence in its clinical use.
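The agreement statistic reported above can be illustrated with a minimal sketch. It assumes the plain (unweighted) Fleiss' kappa rather than the Fleiss-Cohen weighted variant named in the methods, and the rating counts are invented for illustration; the function name `fleiss_kappa` and the example table are not taken from the study.

```python
def fleiss_kappa(counts):
    """Unweighted Fleiss' kappa.

    counts[i][j] is the number of raters who assigned subject i
    (e.g. a video clip) to category j (e.g. an LUS score of 0-3).
    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject, then averaged.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: 4 clips, 20 raters, score categories 0-3.
example_counts = [
    [20, 0, 0, 0],   # full agreement on score 0
    [1, 19, 0, 0],   # 19 of 20 raters chose score 1
    [0, 3, 17, 0],   # 17 of 20 raters chose score 2
    [0, 0, 0, 20],   # full agreement on score 3
]
print(round(fleiss_kappa(example_counts), 3))
```

The input is a subjects-by-categories count table, which matches how the abstract describes agreement ("19 out of 20 raters agreeing"). A weighted kappa would additionally credit near-misses between adjacent scores, which this sketch does not do.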

  • Article type: Journal Article
    BACKGROUND: Objective Structured Clinical Examinations (OSCEs) are an increasingly popular evaluation modality for medical students. While the face-to-face interaction allows for more in-depth assessment, it may cause standardization problems. Methods to quantify, limit or adjust for examiner effects are needed.
    METHODS: Data originated from 3 OSCEs undergone by 900-student classes of 5th- and 6th-year medical students at Université Paris Cité in the 2022-2023 academic year. Sessions had five stations each, and one of the three sessions was scored by consensus by two raters (rather than one). We report OSCEs' longitudinal consistency for one of the classes and staff-related and student variability by session. We also propose a statistical method to adjust for inter-rater variability by deriving a statistical random student effect that accounts for staff-related and station random effects.
    RESULTS: From the four sessions, a total of 16,910 station scores were collected from 2,615 student sessions, with two of the sessions undergone by the same students, and 36, 36, 35 and 20 distinct staff teams in each station for each session. Scores had staff-related heterogeneity (p < 10^-15), with staff-level standard errors approximately doubled compared to chance. With mixed models, staff-related heterogeneity explained respectively 11.4%, 11.6%, and 4.7% of station score variance (95% confidence intervals, 9.5-13.8, 9.7-14.1, and 3.9-5.8, respectively) with 1, 1 and 2 raters, suggesting a moderating effect of consensus grading. Student random effects explained a small proportion of variance, respectively 8.8%, 11.3%, and 9.6% (8.0-9.7, 10.3-12.4, and 8.7-10.5), and this low amount of signal resulted in student rankings being no more consistent over time with this metric than with average scores (p=0.45).
    CONCLUSIONS: Staff variability impacts OSCE scores as much as student variability, and the former can be reduced with dual assessment or adjusted for with mixed models. Both are small compared to unmeasured sources of variability, making them difficult to capture consistently.

  • Article type: Journal Article
    OBJECTIVE: Perioperative decision making for large (> 2 cm) rectal polyps with ambiguous features is complex. The most common intraprocedural assessment is clinician judgement alone while radiological and endoscopic biopsy can provide periprocedural detail. Fluorescence-augmented machine learning (FA-ML) methods may optimise local treatment strategy.
    METHODS: Surgeons of varying grades, all performing colonoscopies independently, were asked to visually judge endoscopic videos of large benign and early-stage malignant (potentially suitable for local excision) rectal lesions on an interactive video platform (Mindstamp) with results compared with and between final pathology, radiology and a novel FA-ML classifier. Statistical analyses of data used Fleiss Multi-rater Kappa scoring, Spearman Coefficient and Frequency tables.
    RESULTS: Thirty-two surgeons judged 14 ambiguous polyp videos (7 benign, 7 malignant). In all cancers, initial endoscopic biopsy had yielded false-negative results. Five of each lesion type had had a pre-excision MRI, with a 60% false-positive malignancy prediction in benign lesions and a 60% over-staging and 40% equivocal rate in cancers. Average clinical visual cancer judgement accuracy was 49% (with only 'fair' inter-rater agreement); many participants reported uncertainty, and higher reported decision confidence did not correspond to higher accuracy. This compared to 86% ML accuracy. Size was misjudged visually by a mean of 20%, with polyp size underestimated in 4/6 and overestimated in 2/6. Subjective narratives regarding decision-making, requested for 7/14 lesions, revealed wide variation in rationale between participants.
    CONCLUSIONS: Currently available clinical means of assessing ambiguous rectal lesions are suboptimal, with wide inter-observer variation. Fluorescence-based AI augmentation may advance this field via objective, explainable ML methods.

  • Article type: Journal Article
    OBJECTIVE: We published a list of "must-know" routine EEG (rEEG) findings for trainees based on expert opinion. Here, we studied the accuracy and inter-rater agreement (IRA) of these "must-know" rEEG findings among international experts.
    METHODS: A previously validated online rEEG examination was disseminated to EEG experts. It consisted of a survey and 30 multiple-choice questions predicated on the previously published "must-know" rEEG findings divided into four domains: normal, abnormal, normal variants, and artifacts. Questions contained de-identified 10-20-s epochs of EEG that were considered unequivocal examples by five EEG experts.
    RESULTS: The examination was completed by 258 international EEG experts. Overall mean accuracy and IRA (AC1) were 81% and substantial (0.632), respectively. The domain-specific mean accuracies and IRA were: 76%, moderate (0.558) (normal); 78%, moderate (0.575) (abnormal); 85%, substantial (0.678) (normal variants); 85%, substantial (0.740) (artifacts). Academic experts had a higher accuracy than private practice experts (82% vs. 77%; p = .035). Country-specific overall mean accuracies and IRA were: 92%, almost perfect (0.836) (U.S.); 86%, substantial (0.762) (Brazil); 79%, substantial (0.646) (Italy); and 72%, moderate (0.496) (India). In conclusion, collective expert accuracy and IRA of "must-know" rEEG findings are suboptimal and heterogeneous.
    CONCLUSIONS: We recommend the development and implementation of pragmatic, accessible, country-specific ways to measure and improve the expert accuracy and IRA.
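The IRA figures above are reported as AC1 values. The following is a minimal sketch of Gwet's AC1 for multiple raters and nominal categories, given only to show how such a coefficient is formed; the function name `gwet_ac1` and the example counts are illustrative assumptions, and the abstract does not describe the authors' actual computation.

```python
def gwet_ac1(counts):
    """Gwet's AC1 for multiple raters and nominal categories.

    counts[i][k] is the number of raters who chose category k for item i.
    Items may have different numbers of raters, but each needs at least 2.
    """
    n_items = len(counts)
    n_categories = len(counts[0])
    if n_categories < 2:
        raise ValueError("at least two categories are required")

    agreement_sum = 0.0
    prevalence = [0.0] * n_categories
    for row in counts:
        r = sum(row)
        if r < 2:
            raise ValueError("each item needs at least two ratings")
        # Fraction of rater pairs that agree on this item.
        agreement_sum += sum(c * (c - 1) for c in row) / (r * (r - 1))
        for k, c in enumerate(row):
            prevalence[k] += c / r

    p_a = agreement_sum / n_items
    prevalence = [p / n_items for p in prevalence]
    # Chance agreement under Gwet's model.
    p_e = sum(p * (1 - p) for p in prevalence) / (n_categories - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical data: 10 items, 2 raters, 2 answer options,
# with one option chosen far more often than the other.
skewed_counts = [[2, 0]] * 9 + [[1, 1]]
print(round(gwet_ac1(skewed_counts), 4))
```

AC1 is often preferred over kappa when one category dominates, because its chance-agreement term shrinks as prevalence becomes more skewed instead of growing.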

  • Article type: Journal Article
    Accurate segmentation of liver and tumor regions in medical imaging is crucial for the diagnosis, treatment, and monitoring of hepatocellular carcinoma (HCC) patients. However, manual segmentation is time-consuming and subject to inter- and intra-rater variability. Therefore, automated methods are necessary but require rigorous validation against high-quality segmentations based on a consensus of raters. To address the need for reliable and comprehensive data in this domain, we present LiverHccSeg, a dataset that provides liver and tumor segmentations on multiphasic contrast-enhanced magnetic resonance imaging from two board-approved abdominal radiologists, along with an analysis of inter-rater agreement. LiverHccSeg provides a curated resource for liver and HCC tumor segmentation tasks. The dataset includes a scientific reading and co-registered contrast-enhanced multiphasic magnetic resonance imaging (MRI) scans with corresponding manual segmentations by two board-approved abdominal radiologists and relevant metadata, and offers researchers a comprehensive foundation for external validation and benchmarking of liver and tumor segmentation algorithms. The dataset also provides an analysis of the agreement between the two sets of liver and tumor segmentations. Through the calculation of appropriate segmentation metrics, we provide insights into the consistency and variability in liver and tumor segmentations among the radiologists. A total of 17 cases were included for liver segmentation and 14 cases for HCC tumor segmentation. Liver segmentations demonstrated high segmentation agreement (mean Dice, 0.95 ± 0.01 [standard deviation]) and HCC tumor segmentations showed higher variation (mean Dice, 0.85 ± 0.16 [standard deviation]). The applications of LiverHccSeg can be manifold, ranging from testing machine learning algorithms on public external data to radiomic feature analyses. Leveraging the inter-rater agreement analysis within the dataset, researchers can investigate the impact of variability on segmentation performance and explore methods to enhance the accuracy and robustness of liver and tumor segmentation algorithms in HCC patients. By making this dataset publicly available, LiverHccSeg aims to foster collaborations, facilitate innovative solutions, and ultimately improve patient outcomes in the diagnosis and treatment of HCC.
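The agreement between the two radiologists is summarized above with the Dice coefficient, 2|A∩B| / (|A| + |B|). A minimal sketch of that measure follows, assuming each segmentation is available as a flattened binary mask; the function name `dice_coefficient` and the toy masks are hypothetical and do not reflect the dataset's actual file format or tooling.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks of equal length.

    Each mask is a flat sequence of 0/1 (or False/True) voxel labels.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement by convention here.
        return 1.0
    return 2 * intersection / (size_a + size_b)


# Toy example: two raters outline overlapping but not identical regions.
rater_1 = [1, 1, 1, 0, 0]
rater_2 = [0, 1, 1, 1, 0]
print(dice_coefficient(rater_1, rater_2))
```

Small structures such as tumors are penalized more heavily by boundary disagreements than large organs, which is consistent with the lower and more variable tumor Dice reported in the abstract.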

  • Article type: Journal Article
    BACKGROUND: Neonatal pain assessment (NPA) represents a huge global problem of essential importance, as a timely and accurate assessment of neonatal pain is indispensable for implementing pain management.
    OBJECTIVE: To investigate the consistency of pain scores derived through video-based NPA (VB-NPA) and on-site NPA (OS-NPA), providing the scientific foundation and feasibility of adopting VB-NPA results in a real-world scenario as the gold standard for neonatal pain in clinical studies and labels for artificial intelligence (AI)-based NPA (AI-NPA) applications.
    METHODS: A total of 598 neonates were recruited from a pediatric hospital in China.
    METHODS: This observational study recorded 598 neonates who underwent one of 10 painful procedures, including arterial blood sampling, heel blood sampling, fingertip blood sampling, intravenous injection, subcutaneous injection, peripheral intravenous cannulation, nasopharyngeal suctioning, retention enema, adhesive removal, and wound dressing. Two experienced nurses performed OS-NPA and VB-NPA at a 10-day interval through double-blind scoring using the Neonatal Infant Pain Scale to evaluate the pain level of the neonates. Intra-rater and inter-rater reliability were calculated and analyzed, and a paired samples t-test was used to explore the bias and consistency of the assessors' pain scores derived through OS-NPA and VB-NPA. The impact of different label sources was evaluated using three state-of-the-art AI methods trained with labels given by OS-NPA and VB-NPA, respectively.
    RESULTS: The intra-rater reliability of the same assessor was 0.976-0.983 across different times, as measured by the intraclass correlation coefficient. The inter-rater reliability was 0.983 for single measures and 0.992 for average measures. No significant differences were observed between the OS-NPA scores and the assessment of an independent VB-NPA assessor. The different label sources only caused a limited accuracy loss of 0.022-0.044 for the three AI methods.
    CONCLUSIONS: VB-NPA in a real-world scenario is an effective way to assess neonatal pain due to its high intra-rater and inter-rater reliability compared to OS-NPA and could be used for the labeling of large-scale NPA video databases for clinical studies and AI training.

  • Article type: Journal Article
    Despite its wide use in dementia diagnosis on the basis of cut-off points, the inter-rater variability of the Addenbrooke's Cognitive Examination-Third Edition (ACE-III) has been poorly studied.
    Thirty-one healthcare professionals from an older adults' mental health team scored two ACE-III protocols based on mock patients in a computerised form. Scoring accuracy, as well as total and domain-specific scoring variability, were calculated; factors relevant to participants were obtained, including their level of experience and self-rated confidence administering the ACE-III.
    There was considerable inter-rater variability (up to 18 points for one of the cases), and one case's mean score was significantly higher (by nearly four points) than the true score. The Fluency, Visuospatial and Attention domains had greater levels of variability than Language and Memory. Higher scoring accuracy was not associated with either greater levels of experience or higher self-confidence in administering the ACE-III.
    The results suggest that the ACE-III is susceptible to scoring error and considerable inter-rater variability, which highlights the critical importance of initial, and continued, administration and scoring training.

  • Article type: Journal Article
    Purpose: An accurate zonal segmentation of the prostate is required for prostate cancer (PCa) management with MRI. Approach: The aim of this work is to present UFNet, a deep learning-based method for automatic zonal segmentation of the prostate from T2-weighted (T2w) MRI. It takes into account the image anisotropy, includes both spatial and channelwise attention mechanisms, and uses loss functions to enforce prostate partition. The method was applied to a private multicentric three-dimensional T2w MRI dataset and to the public two-dimensional T2w MRI dataset ProstateX. To assess the model performance, the structures segmented by the algorithm on the private dataset were compared with those obtained by seven radiologists of various experience levels. Results: On the private dataset, we obtained a Dice score (DSC) of 93.90 ± 2.85 for the whole gland (WG), 91.00 ± 4.34 for the transition zone (TZ), and 79.08 ± 7.08 for the peripheral zone (PZ). Results were significantly better than those of the other compared networks (p-value < 0.05). On ProstateX, we obtained a DSC of 90.90 ± 2.94 for WG, 86.84 ± 4.33 for TZ, and 78.40 ± 7.31 for PZ. These results are similar to state-of-the-art results and, on the private dataset, are coherent with those obtained by radiologists. Zonal locations and sectorial positions of lesions annotated by radiologists were also preserved. Conclusions: Deep learning-based methods can provide an accurate zonal segmentation of the prostate leading to a consistent zonal location and sectorial position of lesions, and therefore can be used as a helping tool for PCa diagnosis.

  • Article type: Journal Article
    OBJECTIVE: To assess the inter-rater reliability of MScanFit MUNE using a "Round Robin" research design.
    METHODS: Twelve raters from different centres examined six healthy study participants over two days. Median, ulnar and common peroneal nerves were stimulated, and compound muscle action potential (CMAP) scans were recorded from abductor pollicis brevis (APB), abductor digiti minimi (ADM) and anterior tibial (TA) muscles, respectively. From these we calculated the Motor Unit Number Estimation (MUNE) and "A50", a motor unit size parameter. For statistical analysis we used Limits of Agreement (LOA) and Coefficient of Variation (COV). Study participants scored their perception of pain from the examinations on a rating scale from 0 (no pain) to 10 (unbearable pain).
    RESULTS: Before this study, 41.6% of the raters had performed MScanFit fewer than five times. The mean MUNE values were: 99.6 (APB), 131.4 (ADM) and 126.2 (TA), with LOA: 19.5 (APB), 29.8 (ADM) and 20.7 (TA), and COV: 13.4 (APB), 6.3 (ADM) and 5.6 (TA). MUNE values correlated with CMAP maximum amplitudes (R² values: 0.463 (APB) (p<0.001), 0.421 (ADM) (p<0.001) and 0.645 (TA) (p<0.001)). The average perception of pain was 4.
    CONCLUSIONS: MScanFit shows a high level of inter-rater reliability, even with only limited rater experience, and is overall reasonably well tolerated by patients. These results may indicate MScanFit as a reliable MUNE method with potential as a biomarker in drug trials.
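One of the variability measures named above, the coefficient of variation, can be sketched as the sample standard deviation expressed as a percentage of the mean. The abstract does not state exactly how COV was pooled across participants and raters, so the function below and its MUNE readings are illustrative assumptions only.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    if len(values) < 2:
        raise ValueError("at least two measurements are required")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * statistics.stdev(values) / mean


# Hypothetical MUNE estimates for one muscle of one participant,
# each obtained by a different rater.
mune_by_rater = [90, 100, 110]
print(coefficient_of_variation(mune_by_rater))
```

A lower percentage means the raters' estimates cluster more tightly around their mean for the same participant.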

  • Article type: Journal Article
    The clinical relevance of rapid eye movement sleep-related obstructive sleep apnea (REM OSA) is supported by its associated adverse health outcomes and impact on optimal treatment strategies. To date, no assessment of REM OSA phenotyping performance has been conducted for any type of sleep testing technology. The objective of this study was to assess this for polysomnography and peripheral arterial tone-based home sleep apnea testing (PAT HSAT). In a dataset comprising 261 participants, the sensitivity and specificity of the agreement on REM OSA phenotyping was assessed for two independent scorings of polysomnography and a synchronously administered PAT HSAT. The sensitivity and specificity of REM OSA phenotyping were 0.87 and 0.89, respectively, for the polysomnography inter-scorer comparison, and 0.68 and 0.97 for the PAT HSAT on a single-night basis, using the conventional minimum required rapid eye movement sleep time of 30 min. Polysomnography-based REM OSA phenotyping was found to be sensitive and specific even for a single-night testing protocol. Peripheral arterial tone-based REM OSA phenotyping showed a lower sensitivity but a slightly higher specificity compared to polysomnography. In order to increase performance and conclusiveness of peripheral arterial tone-based REM OSA phenotyping, a multi-night protocol of 2-5 nights could be considered. Finally, the minimum required rapid eye movement sleep time could be lowered from the conventional 30 min to 15 min without significantly lowering REM OSA phenotyping sensitivity and specificity, while increasing the level of phenotyping conclusiveness.
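The sensitivity and specificity figures above compare one binary REM OSA classification against another taken as reference. A minimal sketch of that comparison follows, assuming each participant has been labelled 1 (REM OSA) or 0 (not REM OSA) by both sources; the function name and the labels are invented for illustration.

```python
def sensitivity_specificity(reference, test):
    """Return (sensitivity, specificity) of `test` labels against `reference`.

    Both inputs are equal-length sequences of 0/1 labels, one per participant.
    """
    if len(reference) != len(test):
        raise ValueError("label sequences must have the same length")
    tp = fn = tn = fp = 0
    for ref, obs in zip(reference, test):
        if ref and obs:
            tp += 1          # reference positive, test positive
        elif ref and not obs:
            fn += 1          # reference positive, test negative
        elif not ref and not obs:
            tn += 1          # reference negative, test negative
        else:
            fp += 1          # reference negative, test positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: first scoring as reference, second scoring as test.
scoring_a = [1, 1, 1, 0, 0, 0, 0]
scoring_b = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(scoring_a, scoring_b)
print(sens, spec)
```

Note that the result depends on which scoring is treated as the reference, so swapping the two arguments generally gives different values.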