interobserver agreement

  • Article type: Journal Article
    We aimed to compare the performance and interobserver agreement of radiologists segmenting images manually with those assisted by automatic segmentation. We further aimed to reduce interobserver variability and improve the consistency of radiomics features. This retrospective study included 327 patients diagnosed with prostate cancer from September 2016 to June 2018; images from 228 patients were used to construct the automatic segmentation, and images from the remaining 99 were used for testing. First, four radiologists with varying experience levels retrospectively segmented 99 axial prostate images manually using T2-weighted fat-suppressed magnetic resonance imaging. Automatic segmentation was performed after 2 weeks. The Pyradiomics software package v3.1.0 was used to extract the texture features. The Dice coefficient and intraclass correlation coefficient (ICC) were used to evaluate segmentation performance and the interobserver consistency of prostate radiomics. The Wilcoxon rank sum test was used to compare the paired samples, with the significance level set at p < 0.05. The Dice coefficient was used to measure the spatial overlap of manually delineated images. Across all 99 prostate segmentation results, the manual and automatic segmentation results of the senior group were significantly better than those of the junior group (p < 0.05). Automatic segmentation was more consistent than manual segmentation (p < 0.05), and the average ICC reached >0.85. The automatic segmentation annotation performance of junior radiologists was similar to that of senior radiologists performing manual segmentation. The ICC of radiomics features increased to excellent consistency (0.925 [0.888-0.950]). Automatic segmentation annotation provided better results than manual segmentation by radiologists. Our findings indicate that automatic segmentation annotation helps reduce variability in perception and interpretation between radiologists with different experience levels and ensures the stability of radiomics features.
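
    The abstract above relies on the Dice coefficient to quantify spatial overlap between segmentations. As a rough illustration only, the sketch below computes Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks. The function name, the flat 0/1 list representation, and the toy masks are assumptions made for this example; they are not taken from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two binary masks given as flat sequences of 0/1 values."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled as foreground by both readers.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Total foreground pixels across the two masks.
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect overlap (a convention)
    return 2.0 * intersection / total


# Hypothetical toy masks from two readers (not study data).
reader_1 = [0, 1, 1, 1, 0, 0]
reader_2 = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(reader_1, reader_2)
print(round(dice, 3))
```

    A value of 1 means the two masks coincide exactly and 0 means they share no foreground pixels.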

  • Article type: Journal Article
    BACKGROUND: Early detection and accurate pathological assessment are critical to improving prognosis of pancreatic cancer. EUS has been widely used in diagnosing pancreatic lesions and can obtain histological diagnosis by endoscopic ultrasound-guided fine needle aspiration (EUS-FNA). However, comprehensive assessment of the interobserver agreement (IOA) among cytopathologists evaluating EUS-FNA specimens is still limited. Therefore, this study evaluated IOA among cytopathologists for EUS-FNA specimens of solid pancreatic lesions, especially in false-negative cases of cytological diagnosis and analyzed the factors that influence cytological diagnosis of EUS-FNA so as to improve the diagnostic efficiency of EUS-FNA.
    METHODS: We retrieved EUS-FNA samples of pancreatic solid lesions from 2017 to 2021 and collected their clinical/cytological data. Two cytopathologists independently reviewed these cases using a quoted, novel standardized cytology scoring tool. Ultimately, we calculated IOA among cytopathologists and performed a binary logistic regression analysis to evaluate factors influencing the cytological diagnosis of EUS-FNA.
    RESULTS: A total of 161 patients were included; 60 cases with a clinical diagnosis of pancreatic cancer but a cytological diagnosis of benign or atypical constituted the false-negative group. IOAs for cytological diagnosis of overall patients and of the false-negative group were in perfect and moderate agreement, with Kendall's W values of 0.896 and 0.462, respectively. The number of diagnostic cells in the scoring tool had the highest level of agreement (κ = 0.721) for overall patients. There was at best moderate agreement on other quantity and quality parameters for both all cases and the false-negative group. Logistic regression analysis showed that the number of diagnostic cells (OR = 6.110, p < 0.05) and the amount of blood (OR = 0.320, p < 0.05) could influence cytological diagnosis.
    CONCLUSIONS: The false-negative rate in our study, as high as 37.26% (60/161), was mainly related to the strict standards of the cytopathologists, whose ability to standardize pancreatic cytology is still improving. Suboptimal agreement among cytopathologists for cytological diagnosis and the number of diagnostic cells may be associated with the occurrence of false-negative diagnosis. Further regression analysis confirmed that the number of diagnostic cells and obscuring blood were important factors in cytological diagnosis. Therefore, refinement of cytological diagnostic criteria, standardization of specimen quality evaluation, and training of cytopathologists may improve agreement among cytopathologists, thus improving the repeatability of cytological diagnosis and reducing the occurrence of false-negative events.

  • Article type: Journal Article
    OBJECTIVE: The reproducibility of imaging models for predicting microvascular invasion (MVI) in patients with hepatocellular carcinoma (HCC) remains questionable due to inconsistent interpretation of image signs. Our aim was to screen for high-consensus MRI features to develop a repeatable model for predicting MVI.
    METHODS: We included 219 patients with HCC who underwent surgical resection, divided into a training cohort (n = 145) and a validation cohort (n = 74). Morphological characteristics, signal features on hepatobiliary phases, and dynamic enhancement patterns were evaluated qualitatively by the observers. Interobserver agreement was assessed using Cohen's κ to select features with high interobserver agreement. Risk factors that were significant in stepwise multivariate analysis and that could be measured with good interobserver agreement were used to construct a predictive model, which was assessed in the validation cohort. The diagnostic performance of the model was evaluated based on the area under the receiver operating characteristic curve (AUC).
    RESULTS: Multivariate analysis identified nonsmooth tumor margin, absence of radiologic capsule, and intratumoral artery as independent risk factors of MVI. These MRI-based features showed good or nearly perfect interobserver agreement between radiologists (κ > 0.6). The predictive model predicted MVI well in the training (AUC 0.734) and validation cohorts (AUC 0.759) and fitted well to calibration curves.
    CONCLUSIONS: MRI features including nonsmooth tumor margin, absence of radiologic capsule, and intratumoral artery, which can be assessed with high interobserver agreement, can predict MVI in HCC patients. The predictive model described here may be useful to radiologists, regardless of experience level.
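
    The abstract above selects features by Cohen's κ between two readers. A minimal sketch of that statistic follows, assuming each reader's calls are stored as a list of category labels; the function name and the toy ratings are hypothetical and do not come from the study. It computes κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each reader's marginal frequencies.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both raters must label the same non-empty set of cases")
    # Observed proportion of cases on which the raters agree.
    observed = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    # Chance agreement from the product of the raters' marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for one binary sign (1 = present, 0 = absent) in 10 cases.
reader_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
reader_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
print(round(cohens_kappa(reader_a, reader_b), 3))
```

    Under the threshold quoted in the abstract, a feature would be retained only if its κ exceeded 0.6; the toy data above fall just below that.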

  • Article type: Journal Article
    This study aimed to develop a comprehensive CT-based classification system for fractures of the lateral process of the talus (LPTF), and to evaluate its prognostic value, reliability, and reproducibility. We retrospectively reviewed 42 patients with LPTF, with an average follow-up of 35.9 months, for clinical and radiographic evaluation. To develop a comprehensive classification, a panel of experienced orthopedic surgeons discussed the cases. All fractures were classified by 6 observers according to the Hawkins, McCrory-Bladin, and newly proposed classifications. Interobserver and intraobserver agreement was measured using kappa statistics. The new classification included 2 types based on the presence or absence of concomitant injuries, with type I consisting of 3 subtypes and type II of 5 subtypes. The average AOFAS score was 91.5 in type Ia of the new classification, 86 in type Ib, 90.5 in type Ic, 89 in type IIa, 76.7 in type IIb, 76.6 in type IIc, 91.3 in type IId, and 83.5 in type IIe. Interobserver and intraobserver reliability of the new classification system were almost perfect (κ = 0.776 and 0.837, respectively), higher than those of the Hawkins classification (κ = 0.572 and 0.649, respectively) and the McCrory-Bladin classification (κ = 0.582 and 0.685, respectively). The new classification system is a comprehensive one that takes concomitant injuries into account and shows good prognostic value with clinical outcomes. It is more reliable and reproducible and could be a useful tool for decision-making on treatment options for LPTF.

  • Article type: Multicenter Study
    OBJECTIVE: Distal clavicle fracture classification directly affects treatment decisions. It is unclear whether the classification systems implemented differ depending on surgeons' backgrounds. This study aimed to compare the interobserver agreement of four classification systems used for lateral clavicle fractures by shoulder specialists and general trauma surgeons.
    METHODS: Radiographs of 20 lateral clavicle fractures representing a full spectrum of adult fracture patterns were analyzed by eight experienced shoulder specialists and eight general trauma surgeons from 10 different hospitals. All cases were graded according to the Orthopedic Trauma Association (OTA), Neer, Jäger/Breitner, and Gongji classification systems. To measure observer agreement, Fleiss' kappa coefficient (κ) was applied and assessed.
    RESULTS: When only X-ray films were presented, both groups achieved fair agreement. However, when the 3D-CT scan images were provided, improved interobserver agreement was found in the specialist group when the OTA, Jäger/Breitner, and Gongji classification systems were used. In the generalist groups, improved agreement was found when using the Gongji classification system. In terms of interobserver reliability, the OTA, Neer, and Jäger/Breitner classification systems showed better agreement among shoulder specialists, while a slightly lower level of agreement was found using the Gongji classification system. For the OTA classification system, interobserver agreement had a mean kappa value of 0.418, ranging from 0.446 (specialist group) to 0.402 (generalist group). For the Neer classification system, interobserver agreement had a mean kappa value of 0.368, ranging from 0.402 (specialist group) to 0.390 (generalist group). For the Jäger/Breitner classification system, the inter-observer agreement had a mean kappa value of 0.380, ranging from 0.413 (specialist group) to 0.404 (generalist group). For the Gongji classification system, interobserver agreement had a mean kappa value of 0.455, ranging from 0.480 (specialist group) to 0.485 (generalist group).
    CONCLUSIONS: Generally speaking, 3D-CT scans provide a richer experience that can lead to better results in most classification systems of lateral clavicle fractures, highlighting the value of digitization and specialization in diagnosis and treatment. Competitive interobserver agreement was exhibited in the generalist group using the Gongji classification system, suggesting that the Gongji classification is suitable for general trauma surgeons who are not highly experienced in the shoulder field.
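
    Because this study involves more than two raters per group, agreement is summarized with Fleiss' kappa rather than Cohen's. The sketch below shows one way to compute it from a table of counts, where row i holds the number of raters who assigned case i to each category. The function name, the table layout, and the toy data are assumptions made for illustration, not the study's data or code.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a cases x categories table of rating counts.

    Every row must sum to the same number of raters (at least 2).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Per-case agreement: fraction of rater pairs that agree on that case.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return 1.0  # every rating fell into one category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical table: 3 fractures, 4 raters, 2 fracture types.
toy_counts = [[4, 0], [2, 2], [0, 4]]
print(round(fleiss_kappa(toy_counts), 3))
```

    In the study's setting the table would have 20 rows (fractures) and one column per class of the classification system, computed separately for the specialist and generalist groups.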

  • Article type: Journal Article
    OBJECTIVE: To compare the expression of programmed cell death ligand 1 (PD-L1) in different paraffin blocks from the same triple-negative breast cancers (TNBC) specimen and between matched primary tumors and lymph node metastases (LNMets). We also aim to determine the interobserver agreement between pathologists trained on PD-L1 (SP142) assay in assessing TNBC.
    METHODS: A total of 426 histologically confirmed TNBC cases, of which 85 had LNMets, were included in this study. A PD-L1 (SP142) assay was used by two trained pathologists to identify PD-L1 expression on tumor-infiltrating immune cells (IC) and on tumor cells (TC) in primary tumors and LNMets of TNBC. PD-L1 scoring and assessment were based on the IMpassion130 trial criteria. Concordance of PD-L1 expression in TNBC was analyzed using the Kappa test and assessed by the Kappa value.
    RESULTS: The prevalence of positive PD-L1 expression (PD-L1+) on tumor-infiltrating immune cells (PD-L1 IC+) (IC ≥ 1%) in LNMets (49.4%) was higher than in the matched primary tumors (38.9%). Concordance of PD-L1 expression on IC between the two paraffin blocks from the same primary tumor specimen was substantial (P < 0.000, Kappa = 0.627) and was identified in 83.1% (108/130) of the selected cases. For TNBC cases with matched primary and LNMets blocks, the concordance of PD-L1 IC scoring between the two blocks was moderate (P < 0.000, Kappa = 0.434). Interobserver agreement of PD-L1 assessment was 78.2% (P < 0.000, Kappa = 0.567) in primary tumors and 61.4% (P < 0.000, Kappa = 0.253) in the matched LNMets.
    CONCLUSIONS: Substantial intratumor concordance of PD-L1 scoring of the primary tumors in TNBC patients was determined, implying that immunohistochemical detection using one representative block of the primary tumor should be enough to assign the expression status of PD-L1 in clinical practice. The prevalence of PD-L1+ in lymph node metastases (LNMets) was higher than in the matched primary tumors, implying that PD-L1 detection in LNMets may provide additional PD-L1 expression information, especially in TNBC cases with PD-L1- in the matched primary breast tumors. Interobserver agreement of PD-L1 scoring was moderate in primary tumors but only fair in LNMets, implying that additional training for PD-L1 assessment of TNBC LNMets specimens is recommended to enhance interobserver agreement.
    METHODS: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

  • Article type: Journal Article
    To validate a vascular severity score as an appropriate output for artificial intelligence (AI) Software as a Medical Device (SaMD) for retinopathy of prematurity (ROP) through comparison with ordinal disease severity labels for stage and plus disease assigned by the International Classification of Retinopathy of Prematurity, Third Edition (ICROP3), committee.
    Validation study of an AI-based ROP vascular severity score.
    A total of 34 ROP experts from the ICROP3 committee.
    Two separate datasets of 30 fundus photographs each for stage (0-5) and plus disease (plus, preplus, neither) were labeled by members of the ICROP3 committee using an open-source platform. Averaging these results produced a continuous label for plus (1-9) and stage (1-3) for each image. Experts were also asked to compare the images with one another in terms of relative severity for plus disease. Each image was also labeled with a vascular severity score from the Imaging and Informatics in ROP deep learning system, which was compared with each grader's diagnostic labels for correlation, as well as the ophthalmoscopic diagnosis of stage.
    Weighted kappa and Pearson correlation coefficients (CCs) were calculated between each pair of grader classification labels for stage and plus disease. The Elo algorithm was also used to convert pairwise comparisons for each expert into an ordered set of images from least to most severe.
    The mean weighted kappa and CC for all interobserver pairs for plus disease image comparison were 0.67 and 0.88, respectively. The vascular severity score was found to be highly correlated with both the average plus disease classification (CC = 0.90, P < 0.001) and the ophthalmoscopic diagnosis of stage (P < 0.001 by analysis of variance) among all experts.
    The ROP vascular severity score correlates well with the International Classification of Retinopathy of Prematurity committee members' labels for plus disease and stage, which had significant intergrader variability. Generation of a consensus for a validated scoring system for ROP SaMD can facilitate global innovation and regulatory authorization of these technologies.
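
    The abstract mentions using the Elo algorithm to turn an expert's pairwise comparisons into an ordering from least to most severe. The sketch below illustrates the general idea with the standard Elo update, treating the image judged more severe as the "winner" of each comparison. The K-factor of 32, the starting rating of 1500, the function name, and the image identifiers are conventional or hypothetical choices, not parameters reported by the study.

```python
def elo_order(items, comparisons, k_factor=32.0, start_rating=1500.0):
    """Order items from least to most severe using sequential Elo updates.

    comparisons is a list of (more_severe, less_severe) pairs.
    """
    ratings = {item: start_rating for item in items}
    for winner, loser in comparisons:
        # Probability that the winner would be judged more severe,
        # given the ratings held before this comparison.
        expected_win = 1.0 / (
            1.0 + 10.0 ** ((ratings[loser] - ratings[winner]) / 400.0)
        )
        delta = k_factor * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return sorted(items, key=lambda item: ratings[item])


# Hypothetical comparisons from a single expert (not study data).
images = ["img_a", "img_b", "img_c"]
pairs = [("img_c", "img_a"), ("img_c", "img_b"), ("img_b", "img_a")]
print(elo_order(images, pairs))
```

    Sequential Elo updates depend on the order in which comparisons are processed, so a practical analysis might shuffle or repeat the comparisons; that choice is left open here.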

  • Article type: Journal Article
    OBJECTIVE: The purpose of this study was to quantitatively assess the longitudinal acquisition repeatability of MRI radiomics features in a three-dimensional (3D) T1-weighted (T1W) TSE sequence via a well-controlled prospective phantom study.
    METHODS: Thirty consecutive daily datasets of an ACR-MRI phantom were acquired on two 1.5T MRI simulators using a 3D T1W TSE sequence. Images were blindly segmented by two observers. Post-acquisition processing was kept to a minimum, apart from an intensity discretization (fixed bin size of 25). One hundred and one radiomics features (shape n = 12; first order n = 16; texture n = 73) were extracted. Longitudinal repeatability of each feature was evaluated by Pearson correlation and coefficient of variance (CV68%). Interobserver feature value agreement was also quantified using the intraclass correlation coefficient (ICC) and Bland-Altman analysis. The most repeatable radiomics feature set on both scanners was determined by feature coefficient of variance (CV68% < 5%), ICC (> 0.75), and the ratio of the interobserver difference to the interobserver mean (δ < 5%).
    RESULTS: Radiomics feature values showed no trend over time. Longitudinal feature repeatability CV68% ranged from 0.01% to 38.60% (mean/median: 12.5%/9.9%) on scanner A and from 0.01% to 40.47% (8.49%/7.34%) on scanner B. Shape features exhibited significantly better repeatability than first-order and texture features (all P < 0.01). A significant longitudinal repeatability difference between the two scanners was observed in texture features (P < 0.001), but not in shape and first-order features (P > 0.30). First-order and texture features had smaller interobserver-dependent variation than acquisition-dependent variation. They also showed good interobserver agreement on both scanners (A: ICC = 0.80 ± 0.23; B: ICC = 0.80 ± 0.22), independent of acquisition repeatability. The repeatable radiomics features common to both scanners, comprising 12 shape features, 0 first-order features, and 3 texture features, were determined as the most repeatable MRI radiomics feature set.
    CONCLUSIONS: Radiomics features exhibited heterogeneous longitudinal repeatability, with shape features the most repeatable, in this phantom study with a 3D T1W TSE acquisition. The most repeatable radiomics feature set derived in this study should be helpful for the selection of reliable radiomics features in future clinical use.

  • Article type: Journal Article
    BACKGROUND: Hepatocellular carcinoma is the most common primary liver malignancy. From the results of previous studies, Liver Imaging Reporting and Data System (LI-RADS) on contrast-enhanced ultrasound (CEUS) has shown satisfactory diagnostic value. However, a unified conclusion on the interobserver stability of this innovative ultrasound imaging has not been determined. The present meta-analysis examined the interobserver agreement of CEUS LI-RADS to provide some reference for subsequent related research.
    OBJECTIVE: To evaluate the interobserver agreement of LI-RADS on CEUS and analyze the sources of heterogeneity between studies.
    METHODS: Relevant papers on the subject of interobserver agreement on CEUS LI-RADS published before March 1, 2020 in China and other countries were analyzed. The studies were filtered, and the diagnostic criteria were evaluated. The selected references were analyzed using the "meta" and "metafor" packages of R software version 3.6.2.
    RESULTS: Eight studies were ultimately included in the present analysis. Meta-analysis results revealed that the summary Kappa value of included studies was 0.76 [95% confidence interval, 0.67-0.83], which shows substantial agreement. Higgins' I² statistic also confirmed substantial heterogeneity (I² = 91.30%; 95% confidence interval, 85.3%-94.9%; P < 0.01). Meta-regression identified the variables, including the method of patient enrollment, method of consistency testing, and patient race, which explained the substantial study heterogeneity.
    CONCLUSIONS: CEUS LI-RADS demonstrated overall substantial interobserver agreement, but heterogeneous results between studies were also obvious. Further clinical investigations should consider a modified recommendation about the experimental design.
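
    The heterogeneity reported above is expressed with Higgins' I², which is derived from Cochran's Q as I² = max(0, (Q - df) / Q) × 100%, with df equal to the number of studies minus one. The sketch below shows that arithmetic. The abstract does not report Q, so the Q value used here is a hypothetical input chosen only to land near the reported magnitude; the function name is likewise an assumption.

```python
def i_squared(q_statistic, n_studies):
    """Higgins' I-squared (percent) from Cochran's Q and the number of studies."""
    if n_studies < 2:
        raise ValueError("heterogeneity needs at least two studies")
    degrees_of_freedom = n_studies - 1
    if q_statistic <= 0:
        return 0.0
    # Share of total variation beyond what chance alone would produce,
    # truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q_statistic - degrees_of_freedom) / q_statistic) * 100.0


# Hypothetical Q for 8 studies (Q itself is not given in the abstract).
print(i_squared(80.0, 8))
```

    The abstract's analysis was run with the R "meta" and "metafor" packages; this Python fragment only mirrors the formula and is not a substitute for those tools.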

  • Article type: Journal Article
    BACKGROUND: The evaluation of PD-L1 expression in nonsmall cell lung carcinoma (NSCLC) is becoming increasingly important given the effectiveness of PD-L1 inhibitors. Although cytologic specimens have been shown to be compatible with surgical specimens to evaluate PD-L1 immunohistochemistry (IHC), evidence of the reproducibility of PD-L1 in cytologic specimens is lacking. The aim of this study is to evaluate interobserver agreement in PD-L1 IHC in cytologic specimens.
    METHODS: PD-L1 IHC was performed on 86 NSCLC cytology specimens using Dako PD-L1 IHC 22C3 pharmDx. The digitally scanned whole slide images (WSI) were read by five pathologists. Each case was given a Tumor Proportion Score (TPS) and the results were compared between the observers. The interobserver concordance was assessed using 1% and 50% as cutoffs.
    RESULTS: TPSs were highly correlated among observers (Spearman correlation coefficient, 0.86-0.94). Using greater than 1% as a cutoff, interobserver agreement measured by Fleiss' Kappa was 0.74 for all pathologists, and Cohen's Kappa coefficient ranged from 0.49 to 0.83, consistent with moderate to substantial agreement. With a cutoff of greater than 50%, Fleiss' Kappa was 0.79 for all pathologists and the kappa values ranged from 0.63 to 0.90, consistent with substantial to almost perfect agreement. Several pitfalls were identified by reviewing discordant cases, including staining in macrophages, stromal cells, and intratumoral heterogeneity.
    CONCLUSIONS: Our data suggest that TPS of PD-L1 IHC on cytology specimens is reproducible, with a better agreement when using 50% as the cutoff value. However, special attention is required when the TPS is near the 1% cutoff.