Keywords: Gleason grading; artificial intelligence; prostate cancer; radical prostatectomy; whole-mount prostatectomy

Source: DOI:10.1111/bju.16464

Abstract:
OBJECTIVE: To externally validate the performance of the DeepDx Prostate artificial intelligence (AI) algorithm (Deep Bio Inc., Seoul, South Korea) for Gleason grading on whole-mount prostate histopathology, considering potential variations observed when applying AI models trained on biopsy samples to radical prostatectomy (RP) specimens due to inherent differences in tissue representation and sample size.
METHODS: The commercially available DeepDx Prostate AI algorithm is an automated Gleason grading system that was previously trained using 1133 prostate core biopsy images and validated on 700 biopsy images from two institutions. We assessed the performance of the AI algorithm, which outputs Gleason patterns (3, 4, or 5), on 500 1-mm² tiles created from 150 whole-mount RP specimens from a third institution. These patterns were then grouped into grade groups (GGs) for comparison with expert pathologist assessments. The reference standard was the International Society of Urological Pathology GG as established by two experienced uropathologists, with a third expert adjudicating discordant cases. We defined the main metric as the agreement with the reference standard, measured using Cohen's kappa.
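As an illustration of the agreement metric named above, the following is a minimal sketch of Cohen's kappa with optional quadratic weighting, written in plain Python. It is not the authors' analysis code. The function name `weighted_kappa`, the encoding of non-cancerous tiles as category 0 alongside GG 1 to 5, and the toy ratings are all assumptions made for this example.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="quadratic"):
    """Cohen's kappa for two raters over an ordered list of categories.

    weights="quadratic" penalises disagreements by squared distance;
    any other value gives the unweighted (0/1 disagreement) kappa.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater_a, rater_b) ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - num / den


# Toy ratings (hypothetical, not study data): 0 = non-cancerous, 1-5 = GG.
reference = [0, 1, 2, 2]
algorithm = [0, 1, 2, 1]
kappa = weighted_kappa(reference, algorithm, categories=[0, 1, 2])
```

Quadratic weighting treats a GG 1 vs GG 2 disagreement as far less severe than a GG 1 vs GG 5 disagreement, which is why it is commonly used for ordinal grading scales, while the unweighted form suits a binary cancer vs non-cancer comparison.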
RESULTS: The agreement between the two experienced pathologists in determining GGs at the tile level had a quadratically weighted Cohen's kappa of 0.94. The agreement between the AI algorithm and the reference standard in differentiating cancerous vs non-cancerous tissue had an unweighted Cohen's kappa of 0.91. Additionally, the AI algorithm's agreement with the reference standard in classifying tiles into GGs had a quadratically weighted Cohen's kappa of 0.89. In distinguishing cancerous vs non-cancerous tissue, the AI algorithm achieved a sensitivity of 0.997 and specificity of 0.88; in classifying GG ≥2 vs GG 1 and non-cancerous tissue, it demonstrated a sensitivity of 0.98 and specificity of 0.85.
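The two binary comparisons reported above (cancerous vs non-cancerous, and GG ≥2 vs GG 1 or non-cancerous) can be derived from tile-level grade groups by thresholding. The sketch below shows one way to compute sensitivity and specificity under that framing. The helper names, the encoding of non-cancerous tiles as 0, and the toy labels are assumptions for illustration and are not taken from the study.

```python
def binarize(grade_groups, threshold):
    """Mark a tile as positive when its grade group is >= threshold."""
    return [gg >= threshold for gg in grade_groups]


def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired boolean labels."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have equal length")
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    tn = sum(1 for r, p in zip(reference, predicted) if not r and not p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference tile")
    return tp / (tp + fn), tn / (tn + fp)


# Toy tile labels (hypothetical, not study data): 0 = non-cancerous, 1-5 = GG.
reference_gg = [0, 1, 2, 3, 0, 2]
algorithm_gg = [0, 2, 2, 3, 1, 1]

# Cancerous vs non-cancerous: any GG >= 1 counts as positive.
cancer_sens, cancer_spec = sensitivity_specificity(
    binarize(reference_gg, 1), binarize(algorithm_gg, 1))

# GG >= 2 vs GG 1 and non-cancerous.
gg2_sens, gg2_spec = sensitivity_specificity(
    binarize(reference_gg, 2), binarize(algorithm_gg, 2))
```

Moving the threshold from 1 to 2 changes which reference tiles count as positive, so the two pairs of figures answer different clinical questions and are not directly comparable.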
CONCLUSIONS: The DeepDx Prostate AI algorithm had excellent agreement with expert uropathologists and performance in cancer identification and grading on RP specimens, despite being trained on biopsy specimens from an entirely different patient population.