Keywords: Artificial intelligence; Computed tomography (CT); Pancreas; Segmentation

Source: DOI:10.1016/j.acra.2024.06.015

Abstract:
OBJECTIVE: Pancreas segmentation accuracy at CT is critical for the identification of pancreatic pathologies and is essential for the development of imaging biomarkers. Our objective was to benchmark the performance of five high-performing pancreas segmentation models across multiple metrics stratified by scan and patient/pancreatic characteristics that may affect segmentation performance.
METHODS: In this retrospective study, PubMed and ArXiv searches were conducted to identify pancreas segmentation models, which were then evaluated on a set of annotated imaging datasets. Results (Dice score, Hausdorff distance [HD], average surface distance [ASD]) were stratified by contrast status and by quartiles of peri-pancreatic attenuation (measured in a 5 mm region around the pancreas). Multivariate regression was performed to identify imaging characteristics and biomarkers (n = 9) that were significantly associated with Dice score.
RESULTS: Five pancreas segmentation models were identified: two Abdomen Atlas models [AAUNet and AASwin, trained on 8448 scans], TotalSegmentator [TS, 1204 scans], nnUNetv1 [MSD-nnUNet, 282 scans], and a U-Net based model for predicting diabetes [DM-UNet, 427 scans]. These were evaluated on 352 CT scans (30 females, 25 males, 297 sex unknown; age 58 ± 7 years [± 1 SD], 327 age unknown) from 2000-2023. Overall, TS, AAUNet, and AASwin were the best performers, with Dice = 80 ± 11%, 79 ± 16%, and 77 ± 18%, respectively (not significantly different on pairwise Sidak tests). AASwin and MSD-nnUNet performed worse (for all metrics) on non-contrast scans (vs contrast, P < .001). The worst performer was DM-UNet (Dice = 67 ± 16%). All algorithms except TS showed lower Dice scores with increasing peri-pancreatic attenuation (P < .01). Multivariate regression showed that non-contrast scans (P < .001; MSD-nnUNet), smaller pancreatic length (P = .005; MSD-nnUNet), and height (P = .003; DM-UNet) were associated with lower Dice scores.
CONCLUSIONS: The convolutional neural network-based models trained on a diverse set of scans performed best (TS, AAUNet, and AASwin). TS performed equivalently to AAUNet and AASwin with only 13% of the training set size (8448 vs 1204 scans). Though trained on the same dataset, a transformer network (AASwin) had poorer performance on non-contrast scans whereas its convolutional network counterpart (AAUNet) did not. This study highlights that the aggregate assessment metrics reported for pancreatic segmentation algorithms in other literature are not enough to capture differential performance across common patient and scanning characteristics in clinical populations.
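
The abstract reports Dice scores stratified by quartiles of peri-pancreatic attenuation. As a rough illustration of that kind of analysis (not the authors' pipeline), the following minimal Python sketch computes a Dice score between two binary masks and averages scores within quartiles of a covariate. The function names `dice_score` and `stratify_by_quartile`, the representation of masks as sets of voxel coordinates, and the handling of empty masks are all assumptions made for this example.

```python
from statistics import mean, quantiles


def dice_score(pred, truth):
    """Dice overlap between two binary masks given as sets of voxel coordinates."""
    if not pred and not truth:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def stratify_by_quartile(values, scores):
    """Return the mean score within each quartile (1-4) of the covariate values.

    `values` could be, for example, per-scan peri-pancreatic attenuation and
    `scores` the per-scan Dice score of one model (hypothetical inputs).
    """
    q1, q2, q3 = quantiles(values, n=4)  # three cut points between quartiles
    groups = {1: [], 2: [], 3: [], 4: []}
    for value, score in zip(values, scores):
        if value <= q1:
            groups[1].append(score)
        elif value <= q2:
            groups[2].append(score)
        elif value <= q3:
            groups[3].append(score)
        else:
            groups[4].append(score)
    # Skip quartiles that received no scans (possible with heavily tied values).
    return {k: mean(g) for k, g in groups.items() if g}


if __name__ == "__main__":
    # Toy masks: two voxels each, one voxel shared.
    predicted = {(0, 0, 0), (0, 0, 1)}
    reference = {(0, 0, 1), (0, 0, 2)}
    print("Dice:", dice_score(predicted, reference))

    # Synthetic covariate and Dice values, for illustration only.
    attenuation = [1, 2, 3, 4, 5, 6, 7, 8]
    dice = [0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6, 0.5]
    print("Mean Dice per quartile:", stratify_by_quartile(attenuation, dice))
```

Reporting the per-quartile means next to the overall mean is one simple way to surface the differential performance that a single aggregate Dice value can hide.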