From quantitative metrics to clinical success: assessing the utility of deep learning for tumor segmentation in breast surgery.

Abstract:

OBJECTIVE: Preventing positive margins is essential for ensuring favorable patient outcomes following breast-conserving surgery (BCS). Deep learning has the potential to enable this by automatically contouring the tumor and guiding resection in real time. However, evaluation of such models with respect to pathology outcomes is necessary for their successful translation into clinical practice.
METHODS: Sixteen deep learning models based on established architectures in the literature are trained on 7318 ultrasound images from 33 patients. Models are ranked by an expert based on their contours generated from images in our test set. Generated contours from each model are also analyzed using recorded cautery trajectories of five navigated BCS cases to predict margin status. Predicted margins are compared with pathology reports.
RESULTS: The best-performing model using both quantitative evaluation and our visual ranking framework achieved a mean Dice score of 0.959. Quantitative metrics are positively associated with expert visual rankings. However, the predictive value of generated contours was limited with a sensitivity of 0.750 and a specificity of 0.433 when tested against pathology reports.
CONCLUSIONS: We present a clinical evaluation of deep learning models trained for intraoperative tumor segmentation in breast-conserving surgery. We demonstrate that automatic contouring is limited in predicting pathology margins despite achieving high performance on quantitative metrics.
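The two kinds of metric reported above measure different things: the Dice score compares a predicted contour with a reference contour pixel by pixel, while sensitivity and specificity compare predicted margin status with the pathology report case by case. The following is a minimal sketch of both calculations, not the authors' implementation. The function names `dice_score` and `sensitivity_specificity`, the nested-list mask format, and all sample values are assumptions made for illustration only.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    pred = [int(bool(v)) for row in pred_mask for v in row]
    true = [int(bool(v)) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p & t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


def sensitivity_specificity(predicted_positive, actual_positive):
    """Sensitivity and specificity of predicted margin status.

    Each argument is a list of booleans, one per margin assessed;
    True means "positive margin". Assumes at least one actual positive
    and one actual negative, otherwise a ratio is undefined.
    """
    pairs = list(zip(predicted_positive, actual_positive))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, not taken from the study.
toy_pred = [[1, 1], [0, 0]]
toy_true = [[1, 0], [0, 0]]
toy_dice = dice_score(toy_pred, toy_true)

toy_predicted_margins = [True, True, False, False, True]
toy_pathology_margins = [True, False, False, True, True]
toy_sens, toy_spec = sensitivity_specificity(toy_predicted_margins, toy_pathology_margins)
```

Because the Dice score is computed per image against a reference contour and the margin metrics are computed per case against pathology, a model can score well on the first while performing poorly on the second, which is the gap the abstract reports.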
