MeSH terms: Humans; Bayes Theorem; Benchmarking/methods; Radiation Oncologists; Female; Radiotherapy Planning, Computer-Assisted/methods; Neoplasms/epidemiology, radiotherapy; Organs at Risk; Male; Radiation Oncology/standards, methods; Demography; Observer Variation

Source: DOI: 10.1200/CCI.23.00174

Abstract:
OBJECTIVE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors.
METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists in the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations covered five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and gastrointestinal (GI). Segmentation quality was assessed structure by structure by comparing each observer segmentation with an expert-derived consensus, which served as the reference-standard benchmark. The Dice similarity coefficient (DSC) was the primary comparison metric. DSC values were dichotomized using structure-specific cutoffs derived from expert interobserver variability (IOV). For each disease site, generalized linear mixed-effects models with Bayesian estimation were used to investigate associations between demographic variables and the binarized DSC. Variables whose highest density interval excluded zero were considered to substantially affect the outcome measure.
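The comparison and dichotomization steps described in the methods can be sketched as follows. This is a minimal illustration, not the study's code: the helper names (`dice`, `binarize_dsc`, `hdi`, `excludes_zero`), the `>=` rule for reaching a cutoff, the 95% interval mass, the toy masks, and the synthetic posterior draws are all assumptions. In the study, the draws for each coefficient would come from the fitted Bayesian mixed-effects model, which is not reproduced here.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * int(np.logical_and(a, b).sum()) / denom


def binarize_dsc(dsc: float, iov_cutoff: float) -> int:
    """1 if an observer DSC reaches the structure's IOV cutoff (>= is an assumption), else 0."""
    return int(dsc >= iov_cutoff)


def hdi(draws: np.ndarray, mass: float = 0.95) -> tuple[float, float]:
    """Narrowest interval containing `mass` of the posterior draws (mass is an assumed default)."""
    s = np.sort(np.asarray(draws, dtype=float))
    n = len(s)
    k = int(np.ceil(mass * n))            # number of draws that must fall inside the interval
    widths = s[k - 1:] - s[: n - k + 1]   # width of every candidate window of k sorted draws
    i = int(np.argmin(widths))            # start index of the narrowest window
    return float(s[i]), float(s[i + k - 1])


def excludes_zero(interval: tuple[float, float]) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    lo, hi = interval
    return lo > 0 or hi < 0


# Hypothetical 4x4 masks: observer covers 6 voxels, consensus covers 4, overlap is 4.
observer = np.zeros((4, 4), dtype=int)
observer[:2, :3] = 1
consensus = np.zeros((4, 4), dtype=int)
consensus[:2, :2] = 1
d = dice(observer, consensus)  # 2 * 4 / (6 + 4) = 0.8
print(d, binarize_dsc(d, iov_cutoff=0.75))  # 0.75 is a hypothetical cutoff

# Synthetic posterior draws for one hypothetical regression coefficient.
rng = np.random.default_rng(0)
draws = rng.normal(loc=-0.8, scale=0.3, size=4000)
print(hdi(draws), excludes_zero(hdi(draws)))
```

The highest density interval is the narrowest window holding the requested share of draws, so for skewed posteriors it can differ from an equal-tailed interval; the "excludes zero" check is then applied to that window.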
RESULTS: The breast, sarcoma, H&N, GYN, and GI cases contributed 574, 110, 452, 112, and 48 segmentations, respectively. When stratified by structure type, the median percentage of segmentations that crossed the expert DSC IOV cutoff was 55% for OARs and 31% for tumors. Regression analysis showed that a structure being tumor-related had a substantial negative effect on binarized DSC for the breast, sarcoma, H&N, and GI cases. No relationship between segmentation quality and demographic variables recurred across the cases, and most variables showed large standard deviations.
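One plausible reading of the stratified summary statistic is: for each structure, compute the percentage of observer segmentations that reached its IOV cutoff, then take the median of those percentages within each structure type. The sketch below assumes that reading; the function name `median_pass_rate_by_type`, the record layout, and the toy records are hypothetical and are not the study's data.

```python
from collections import defaultdict
from statistics import median


def median_pass_rate_by_type(records):
    """records: iterable of (structure, structure_type, passed) tuples, where
    passed is 1 if that segmentation reached the structure's IOV cutoff, else 0.
    Returns {structure_type: median of the per-structure pass percentages}."""
    flags_by_structure = defaultdict(list)
    type_of = {}
    for structure, structure_type, passed in records:
        flags_by_structure[structure].append(passed)
        type_of[structure] = structure_type

    # Percentage of segmentations that passed, computed separately for each structure.
    pcts_by_type = defaultdict(list)
    for structure, flags in flags_by_structure.items():
        pcts_by_type[type_of[structure]].append(100.0 * sum(flags) / len(flags))

    # Median across structures of the same type (e.g., OAR vs tumor).
    return {t: median(p) for t, p in pcts_by_type.items()}


# Hypothetical toy records, not study data.
toy = [
    ("Heart", "OAR", 1), ("Heart", "OAR", 1), ("Heart", "OAR", 0), ("Heart", "OAR", 1),
    ("Lung_L", "OAR", 1), ("Lung_L", "OAR", 0),
    ("SpinalCord", "OAR", 1),
    ("GTV", "tumor", 0), ("GTV", "tumor", 1),
    ("CTV", "tumor", 0),
]
print(median_pass_rate_by_type(toy))
```

Computing a percentage per structure before taking the median keeps structures contoured by many observers from dominating the summary for their type.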
CONCLUSIONS: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.