METHODS: Nine raters (two experts in image interpretation and preparation, three in image interpretation, and four in neither interpretation nor preparation) each segmented ten renal tumours (four cystic and six solid). These segmentations were compared with a gold standard consensus segmentation generated using a previously validated algorithm.
RESULTS: Average sensitivity and positive predictive value (PPV) were 0.902 and 0.891, respectively. When assessing variability between raters, significant differences were seen in PPV, in sensitivity, and in incursions and excursions from the consensus tumour boundary.
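For illustration, sensitivity and PPV for a single rater can be computed voxel by voxel against the consensus mask, as in the minimal sketch below. It assumes both segmentations are binary masks of identical shape and uses the standard definitions (sensitivity = TP / (TP + FN), PPV = TP / (TP + FP)). The function name and the toy 4x4 masks are hypothetical and are not taken from the study's data or software.

```python
def sensitivity_and_ppv(rater_mask, consensus_mask):
    """Return (sensitivity, PPV) for two equally shaped 2D binary masks.

    Hypothetical helper: the consensus mask is treated as ground truth.
    """
    tp = fp = fn = 0
    for rater_row, consensus_row in zip(rater_mask, consensus_mask):
        for r, c in zip(rater_row, consensus_row):
            if r and c:
                tp += 1  # voxel marked by both rater and consensus
            elif r and not c:
                fp += 1  # rater marked tumour outside the consensus
            elif c and not r:
                fn += 1  # rater missed a consensus tumour voxel
    # Guard against empty masks, where the ratios are undefined.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy example (invented values, for illustration only).
consensus = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
rater = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
sens, ppv = sensitivity_and_ppv(rater, consensus)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

In this toy case the rater marks three of the four consensus voxels plus one voxel outside the consensus, so both measures equal 0.75. Averaging such per-rater, per-tumour values would give summary figures of the kind reported above, although the aggregation actually used in the study is not stated here.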
CONCLUSIONS: This paper demonstrates that the interpretation required to segment preoperative imaging of renal tumours introduces significant inconsistency and inaccuracy. Copyright © 2015 John Wiley & Sons, Ltd.