Background: Artificial intelligence (AI) is increasingly being applied in pathology and cytology, with promising results. We collected a large dataset of whole slide images (WSIs) of thyroid fine-needle aspiration (FNA) cytology, incorporating z-stacking, from institutions across the nation to develop an AI model.
Methods: We conducted a multicenter retrospective diagnostic accuracy study using the thyroid FNA dataset from the Open AI Dataset Project, which consists of digitized image samples collected from 3 university hospitals and 215 Korean institutions, with extensive quality checks during case selection, scanning, labeling, and review. Multiple z-layer images were captured using three different scanners; image patches were extracted from the WSIs and resized after focus fusion and color normalization. We pretested six AI models on a subset of the dataset, identified Inception ResNet v2 as the best model, and subsequently tested the final model on the total dataset. Additionally, we compared the performance of the AI model and cytopathologists on 1,031 randomly selected image patches and reevaluated the cytopathologists' performance after they had referred to the AI results.
Results: A total of 10,332 image patches from 306 thyroid FNAs, comprising 78 malignant (papillary thyroid carcinoma) and 228 benign cases from 86 institutions, were used for AI training. Inception ResNet v2 achieved the highest accuracy: 99.7%, 97.7%, and 94.9% on the training, validation, and test datasets, respectively (sensitivity 99.9%, 99.6%, and 100%; specificity 99.6%, 96.4%, and 90.4%). In the comparison between AI and humans, the AI model showed higher accuracy and specificity than the average expert cytopathologist, by more than two standard deviations (accuracy 99.71% [95% confidence interval (CI), 99.38-100.00%] vs. 88.91% [95% CI, 86.99-90.83%], sensitivity 99.81% [95% CI, 99.54-100.00%] vs. 87.26% [95% CI, 85.22-89.30%], and specificity 99.61% [95% CI, 99.23-99.99%] vs. 90.58% [95% CI, 88.80-92.36%]). Moreover, after referring to the AI results, the performance of all the experts (accuracy 96%, 95%, and 96%, respectively) and the diagnostic agreement (from 0.64 to 0.84) increased.
Conclusions: These results suggest that applying AI technology to thyroid FNA cytology may improve diagnostic accuracy and reduce intra- and inter-observer variability among pathologists. Further confirmatory research is needed.
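The accuracy, sensitivity, and specificity figures above, together with their 95% CIs, can be illustrated with a short sketch. The abstract does not state how the intervals were computed; the normal-approximation (Wald) interval used here is an assumption, and the function names `diagnostic_metrics` and `wald_ci` are hypothetical helpers introduced only for illustration.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    Malignant is treated as the positive class (an assumption for this sketch).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def wald_ci(p, n, z=1.959964):
    """Normal-approximation (Wald) CI for a proportion p observed on n items.

    Assumed method; the source does not say which interval was used.
    The bounds are clipped to the valid range [0, 1].
    """
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(diagnostic_metrics(tp=90, fp=5, tn=95, fn=10))

    # Reported mean cytopathologist accuracy on the 1,031 comparison patches.
    low, high = wald_ci(0.8891, 1031)
    print(f"{low:.4f} to {high:.4f}")
```

Under this assumption, an accuracy of 88.91% on 1,031 patches gives an interval of roughly 86.99% to 90.83%, which agrees with the reported values; this agreement is suggestive only and does not confirm the authors' method. For proportions close to 100%, the Wald interval runs past 1 before clipping, so a Wilson or exact interval may be preferable in practice.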