Keywords: Artificial intelligence; Diagnosis criteria; Thyroid nodule; Ultrasound

MeSH: Humans; Prospective Studies; Artificial Intelligence; Thyroid Nodule / diagnostic imaging, pathology; Female; Male; Middle Aged; Adult; Ultrasonography / methods; Radiologists; Aged; Thyroid Gland / diagnostic imaging; Sensitivity and Specificity; Young Adult; Adolescent

Source: DOI:10.1186/s12916-024-03510-z

Abstract:
BACKGROUND: This study aims to propose clinically applicable 2-echelon (2e) diagnostic criteria for thyroid nodules, under which low-risk nodules are screened out and only suspicious or indeterminate ones are further examined by histopathology, and to explore whether artificial intelligence (AI) can provide precise assistance for clinical decision-making in a real-world prospective setting.
METHODS: In this prospective study, we enrolled 1036 patients with a total of 2296 thyroid nodules from three medical centers. The diagnostic performance of the AI system, of radiologists with different levels of experience, and of AI-assisted radiologists with different levels of experience was evaluated against the proposed 2e diagnostic criteria, in which the first echelon is an arbitration committee of 3 senior specialists and the second is cyto- or histopathology.
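The two-echelon reference standard described above can be sketched as a small lookup. This is only an illustration: the abstract does not spell out how a nodule with both a committee reading and a pathology result is handled, so the rule "pathology takes precedence when available" is an assumption, and the function name `reference_label` and the label strings are hypothetical.

```python
from typing import Optional, Tuple


def reference_label(committee: str, pathology: Optional[str] = None) -> Tuple[str, str]:
    """Return (label, echelon) for one nodule under an assumed 2e rule.

    committee: the arbitration committee's call (hypothetical label string).
    pathology: the cyto- or histopathology result, or None if the nodule
               was never sampled.
    """
    # Assumption: a pathology result, when present, overrides the committee.
    if pathology is not None:
        return pathology, "pathology"
    # Otherwise the first echelon (arbitration committee) is the reference.
    return committee, "committee"


# Usage: one nodule settled by the committee, one settled by pathology.
print(reference_label("benign"))
print(reference_label("malignant", pathology="benign"))
```

Under this reading, every nodule ends up with exactly one reference label, which matches the abstract's split of 1543 committee-classified and 753 pathology-confirmed nodules (2296 in total).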
RESULTS: According to the 2e diagnostic criteria, 1543 nodules were classified by the arbitration committee, and the benign or malignant nature of 753 nodules was determined by pathological examination. With pathological results as the evaluation standard, the sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC) of the AI system were 0.826, 0.815, 0.821, and 0.821, respectively. For cases in which the arbitration committee's diagnosis was taken as the evaluation standard, the sensitivity, specificity, accuracy, and AUC of the AI system were 0.946, 0.966, 0.964, and 0.956, respectively. With the global 2e diagnostic criteria as the gold standard, the sensitivity, specificity, accuracy, and AUC of the AI system were 0.868, 0.934, 0.917, and 0.901, respectively. Under the different criteria, the diagnostic performance of the AI system was comparable to that of senior radiologists and exceeded that of junior radiologists (all P < 0.05). Furthermore, AI assistance significantly improved the performance of junior radiologists in diagnosing thyroid nodules, and their diagnostic performance was comparable to that of senior radiologists when pathological results were taken as the gold standard (all P > 0.05).
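For readers less familiar with these measures, the following minimal sketch shows how sensitivity, specificity, and accuracy follow from a confusion matrix. The counts passed to `diagnostic_metrics` are made-up illustration values, not study data. The sketch also checks one property of the reported figures: for a classifier scored at a single operating point, the area under the ROC curve equals (sensitivity + specificity) / 2. The reported AUC values agree with that quantity to within rounding, which would be consistent with binary scoring of the AI output, although the abstract does not say how AUC was computed.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # malignant nodules correctly flagged
    specificity = tn / (tn + fp)   # benign nodules correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """AUC of a single-operating-point (binary) classifier."""
    return (sensitivity + specificity) / 2


# Hypothetical counts, for illustration only (not taken from the study).
sens, spec, acc = diagnostic_metrics(tp=8, fp=2, tn=18, fn=2)

# (sensitivity, specificity, AUC) for the AI system as reported in the abstract.
reported = {
    "pathology": (0.826, 0.815, 0.821),
    "committee": (0.946, 0.966, 0.956),
    "global 2e": (0.868, 0.934, 0.901),
}
for standard, (s, p, auc) in reported.items():
    print(standard, "computed:", round(balanced_accuracy(s, p), 4), "reported:", auc)
```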
CONCLUSIONS: The proposed 2e diagnostic criteria are consistent with real-world clinical evaluations and affirm the applicability of the AI system. Under the 2e criteria, the diagnostic performance of the AI system is comparable to that of senior radiologists and significantly improves the diagnostic capabilities of junior radiologists. This has the potential to reduce unnecessary invasive diagnostic procedures in real-world clinical practice.