{Reference Type}: Systematic Review {Title}: From Bench-to-Bedside: How Artificial Intelligence is Changing Thyroid Nodule Diagnostics, a Systematic Review. {Author}: Sant VR;Radhachandran A;Ivezic V;Lee DT;Livhits MJ;Wu JX;Masamed R;Arnold CW;Yeh MW;Speier W; {Journal}: J Clin Endocrinol Metab {Volume}: 109 {Issue}: 7 {Year}: 2024 Jun 17 {Factor}: 6.134 {DOI}: 10.1210/clinem/dgae277 {Abstract}: BACKGROUND: Use of artificial intelligence (AI) to predict clinical outcomes in thyroid nodule diagnostics has grown exponentially over the past decade. The greatest challenge is in understanding the best model to apply to one's own patient population, and how to operationalize such a model in practice.
METHODS: A literature search of PubMed and IEEE Xplore was conducted for English-language publications dated between January 1, 2015 and January 1, 2023 that studied AI-based diagnostic tests on suspected thyroid nodules. We excluded articles that lacked prospective or external validation, were nonprimary literature or duplicates, focused on nonnodular thyroid conditions, did not use AI, or only incidentally used AI in support of an experimental diagnostic outside standard clinical practice. Quality was graded by Oxford level of evidence.
RESULTS: A total of 61 studies were identified; all performed external validation, 16 studies were prospective, and 33 compared a model to physician prediction of ground truth. Statistical validation was reported in 50 papers. A diagnostic pipeline was abstracted, yielding 5 high-level outcomes: (1) nodule localization, (2) ultrasound (US) risk score, (3) molecular status, (4) malignancy, and (5) long-term prognosis. Seven prospective studies validated a single commercial AI; strengths included automating nodule feature assessment from US and assisting the physician in predicting malignancy risk, while weaknesses included automated margin prediction and interobserver variability.
CONCLUSIONS: Models predominantly used US images to predict malignancy. Of 4 Food and Drug Administration-approved products, only S-Detect was extensively validated. Implementing an AI model locally requires data sanitization and revalidation to ensure appropriate clinical performance.