%0 Journal Article
%T Diagnostic test accuracy of externally validated convolutional neural network (CNN) artificial intelligence (AI) models for emergency head CT scans - A systematic review.
%A Mäenpää SM
%A Korja M
%J Int J Med Inform
%V 189
%N 0
%D 2024 Sep 13
%M 38901270
%F 4.73
%R 10.1016/j.ijmedinf.2024.105523
%X <B>BACKGROUND: </B>The surge in emergency head CT imaging and artificial intelligence (AI) advancements, especially deep learning (DL) and convolutional neural networks (CNN), have accelerated the development of computer-aided diagnosis (CADx) for emergency imaging. External validation assesses model generalizability, providing preliminary evidence of clinical potential.<BR><B>OBJECTIVE: </B>This study systematically reviews externally validated CNN-CADx models for emergency head CT scans, critically appraises diagnostic test accuracy (DTA), and assesses adherence to reporting guidelines.<BR><B>METHODS: </B>Studies comparing CNN-CADx model performance to reference standard were eligible. The review was registered in PROSPERO (CRD42023411641) and conducted on Medline, Embase, EBM-Reviews and Web of Science following PRISMA-DTA guideline. DTA reporting were systematically extracted and appraised using standardised checklists (STARD, CHARMS, CLAIM, TRIPOD, PROBAST, QUADAS-2).<BR><B>RESULTS: </B>Six of 5636 identified studies were eligible. The common target condition was intracranial haemorrhage (ICH), and intended workflow roles auxiliary to experts. Due to methodological and clinical between-study variation, meta-analysis was inappropriate. The scan-level sensitivity exceeded 90 % in 5/6 studies, while specificities ranged from 58,0-97,7 %. The SROC 95 % predictive region was markedly broader than the confidence region, ranging above 50 % sensitivity and 20 % specificity. All studies had unclear or high risk of bias and concern for applicability (QUADAS-2, PROBAST), and reporting adherence was below 50 % in 20 of 32 TRIPOD items.<BR><B>CONCLUSIONS: </B>0.01 % of identified studies met the eligibility criteria. The evidence on the DTA of CNN-CADx models for emergency head CT scans remains limited in the scope of this review, as the reviewed studies were scarce, inapt for meta-analysis and undermined by inadequate methodological conduct and reporting. Properly conducted, external validation remains preliminary for evaluating the clinical potential of AI-CADx models, but prospective and pragmatic clinical validation in comparative trials remains most crucial. In conclusion, future AI-CADx research processes should be methodologically standardized and reported in a clinically meaningful way to avoid research waste.