Diagnostic test accuracy of externally validated convolutional neural network (CNN) artificial intelligence (AI) models for emergency head CT scans - A systematic review

Abstract:

BACKGROUND: The surge in emergency head CT imaging and artificial intelligence (AI) advancements, especially deep learning (DL) and convolutional neural networks (CNN), have accelerated the development of computer-aided diagnosis (CADx) for emergency imaging. External validation assesses model generalizability, providing preliminary evidence of clinical potential.
OBJECTIVE: This study systematically reviews externally validated CNN-CADx models for emergency head CT scans, critically appraises diagnostic test accuracy (DTA), and assesses adherence to reporting guidelines.
METHODS: Studies comparing CNN-CADx model performance to a reference standard were eligible. The review was registered in PROSPERO (CRD42023411641) and conducted in Medline, Embase, EBM-Reviews and Web of Science following the PRISMA-DTA guideline. DTA reporting was systematically extracted and appraised using standardised checklists (STARD, CHARMS, CLAIM, TRIPOD, PROBAST, QUADAS-2).
RESULTS: Six of 5636 identified studies were eligible. The common target condition was intracranial haemorrhage (ICH), and the intended workflow roles were auxiliary to experts. Due to methodological and clinical between-study variation, meta-analysis was inappropriate. The scan-level sensitivity exceeded 90 % in 5/6 studies, while specificities ranged from 58.0 % to 97.7 %. The SROC 95 % predictive region was markedly broader than the confidence region, ranging above 50 % sensitivity and 20 % specificity. All studies had unclear or high risk of bias and concern for applicability (QUADAS-2, PROBAST), and reporting adherence was below 50 % in 20 of 32 TRIPOD items.
CONCLUSIONS: Only 6 of the 5636 identified studies (about 0.1 %) met the eligibility criteria. The evidence on the DTA of CNN-CADx models for emergency head CT scans remains limited within the scope of this review, as the reviewed studies were scarce, unsuitable for meta-analysis and undermined by inadequate methodological conduct and reporting. Even when properly conducted, external validation provides only preliminary evidence of the clinical potential of AI-CADx models, and prospective, pragmatic clinical validation in comparative trials remains most crucial. Future AI-CADx research processes should be methodologically standardised and reported in a clinically meaningful way to avoid research waste.
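
For readers less familiar with the DTA measures quoted above, the following is a minimal sketch of how scan-level sensitivity and specificity are derived from a 2x2 table of model output against the reference standard. The class name, function names and counts are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Scan-level 2x2 table: model output versus reference standard."""
    tp: int  # model positive, reference positive
    fp: int  # model positive, reference negative
    fn: int  # model negative, reference positive
    tn: int  # model negative, reference negative


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of reference-positive scans the model flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of reference-negative scans the model clears."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts, for illustration only (not data from the review).
example = ConfusionCounts(tp=92, fp=30, fn=8, tn=170)

print(f"sensitivity = {sensitivity(example):.1%}")
print(f"specificity = {specificity(example):.1%}")
```

With these made-up counts, sensitivity is 92 of 100 reference-positive scans and specificity is 170 of 200 reference-negative scans. A model can therefore show high sensitivity while still producing many false positives, which is the pattern the wide specificity range in the results points to.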
