BACKGROUND: The purpose of this systematic review (SR) is to gather evidence on the use of machine learning (ML) models in the diagnosis of intraosseous lesions in gnathic bones and to analyze the reliability, impact, and usefulness of such models. This SR was performed in accordance with the PRISMA 2022 guidelines and was registered in the PROSPERO database (CRD42022379298).
METHODS: The PICOS acronym was used to structure the focused review question, "Is Artificial Intelligence reliable for the diagnosis of intraosseous lesions in gnathic bones?" The literature search was conducted in electronic databases, including PubMed, Embase, Scopus, Cochrane Library, Web of Science, LILACS, and IEEE Xplore, and in gray literature (Google Scholar and ProQuest). Risk of bias was assessed using PROBAST, and the results were synthesized by considering the task and sampling strategy of the dataset.
RESULTS: Twenty-six studies were included (21 146 radiographic images). Ameloblastomas, odontogenic keratocysts, dentigerous cysts, and periapical cysts were the most frequently investigated lesions. According to TRIPOD, most studies were classified as type 2 (randomly divided). The F1 score was presented in only 13 studies, which provided the metrics for 20 trials, with a mean of 0.71 (±0.25).
CONCLUSIONS: There is no conclusive evidence to support the usefulness of ML-based models in the detection, segmentation, and classification of intraosseous lesions in gnathic bones for routine clinical application. The lack of detail about data sampling, the lack of a comprehensive set of metrics for training and validation, and the absence of external testing limit the experiments and hinder proper evaluation of model performance.