Keywords: Named Entity Recognition and Classification; incidental pulmonary nodule; lung cancer; model accuracy; nodules

Source: DOI: 10.1177/08465371241266785

Abstract:
Purpose: This study evaluates the efficacy of a commercial medical Named Entity Recognition (NER) model, combined with a post-processing protocol, in identifying incidental pulmonary nodules from CT reports.
Methods: We analyzed 9165 anonymized CT reports and classified them into 3 categories: no nodules, nodules present, and nodules >6 mm. For each report, a generic medical NER model annotated entities and their relations, which were then filtered through inclusion/exclusion criteria selected to identify pulmonary nodules. Ground truth was established by manual review. To better understand the relationship between model performance and nodule prevalence, a subset of the data was programmatically balanced to equalize the number of reports in each class.
Results: In the unbalanced subset of the data, the model achieved a sensitivity of 97%, a specificity of 99%, and an accuracy of 99% in detecting pulmonary nodules mentioned in the reports. For nodules >6 mm, sensitivity was 95%, specificity was 100%, and accuracy was 100%. In the balanced subset of the data, sensitivity was 99%, specificity was 96%, and accuracy was 97% for nodule detection; for nodules >6 mm, sensitivity was 94%, specificity was 99%, and accuracy was 98%.
Conclusions: The NER model demonstrated high sensitivity and specificity in detecting pulmonary nodules described in CT reports, including those >6 mm, which are potentially clinically significant. The results were consistent across the unbalanced and balanced datasets, indicating that model performance is independent of nodule prevalence. Implementing this technology in hospital systems could automate the identification of at-risk patients, ensuring timely follow-up and potentially reducing missed or late-stage cancer diagnoses.
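The evaluation described in the abstract (sensitivity, specificity, and accuracy against manual ground truth, computed on both an unbalanced and a class-balanced subset) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the balancing was done, so random downsampling to the rarest class is used here as one plausible approach, and the function names, the `truth`/`pred` fields, and the toy counts are hypothetical and not taken from the study.

```python
import random
from collections import Counter

# Report categories as named in the abstract.
NO_NODULE = "no nodules"
NODULE = "nodules present"
NODULE_GT_6MM = "nodules >6 mm"


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(truth, predicted):
    """Sensitivity, specificity, and accuracy for paired boolean labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def evaluate_any_nodule(reports):
    """Score the 'any nodule mentioned' task: every non-empty category counts as positive."""
    truth = [r["truth"] != NO_NODULE for r in reports]
    predicted = [r["pred"] != NO_NODULE for r in reports]
    return binary_metrics(truth, predicted)


def balance_by_downsampling(reports, seed=0):
    """Assumed balancing strategy: sample each class down to the rarest class size."""
    rng = random.Random(seed)
    by_class = {}
    for report in reports:
        by_class.setdefault(report["truth"], []).append(report)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Synthetic toy data (NOT the study's data): 100 reports with low nodule prevalence.
reports = (
    [{"truth": NO_NODULE, "pred": NO_NODULE}] * 88
    + [{"truth": NO_NODULE, "pred": NODULE}] * 2
    + [{"truth": NODULE, "pred": NODULE}] * 7
    + [{"truth": NODULE, "pred": NO_NODULE}] * 1
    + [{"truth": NODULE_GT_6MM, "pred": NODULE_GT_6MM}] * 2
)

print("unbalanced:", evaluate_any_nodule(reports))
balanced = balance_by_downsampling(reports, seed=1)
print("class sizes after balancing:", Counter(r["truth"] for r in balanced))
print("balanced:", evaluate_any_nodule(balanced))
```

Sensitivity and specificity are each computed within a single true class, so in expectation they do not depend on class proportions, whereas accuracy mixes both classes and shifts with prevalence. Comparing the two subsets is therefore a reasonable way to check whether a high accuracy figure is merely a product of a dominant negative class.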