Keywords: COVID-19; CT scans; algorithm; clinical data; clinical features; development; imaging; imbalanced data; machine learning; oversampling; severity assessment; validation

Source: DOI:10.2196/24572

Abstract:
BACKGROUND: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated.
OBJECTIVE: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data.
METHODS: Clinical data (including demographics, signs, symptoms, comorbidities, and blood test results) and chest computed tomography (CT) scans of 346 patients from 2 hospitals in Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data across multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations (SHAP) framework.
RESULTS: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929).
CONCLUSIONS: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
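
The abstract does not say which implementation of synthetic minority oversampling was used, so the following is only a minimal sketch of the core idea: new minority-class (severe) samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the parameter `k`, and the toy two-feature data are assumptions made for illustration and do not come from the study.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea).

    Hypothetical helper for illustration; not the study's implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy two-feature data, made up for illustration (not from the study).
severe = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9], [1.1, 1.0]]
mild_count = 10  # assumed size of the majority (mild) class

# Generate enough synthetic severe cases to match the majority class size.
new_points = smote_sketch(severe, n_new=mild_count - len(severe))
balanced_severe = severe + new_points
```

In practice, oversampling of this kind would normally be applied to the training split only, so that synthetic points do not leak into the data used to estimate AUC, sensitivity, and specificity.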