Machine Learning-based Cerebral Venous Thrombosis Diagnosis with Clinical Data [J]

Abstract:

OBJECTIVE: Cerebral Venous Thrombosis (CVT) poses diagnostic challenges due to the variability in disease course and symptoms. The prognosis of CVT relies on early diagnosis. Our study focuses on developing a machine learning-based screening algorithm using clinical data from a large neurology referral center in southern Iran.
METHODS: The Iran Cerebral Venous Thrombosis Registry (ICVTR code: 9001013381) provided data on 382 CVT cases from Namazi Hospital. The control group comprised adult headache patients in whom CVT had been ruled out by neuroimaging; controls were retrospectively selected from patients admitted to the same hospital. We collected 60 clinical and demographic features for model development and validation. Our modeling pipeline involved imputing missing values and evaluating four machine learning algorithms: generalized linear model, random forest, support vector machine, and extreme gradient boosting.
RESULTS: A total of 314 CVT cases and 575 controls were included. The highest AUROC was reached when missing values were imputed for all variables and the support vector machine model was used (AUROC=0.910, Recall=0.73, Precision=0.88). The best recall was also achieved by the support vector machine model, when only variables with a missing rate below 50% were included (AUROC=0.887, Recall=0.77, Precision=0.86). The random forest model yielded the best precision, using variables with a missing rate below 50% (AUROC=0.882, Recall=0.61, Precision=0.94).
CONCLUSIONS: The application of machine learning techniques using clinical data showed promising results in accurately diagnosing CVT within our study population. This approach offers a valuable complementary assistive tool or an alternative to resource-intensive imaging methods.
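
The preprocessing and evaluation steps named in the abstract (dropping variables with a missing rate of 50% or more, imputing the remaining gaps, and reporting recall and precision) could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which imputation method was used, so median imputation stands in for it here, and the feature names (`age`, `d_dimer`, `seizure`) and all values are hypothetical toy data, not taken from the registry.

```python
from statistics import median


def missing_rate(rows, feature):
    """Fraction of records in which `feature` is missing (None)."""
    return sum(1 for r in rows if r.get(feature) is None) / len(rows)


def select_features(rows, features, max_missing=0.5):
    """Keep features whose missing rate is strictly below `max_missing`."""
    return [f for f in features if missing_rate(rows, f) < max_missing]


def impute_median(rows, features):
    """Fill gaps with the per-feature median (an assumed strategy,
    not necessarily the one used in the study)."""
    filled = [dict(r) for r in rows]
    for f in features:
        observed = [r[f] for r in rows if r.get(f) is not None]
        fill = median(observed)
        for r in filled:
            if r.get(f) is None:
                r[f] = fill
    return filled


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "d_dimer": None, "seizure": 1},
    {"age": None, "d_dimer": None, "seizure": 0},
    {"age": 51, "d_dimer": 2.1, "seizure": 0},
    {"age": 42, "d_dimer": None, "seizure": 1},
]
features = ["age", "d_dimer", "seizure"]

kept = select_features(rows, features)   # d_dimer is 75% missing and is dropped
filled = impute_median(rows, kept)       # the missing age becomes the median age

# Hypothetical labels (1 = CVT, 0 = control) and model predictions.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In a full pipeline, the imputed feature table would then be passed to a classifier such as a support vector machine or random forest, and AUROC would be computed from its predicted scores; that step is omitted here because it depends on the modeling library used.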
