Keywords: Cervical cancer; Supervised machine learning; Women living with HIV

MeSH terms: Humans; Female; Uganda/epidemiology; Uterine Cervical Neoplasms/diagnosis; HIV Infections/drug therapy; Adult; Supervised Machine Learning; Middle Aged; Precancerous Conditions/diagnosis; Logistic Models; Algorithms; Support Vector Machine

Source: DOI: 10.1186/s12905-024-03232-7

Abstract:
BACKGROUND: Cervical cancer (CC) is among the most prevalent cancers in women, with the highest prevalence in low- and middle-income countries (LMICs). It is curable if detected early. Machine learning (ML) techniques can aid early detection and prediction, thereby reducing screening and treatment costs. This study focused on women living with HIV (WLHIV) in Uganda. It aimed to identify the best predictors of CC and the supervised ML model that best predicts CC among WLHIV.
METHODS: Secondary data on 3,025 women from three health facilities in central Uganda were used. Multivariate binary logistic regression and recursive feature elimination with random forest (RFERF) were used to identify the best predictors. Five models were compared to identify the best performer: logistic regression (LR), random forest (RF), k-nearest neighbors (KNN), support vector machine (SVM), and multi-layer perceptron (MLP). The models were evaluated using the confusion matrix and the area under the receiver operating characteristic curve (AUC/ROC).
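The evaluation step described in the methods can be illustrated with a minimal standard-library Python sketch. It is not the authors' code. It assumes binary outcome labels (1 = CC-positive, 0 = negative) and a probability-like score per woman from any of the five models. The function names, the 0.5 cutoff, and the toy data are hypothetical. The sketch covers only evaluation, not feature selection or model training. AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Dict, List, Sequence, Tuple


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Turn model scores into hard 0/1 predictions (the cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP) for binary labels, where 1 is the positive (CC) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1  # CC case correctly flagged
        elif t == 0 and p == 1:
            fp += 1  # non-case wrongly flagged
        elif t == 1:
            fn += 1  # CC case missed
        else:
            tn += 1  # non-case correctly cleared
    return tn, fp, fn, tp


def summary_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Common metrics derived from the confusion matrix (NaN when a denominator is zero)."""
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # pairwise comparison: O(len(pos) * len(neg))
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, take hypothetical labels `[1, 1, 1, 0, 0, 0, 0, 1, 0, 0]` with scores `[0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.35, 0.05]` thresholded at 0.5. The confusion matrix is (TN, FP, FN, TP) = (5, 1, 1, 3), which gives 80% accuracy, and the AUC is 23/24 ≈ 0.958. The pairwise AUC loop is fine for a few thousand records. Library implementations such as scikit-learn's `roc_auc_score` use a sort-based computation instead.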
RESULTS: Duration on antiretroviral therapy (ART), WHO clinical stage, tuberculosis preventive therapy (TPT) status, viral load status, and family planning were selected by both techniques. They were therefore regarded as the most important predictors of CC. The RF trained on the RFERF-selected features outperformed the other models, achieving the highest accuracy (90%) and AUC (0.901).
CONCLUSIONS: Early identification of CC and knowledge of its risk factors could help control the disease. The RF outperformed the other models regardless of the feature-selection technique used. Future research could extend CC prediction to ART-naïve women.