Keywords: Classification algorithms; Healthcare environments; Imbalanced dataset; Machine learning; Missed appointments; Resampling techniques

MeSH terms: Humans; Algorithms; Benchmarking; Brazil; Machine Learning; Decision Support Techniques

Source: DOI:10.1186/s12913-023-10418-6

Abstract:
BACKGROUND: No-shows for medical appointments have significant adverse effects on healthcare systems and their clients. Using machine learning to predict no-shows allows managers to target strategies such as overbooking and reminders at the patients most likely to miss appointments, optimizing the use of resources.
METHODS: In this study, we propose a detailed analytical framework for predicting no-shows while addressing imbalanced datasets. The framework includes a novel use of z-fold cross-validation, performed twice during the modeling process, to improve model robustness and generalization. We also introduce Symbolic Regression (SR) as a classification algorithm and Instance Hardness Threshold (IHT) as a resampling technique, and compare their performance with that of other classification algorithms, such as K-Nearest Neighbors (KNN) and Support Vector Machine (SVM), and other resampling techniques, such as Random Undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE) and NearMiss-1. We validated the framework using two attendance datasets from Brazilian hospitals with no-show rates of 6.65% and 19.03%.
RESULTS: From the academic perspective, our study is the first to propose using SR and IHT to predict patient no-shows. Our findings indicate that SR and IHT outperformed the other techniques. IHT in particular excelled when combined with every classification algorithm and led to low variability in the performance metrics. Our sensitivity results also exceeded those reported in the literature, with values above 0.94 for both datasets.
CONCLUSIONS: This is the first study to use SR and IHT to predict patient no-shows and the first to propose performing z-fold cross-validation twice. Our study highlights the importance of not relying on a small number of validation runs for imbalanced datasets, as doing so may lead to biased results and an inadequate analysis of the generalization and stability of the models obtained during the training stage.
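
The point about validation runs on imbalanced data can be illustrated with a minimal sketch. It is not the authors' framework: it assumes plain Random Undersampling (RUS) rather than IHT, a small hand-written KNN classifier, ordinary repeated stratified k-fold cross-validation, and synthetic data in place of the hospital datasets. All function names are hypothetical. It shows two things: resampling only the training part of each fold, and reporting the spread of sensitivity over many runs rather than a single score.

```python
import random
import statistics

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds that preserve the class ratio."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def random_undersample(indices, labels, rng):
    """Randomly drop majority-class (0) samples until the classes are balanced."""
    minority = [i for i in indices if labels[i] == 1]
    majority = [i for i in indices if labels[i] == 0]
    return minority + rng.sample(majority, min(len(majority), len(minority)))

def knn_predict(train_idx, X, labels, x, n_neighbors=5):
    """Majority vote among the nearest training points (squared Euclidean distance)."""
    nearest = sorted(train_idx, key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
    votes = [labels[i] for i in nearest[:n_neighbors]]
    return int(sum(votes) * 2 > len(votes))

def sensitivity(y_true, y_pred):
    """TP / (TP + FN): the share of actual no-shows that were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

def repeated_cv_sensitivity(X, labels, k=5, repeats=10, seed=0):
    """Return one sensitivity score per (repeat, fold) pair."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            # Resample the training part only; the test fold keeps its real class ratio.
            balanced = random_undersample(train_idx, labels, rng)
            preds = [knn_predict(balanced, X, labels, X[i]) for i in test_idx]
            scores.append(sensitivity([labels[i] for i in test_idx], preds))
    return scores

def make_toy_data(n=300, no_show_rate=0.19, seed=1):
    """Synthetic two-feature data; NOT the hospital data from the study."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < no_show_rate else 0
        center = 1.5 if label else 0.0
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    scores = repeated_cv_sensitivity(X, y)
    print(f"runs={len(scores)} mean={statistics.mean(scores):.3f} "
          f"sd={statistics.stdev(scores):.3f}")
```

With the defaults, the script summarizes 50 held-out scores (5 folds, 10 repeats). The standard deviation across those scores is the kind of stability information that a single train/test split cannot provide.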