Imbalanced data

  • Article type: Journal Article
    OBJECTIVE: The prediction of sepsis, especially early diagnosis, has received significant attention in biomedical research. In order to improve current medical scoring systems and overcome the limitations of class imbalance and sample size of local EHR (electronic health records), we propose a novel knowledge-transfer-based approach, which combines a medical scoring system and an ordinal logistic regression model.
    METHODS: Medical scoring systems (i.e. NEWS, SIRS and QSOFA) are generally robust and useful for sepsis diagnosis. With local EHR, machine-learning-based methods have been widely used for building prediction models/methods, but they are often impacted by class imbalance and sample size. Knowledge distillation and knowledge transfer have recently been proposed as a combination approach for improving the prediction performance and model generalization. In this study, we developed a novel knowledge-transfer-based method for combining a medical scoring system (after a proposed score transformation) and an ordinal logistic regression model. We mathematically confirmed that it was equivalent to a specific form of the weighted regression. Furthermore, we theoretically explored its effectiveness in the scenario of class imbalance.
    RESULTS: For the local dataset and the MIMIC-IV dataset, the VUS values (the volume under the multi-dimensional ROC surface, a generalization measure of AUC-ROC for ordinal categories) of the knowledge-transfer-based model (ORNEWS) based on the NEWS scoring system were 0.384 and 0.339, respectively, while those of the traditional ordinal regression model (OR) were 0.352 and 0.322, respectively. Consistent analysis results were also observed for the knowledge-transfer-based models based on the SIRS/QSOFA scoring systems in the ordinal scenarios. Additionally, the predicted probabilities and the binary classification ROC curves of the knowledge-transfer-based models indicated that this approach enhanced the predicted probabilities for the minority classes while reducing the predicted probabilities for the majority classes, which improved AUCs/VUSs on imbalanced data.
    CONCLUSIONS: Knowledge transfer, which combines a medical scoring system and a machine-learning-based model, improves the prediction performance for early diagnosis of sepsis, especially in the scenarios of class imbalance and limited sample size.
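The paper's stated equivalence between knowledge transfer and a specific form of weighted regression is not reproduced here, but the basic mechanism of per-sample weighting can be sketched with a toy binary (rather than ordinal) logistic model; the data, weights, and optimizer settings below are illustrative assumptions, not from the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logistic(xs, ys, ws, lr=0.1, epochs=2000):
    """Fit y ~ sigmoid(b0 + b1*x) by gradient descent on the weighted log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y, w in zip(xs, ys, ws):
            err = w * (sigmoid(b0 + b1 * x) - y)  # weighted residual
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# toy imbalanced data: 4 negatives, 2 positives, separable along x
xs = [0.0, 0.2, 0.4, 0.6, 2.0, 2.2]
ys = [0, 0, 0, 0, 1, 1]

b0_u, b1_u = fit_weighted_logistic(xs, ys, [1.0] * 6)           # uniform weights
b0_w, b1_w = fit_weighted_logistic(xs, ys, [1, 1, 1, 1, 3, 3])  # minority upweighted
```

Upweighting the minority class shifts the fitted boundary so that minority points receive larger predicted probabilities, consistent with the paper's observation that knowledge transfer "enhanced the predicted probabilities for the minority classes".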

  • Article type: Journal Article
    OBJECTIVE: Data imbalance is a pervasive issue in medical data mining, often leading to biased and unreliable predictive models. This study aims to address the urgent need for effective strategies to mitigate the impact of data imbalance on classification models. We focus on quantifying the effects of different imbalance degrees and sample sizes on model performance, identifying optimal cut-off values, and evaluating the efficacy of various methods to enhance model accuracy in highly imbalanced and small sample size scenarios.
    METHODS: We collected medical records of patients receiving assisted reproductive treatment in a reproductive medicine center. Random forest was used to screen the key variables for the prediction target. Various datasets with different imbalance degrees and sample sizes were constructed to compare the classification performance of logistic regression models. Metrics such as AUC, G-mean, F1-Score, Accuracy, Recall, and Precision were used for evaluation. Four imbalance treatment methods (SMOTE, ADASYN, OSS, and CNN) were applied to datasets with low positive rates and small sample sizes to assess their effectiveness.
    RESULTS: The logistic model's performance was low when the positive rate was below 10% but stabilized beyond this threshold. Similarly, sample sizes below 1200 yielded poor results, with improvement seen above this threshold. For robustness, the optimal cut-offs for positive rate and sample size were identified as 15% and 1500, respectively. SMOTE and ADASYN oversampling significantly improved classification performance in datasets with low positive rates and small sample sizes.
    CONCLUSIONS: The study identifies a positive rate of 15% and a sample size of 1500 as optimal cut-offs for stable logistic model performance. For datasets with low positive rates and small sample sizes, SMOTE and ADASYN are recommended to improve balance and model accuracy.
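SMOTE and ADASYN are available ready-made in the imbalanced-learn package; the core SMOTE idea — synthesizing minority samples by interpolating between a minority point and one of its k nearest minority-class neighbours — can be sketched in pure Python. The 2-D points and parameter choices below are illustrative, not from the study.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points: pick a minority point, pick one
    of its k nearest minority-class neighbours, and interpolate a random
    fraction of the way between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen point (excluding itself)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_new=8)
```

ADASYN differs mainly in allocating more synthetic points to minority samples that lie near the class boundary, rather than sampling uniformly as above.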

  • Article type: Journal Article
    BACKGROUND: Convolutional Neural Network (CNN) systems in healthcare are influenced by unbalanced datasets and varying sizes. This article delves into the impact of dataset size, class imbalance, and their interplay on CNN systems, focusing on the size of the training set versus imbalance, a unique perspective compared to the prevailing literature. Furthermore, it addresses scenarios with more than two classification groups, often overlooked but prevalent in practical settings.
    METHODS: Initially, a CNN was developed to classify lung diseases using X-ray images, distinguishing between healthy individuals and COVID-19 patients. Later, the model was expanded to include pneumonia patients. To evaluate performance, numerous experiments were conducted with varied data sizes and imbalance ratios for both binary and ternary classifications, measuring various indices to validate the model's efficacy.
    RESULTS: The study revealed that increasing dataset size positively impacts CNN performance, but this improvement saturates beyond a certain size. A novel finding is that the data balance ratio influences performance more significantly than dataset size. The behavior of three-class classification mirrored that of binary classification, underscoring the importance of balanced datasets for accurate classification.
    CONCLUSIONS: This study emphasizes the fact that achieving balanced representation in datasets is crucial for optimal CNN performance in healthcare, challenging the conventional focus on dataset size. Balanced datasets improve classification accuracy, both in two-class and three-class scenarios, highlighting the need for data-balancing techniques to improve model reliability and effectiveness.
    BACKGROUND: Our study is motivated by a scenario with 100 patient samples, offering two options: a balanced dataset with 200 samples and an unbalanced dataset with 500 samples (400 healthy individuals). We aim to provide insights into the optimal choice based on the interplay between dataset size and imbalance, enriching the discourse for stakeholders interested in achieving optimal model performance.
    CONCLUSIONS: Recognizing a single model's generalizability limitations, we assert that further studies on diverse datasets are needed.
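The study compares dataset sizes and balance ratios rather than reweighting, but a common mitigation when a balanced dataset cannot be collected is inverse-frequency class weighting in the loss. The sketch below mirrors scikit-learn's class_weight='balanced' heuristic and uses the abstract's motivating scenario (400 healthy individuals among 500 samples) purely as an illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c): rare classes get
    proportionally larger weights in the training loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# the unbalanced option from the abstract: 400 healthy vs 100 patients
labels = ["healthy"] * 400 + ["covid"] * 100
weights = balanced_class_weights(labels)
```

These per-class weights can be passed to most deep-learning loss functions so that errors on the minority class cost proportionally more.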

  • Article type: Journal Article
    Predictive modeling holds large potential in clinical decision-making, yet its effectiveness can be hindered by inherent data imbalances in clinical datasets. This study investigates the utility of synthetic data for improving the performance of predictive modeling on realistic small imbalanced clinical datasets. We compared various synthetic data generation methods, including Generative Adversarial Networks, Normalizing Flows, and Variational Autoencoders, to the standard baselines for correcting for class underrepresentation on four clinical datasets. Although results show improvement in F1 scores in some cases, even over multiple repetitions, we do not obtain statistically significant evidence that synthetic data generation outperforms standard baselines for correcting for class imbalance. This study challenges common beliefs about the efficacy of synthetic data for data augmentation and highlights the importance of evaluating new complex methods against simple baselines.

  • Article type: Journal Article
    The enhancement of fabric quality prediction in the textile manufacturing sector is achieved by utilizing information derived from sensors within the Internet of Things (IoT) and Enterprise Resource Planning (ERP) systems linked to sensors embedded in textile machinery. The integration of Industry 4.0 concepts is instrumental in harnessing IoT sensor data, which, in turn, leads to improvements in productivity and reduced lead times in textile manufacturing processes. This study addresses the issue of imbalanced data pertaining to fabric quality within the textile manufacturing industry. It encompasses an evaluation of seven open-source automated machine learning (AutoML) technologies, namely FLAML (Fast Lightweight AutoML), AutoViML (Automatically Build Variant Interpretable ML models), EvalML (Evaluation Machine Learning), AutoGluon, H2OAutoML, PyCaret, and TPOT (Tree-based Pipeline Optimization Tool). The most suitable solutions are chosen for certain circumstances by employing an innovative approach that finds a compromise between computational efficiency and forecast accuracy. The results reveal that EvalML emerges as the top-performing AutoML model for a predetermined objective function, particularly excelling in terms of mean absolute error (MAE). On the other hand, even with longer inference periods, AutoGluon performs better than other methods in measures like mean absolute percentage error (MAPE), root mean squared error (RMSE), and r-squared. Additionally, the study explores the feature importance rankings provided by each AutoML model, shedding light on the attributes that significantly influence predictive outcomes. Notably, sin/cos encoding is found to be particularly effective in characterizing categorical variables with a large number of unique values. This study includes useful information about the application of AutoML in the textile industry and provides a roadmap for employing Industry 4.0 technologies to enhance fabric quality prediction. The research highlights the importance of striking a balance between predictive accuracy and computational efficiency, emphasizes the significance of feature importance for model interpretability, and lays the groundwork for future investigations in this field.
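The sin/cos encoding mentioned for high-cardinality categoricals can be read as mapping each category's index i out of K onto the unit circle. This is a hedged interpretation of the abstract, and the machine-ID category names below are invented for illustration.

```python
import math

def sincos_encode(value, categories):
    """Map a category to a point on the unit circle: index i out of K
    categories becomes (sin(2*pi*i/K), cos(2*pi*i/K))."""
    k = len(categories)
    i = categories.index(value)
    angle = 2 * math.pi * i / k
    return (math.sin(angle), math.cos(angle))

categories = ["loom_a", "loom_b", "loom_c", "loom_d"]  # hypothetical machine IDs
enc = {c: sincos_encode(c, categories) for c in categories}
```

The appeal for high-cardinality variables is that the encoding always produces just two numeric columns regardless of K, at the cost of imposing an arbitrary circular ordering on the categories.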

  • Article type: Journal Article
    Applying machine-learning techniques to imbalanced data sets presents a significant challenge in materials science, since the underrepresented characteristics of minority classes are often buried by the abundance of unrelated characteristics in the majority classes. Existing approaches to address this focus on balancing the counts of each class using oversampling or synthetic data generation techniques. However, these methods can lead to loss of valuable information or overfitting. Here, we introduce a deep learning framework to predict minority-class materials, specifically within the realm of metal-insulator transition (MIT) materials. The proposed approach, termed boosting-CGCNN, combines the crystal graph convolutional neural network (CGCNN) model with a gradient-boosting algorithm. The model effectively handled extreme class imbalances in MIT material data by sequentially building a deeper neural network. The comparative evaluations demonstrated the superior performance of the proposed model compared to other approaches. Our approach is a promising solution for handling imbalanced data sets in materials science.

  • Article type: Journal Article
    BACKGROUND: Under- or late identification of pulmonary embolism (PE), a thrombosis of 1 or more pulmonary arteries that seriously threatens patients' lives, is a major challenge confronting modern medicine.
    OBJECTIVE: We aimed to establish accurate and informative machine learning (ML) models to identify patients at high risk for PE as they are admitted to the hospital, before their initial clinical checkup, by using only the information in their medical records.
    METHODS: We collected demographics, comorbidities, and medications data for 2568 patients with PE and 52,598 control patients. We focused on data available prior to emergency department admission, as these are the most universally accessible data. We trained an ML random forest algorithm to detect PE at the earliest possible time during a patient's hospitalization: at the time of his or her admission. We developed and applied 2 ML-based methods specifically to address the data imbalance between PE and non-PE patients, which causes misdiagnosis of PE.
    RESULTS: The resulting models predicted PE based on age, sex, BMI, past clinical PE events, chronic lung disease, past thrombotic events, and usage of anticoagulants, obtaining an 80% geometric mean value for the PE and non-PE classification accuracies. Although on hospital admission only 4% (1942/46,639) of the patients had a diagnosis of PE, we identified 2 clustering schemes comprising subgroups with more than 61% (705/1120 in clustering scheme 1; 427/701 and 340/549 in clustering scheme 2) positive patients for PE. One subgroup in the first clustering scheme included 36% (705/1942) of all patients with PE who were characterized by a definite past PE diagnosis, a 6-fold higher prevalence of deep vein thrombosis, and a 3-fold higher prevalence of pneumonia, compared with patients of the other subgroups in this scheme. In the second clustering scheme, 2 subgroups (1 of only men and 1 of only women) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third subgroup included only those patients with a past diagnosis of pneumonia.
    CONCLUSIONS: This study established an ML tool for early diagnosis of PE almost immediately upon hospital admission. Despite the highly imbalanced scenario undermining accurate PE prediction and using information available only from the patient's medical history, our models were both accurate and informative, enabling the identification of patients already at high risk for PE upon hospital admission, even before the initial clinical checkup was performed. The fact that we did not restrict our patients to those at high risk for PE according to previously published scales (eg, Wells or revised Geneva scores) enabled us to accurately assess the application of ML on raw medical data and identify new, previously unidentified risk factors for PE, such as previous pulmonary disease, in general populations.
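The 80% geometric mean reported for the PE/non-PE classification accuracies is presumably the G-mean of sensitivity and specificity, a summary that cannot be inflated by simply predicting the majority class. A minimal sketch follows; the labels are toy values, not study data.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on positives) and
    specificity (recall on negatives) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sensitivity = tp / pos
    specificity = tn / neg
    return math.sqrt(sensitivity * specificity)

# imbalanced toy labels: 2 positives among 10 samples
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
score = g_mean(y_true, y_pred)  # sqrt(0.5 * 0.875), about 0.66
```

A trivial classifier that predicts "non-PE" for everyone gets 90% accuracy here but a G-mean of 0, which is why the metric suits the 4% PE prevalence described in the abstract.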

  • Article type: Journal Article
    OBJECTIVE: The study objective was to develop comprehensive quality assurance models for procedural outcomes after adult cardiac surgery.
    METHODS: Based on 52,792 cardiac operations in adults performed in 19 hospitals of 3 high-performing hospital systems, models were developed for operative mortality (n = 1271), stroke (n = 895), deep sternal wound infection (n = 122), prolonged intubation (n = 6182), renal failure (n = 1265), prolonged postoperative stay (n = 5418), and reoperations (n = 1693). Random forest quantile classification, a method tailored for challenges of rare events, and model-free variable priority screening were used to identify predictors of events.
    RESULTS: A small set of preoperative variables was sufficient to model procedural outcomes for virtually all cardiac operations, including older age; advanced symptoms; left ventricular, pulmonary, renal, and hepatic dysfunction; lower albumin; higher acuity; and greater complexity of the planned operation. Geometric mean performance ranged from 0.63 to 0.76. Calibration covered large areas of probability. Continuous risk factors provided high information content, and their association with outcomes was visualized with partial plots. These risk factors differed in strength and configuration among hospitals, as did their risk-adjusted outcomes according to patient risk as determined by counterfactual causal inference within a framework of virtual (digital) twins.
    CONCLUSIONS: By using a small set of variables and contemporary machine-learning methods, comprehensive models for procedural operative mortality and major morbidity after adult cardiac surgery were developed based on data from 3 exemplary hospital systems. They provide surgeons, their patients, and hospital systems with 21st century tools for assessing their risks compared with these advanced hospital systems and improving cardiac surgery quality.

  • Article type: Journal Article
    BACKGROUND: Tuberculosis spondylitis (TS), commonly known as Pott's disease, is a severe type of skeletal tuberculosis that typically requires surgical treatment. However, this treatment option has led to an increase in healthcare costs due to prolonged hospital stays (PLOS). Therefore, identifying risk factors associated with extended PLOS is necessary. In this research, we intended to develop an interpretable machine learning model to predict extended PLOS and provide valuable insights for treatment; a web-based application was also implemented.
    METHODS: We obtained patient data from the spine surgery department at our hospital. Extended postoperative length of stay (PLOS) refers to a hospitalization duration equal to or exceeding the 75th percentile following spine surgery. To identify relevant variables, we employed several approaches, such as the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE) based on support vector machine classification (SVC), correlation analysis, and permutation importance values. Several models were implemented, and some of them were ensembled using soft-voting techniques. Models were constructed using grid search with nested cross-validation. The performance of each algorithm was assessed through various metrics, including the AUC value (area under the receiver operating characteristic curve) and the Brier score. Model interpretation involved utilizing methods such as Shapley additive explanations (SHAP), the Gini Impurity Index, permutation importance, and local interpretable model-agnostic explanations (LIME). Furthermore, to facilitate the practical application of the model, a web-based interface was developed and deployed.
    RESULTS: The study included a cohort of 580 patients, and 11 features (CRP, transfusions, infusion volume, blood loss, X-ray bone bridge, X-ray osteophyte, CT-vertebral destruction, CT-paravertebral abscess, MRI-paravertebral abscess, MRI-epidural abscess, postoperative drainage) were selected. Most of the classifiers showed good performance, with the XGBoost model achieving a higher AUC value (0.86) and a lower Brier score (0.126). The XGBoost model was chosen as the optimal model. The results obtained from the calibration and decision curve analysis (DCA) plots demonstrate that XGBoost achieved promising performance. After conducting tenfold cross-validation, the XGBoost model demonstrated a mean AUC of 0.85 ± 0.09. SHAP and LIME were used to display the variables' contributions to the predicted value. The stacked bar plots indicated that infusion volume was the primary contributor, as determined by Gini impurity, permutation importance (PFI), and the LIME algorithm.
    CONCLUSIONS: Our methods not only effectively predicted extended PLOS but also identified risk factors that can be utilized for future treatments. The XGBoost model developed in this study is easily accessible through the deployed web application and can aid in clinical research.
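The Brier score used alongside AUC to select the XGBoost model is simply the mean squared difference between predicted probability and observed 0/1 outcome (lower is better). A minimal sketch with made-up predictions, not the study's outputs:

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# confident, mostly correct probabilities give a low (good) score
score = brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])  # about 0.055
```

Unlike AUC, which only measures ranking, the Brier score also penalizes miscalibrated probabilities, which is why the two are usefully reported together.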

  • Article type: Journal Article
    Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification where there are a few positives within a much larger set of negatives, which is referred to as a class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluate prediction performance on imbalanced problems where there is more interest in performance on the positive minority class, while the precision-recall (PR) curve is preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces, showing that the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to class imbalance. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via PR-AUC.
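The paper's central contrast can be checked on a toy example: hold each class's score distribution fixed and replicate the negatives tenfold. The rank-based ROC-AUC is unchanged, while average precision (an estimate of PR-AUC) drops. A self-contained sketch with invented scores:

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count one half) -- the rank-statistic form of AUC-ROC."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

def average_precision(pos_scores, neg_scores):
    """Average of precision at the rank of each positive (a PR-AUC estimate)."""
    ranked = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
        key=lambda t: -t[0],
    )
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            total += tp / rank
    return total / len(pos_scores)

pos = [0.9, 0.8, 0.6]   # classifier scores on positives
neg = [0.7, 0.5, 0.4]   # scores on negatives, balanced case
neg_x10 = neg * 10      # same score distribution, 10x more negatives

auc_balanced, auc_imbalanced = roc_auc(pos, neg), roc_auc(pos, neg_x10)
ap_balanced, ap_imbalanced = average_precision(pos, neg), average_precision(pos, neg_x10)
```

No classifier changed between the two cases; the drop in average precision comes entirely from adding more negatives at the same score values, which is exactly the sense in which class imbalance cannot be disentangled from PR-AUC.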