Boruta

  • Article type: Journal Article
    OBJECTIVE: To identify the key risk factors for venous thromboembolism (VTE) in urological inpatients based on the Caprini scale using an interpretable machine learning method.
    METHODS: VTE risk data for urological inpatients at the case hospital were obtained using the Caprini scale. Based on these data, the Boruta method was used to further select the key variables from the 37 variables of the Caprini scale. Decision rules corresponding to each risk level were then generated using the rough set (RS) method. Finally, random forest (RF), support vector machine (SVM), and backpropagation artificial neural network (BPANN) models were applied to the same data to assess accuracy and were compared with the RS method.
    RESULTS: After screening, the key risk factors for VTE in urology were "(C1) Age," "(C2) Minor surgery planned," "(C3) Obesity (BMI > 25)," "(C8) Varicose veins," "(C9) Sepsis (< 1 month)," "(C10) Serious lung disease incl. pneumonia (< 1 month)," "(C11) COPD," "(C16) Other risk," "(C18) Major surgery (> 45 min)," "(C19) Laparoscopic surgery (> 45 min)," "(C20) Patient confined to bed (> 72 h)," "(C21) Malignancy (present or previous)," "(C23) Central venous access," "(C31) History of DVT/PE," "(C32) Other congenital or acquired thrombophilia," and "(C34) Stroke (< 1 month)." According to the decision rules for the different risk levels obtained with the RS method, "(C1) Age," "(C18) Major surgery (> 45 min)," and "(C21) Malignancy (present or previous)" were the main factors driving mid- and high-risk levels, and recommendations for VTE prevention were proposed based on these three factors. The average accuracies of the RS, RF, SVM, and BPANN models were 79.5%, 87.9%, 92.6%, and 97.2%, respectively. In addition, BPANN had the highest accuracy, recall, F1-score, and precision.
    CONCLUSIONS: The RS model achieved lower accuracy than the other three common machine learning models. However, the RS model offers strong interpretability and allows identification of the high-risk factors and decision rules that drive high-risk VTE assessments in urology. This transparency is important for clinicians during risk assessment.
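The rough set (RS) step that turns screened Caprini items into per-risk-level decision rules can be illustrated with a minimal sketch. It assumes a hypothetical decision table whose records hold categorical values for a few Caprini items plus a risk level; the records, the age bands, and the helper name `certain_rules` are illustrative and are not data or code from the study. Records with identical condition values form an equivalence class, and a class whose members all share one risk level lies in the lower approximation of that level, so it yields a certain rule. The sketch deliberately omits attribute reduction (reducts) and rule simplification, which a full RS analysis would normally include.

```python
from collections import defaultdict


def certain_rules(rows, conditions, decision):
    """Derive certain rules from the lower approximations of a decision table.

    rows: list of dicts (hypothetical records); conditions: condition attribute
    names; decision: name of the decision attribute (here, the risk level).
    """
    # Group records into equivalence classes by their condition values.
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in conditions)
        classes[key].add(row[decision])

    rules = []
    for key, decisions in classes.items():
        # A consistent class (a single decision value) lies in the lower
        # approximation of that decision class, so the rule is certain.
        if len(decisions) == 1:
            rules.append((dict(zip(conditions, key)), next(iter(decisions))))
        # Inconsistent classes fall in the boundary region: no certain rule.
    return rules


# Hypothetical records (not study data): age band, major surgery, malignancy.
records = [
    {"C1": ">=61", "C18": 1, "C21": 1, "risk": "high"},
    {"C1": ">=61", "C18": 1, "C21": 1, "risk": "high"},
    {"C1": "41-60", "C18": 1, "C21": 0, "risk": "mid"},
    {"C1": "<=40", "C18": 0, "C21": 0, "risk": "low"},
    {"C1": "<=40", "C18": 0, "C21": 0, "risk": "mid"},  # conflicts with the row above
]

for cond, risk in certain_rules(records, ["C1", "C18", "C21"], "risk"):
    print(cond, "->", risk)
```

The two conflicting `<=40` records end up in the boundary region, which is how RS separates certain rules from merely possible ones.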

  • Article type: Journal Article
    Acute kidney injury (AKI) is one of the most important lethal factors for patients admitted to intensive care units (ICUs), and timely high-risk prognostic assessment and intervention are essential to improving patient prognosis. In this study, a stacking model with a two-tier feature selection approach was developed on the MIMIC-III dataset to predict the risk of in-hospital mortality in ICU patients admitted for AKI. External validation was performed using the separate MIMIC-IV and eICU-CRD datasets. Features were selected using the Boruta and XGBoost feature selection methods, and the area under the curve (AUC) was calculated for the stacking model. This study compared the performance of the stacking model using two-tier feature selection with that of models using single-tier feature selection (XGBoost: 0.85; Boruta: 0.83; two-tier: 0.91). The predictive effectiveness of the stacking model was further validated on different datasets (Validation 1: 0.83; Validation 2: 0.85) and by comparison with a simpler model and traditional clinical scores (SOFA: 0.65; APACHE IV: 0.61). In addition, this study combined interpretability techniques and causal inference to analyze the causal relationship between features and predicted outcomes.
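AUC values such as the 0.91 reported for the two-tier model have a direct probabilistic reading: the AUC equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor (the Mann-Whitney formulation), with ties counted as one half. A minimal pairwise sketch follows; the scores are hypothetical, and the O(n·m) loop is chosen for clarity rather than speed. In practice `sklearn.metrics.roc_auc_score` returns the same value.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted in-hospital mortality risks (not study data).
died = [0.9, 0.8, 0.4]
survived = [0.3, 0.5, 0.1]
print(round(auc_mann_whitney(died, survived), 3))  # 0.889
```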

  • Article type: Journal Article
    Asparagine peptide lyases (APLs) are one of the seven groups of proteases (also known as proteolytic enzymes), which are classified according to their catalytic residue. APLs are synthesized as precursors or propeptides that undergo self-cleavage through an autoproteolytic reaction. At present, APLs are grouped into 10 families belonging to six different clans of proteases. Given their critical roles in many biological processes, including virus maturation and virulence, accurate identification and characterization of APLs is indispensable. Experimental identification and characterization of APLs is laborious and time-consuming. Here, we developed APLpred, a novel support vector machine (SVM)-based predictor that predicts APLs from primary sequences. APLpred was developed using Boruta-selected optimal features derived from seven encodings, which were then used to train five machine learning algorithms. After evaluating each model on an independent dataset, we selected the SVM-based model as APLpred because of its consistent performance during cross-validation and independent evaluation. We anticipate that APLpred will be an effective tool for identifying APLs, which could aid in designing inhibitors against these enzymes and exploring their functions. The APLpred web server is freely available at https://procarb.org/APLpred/.
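The abstract does not name the seven sequence encodings, so the following is only an assumed example of the kind of encoding such predictors use: amino acid composition (AAC), which maps a protein sequence of any length to a fixed 20-dimensional vector of residue frequencies that an SVM can consume. The function name `aac` and the example peptide are hypothetical; skipping non-standard symbols such as `X` is a choice made here.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aac(sequence):
    """Amino acid composition: fraction of each standard residue in the sequence."""
    seq = sequence.upper()
    counted = [aa for aa in seq if aa in STANDARD_AA]  # skip non-standard symbols such as X
    if not counted:
        raise ValueError("sequence contains no standard amino acids")
    return [counted.count(aa) / len(counted) for aa in STANDARD_AA]


vec = aac("MKTAYIAKQR")  # hypothetical peptide
print(len(vec), round(vec[STANDARD_AA.index("A")], 2))  # 20 0.2
```

Feature vectors from several such encodings would be concatenated before a selector like Boruta prunes them; that pipeline layout is likewise an assumption, not a description of APLpred.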

  • Article type: Journal Article
    OBJECTIVE: Headache is one of the most prevalent and disabling health conditions globally. We prospectively explored the urban exposome in relation to weekly occurrence of headache episodes using data from the Dutch population-based Occupational and Environmental Health Cohort Study (AMIGO).
    METHODS: Participants (N = 7,339) completed baseline and follow-up questionnaires in 2011 and 2015, reporting headache frequency. Information on the urban exposome covered 80 exposures across 10 domains, such as air pollution, electromagnetic fields, and lifestyle and socio-demographic characteristics. We first identified all relevant exposures using the Boruta algorithm and then, for each exposure separately, we estimated the average treatment effect (ATE) and related standard error (SE) by training causal forests adjusted for age, depression diagnosis, painkiller use, general health indicator, sleep disturbance index and weekly occurrence of headache episodes at baseline.
    RESULTS: Occurrence of weekly headache was 12.5 % at baseline and 11.1 % at follow-up. Boruta selected five air pollutants (NO2, NOx, PM10, silicon in PM10, iron in PM2.5) and one urban temperature measure (heat island effect) as factors contributing to the occurrence of weekly headache episodes at follow-up. The estimated causal effect of each exposure on weekly headache indicated positive associations. NO2 showed the largest effect (ATE = 0.007 per interquartile range (IQR) increase; SE = 0.004), followed by PM10 (ATE = 0.006 per IQR increase; SE = 0.004), heat island effect (ATE = 0.006 per one-degree Celsius increase; SE = 0.007), NOx (ATE = 0.004 per IQR increase; SE = 0.004), iron in PM2.5 (ATE = 0.003 per IQR increase; SE = 0.004), and silicon in PM10 (ATE = 0.003 per IQR increase; SE = 0.004).
    CONCLUSIONS: Our results suggested that exposure to air pollution and heat island effects contributed to the reporting of weekly headache episodes in the study population.
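The Boruta screening step used here (and throughout this list) rests on one idea: compare each real feature with "shadow" features, which are shuffled copies that keep each column's distribution but destroy any link to the outcome. A feature scores a hit in a round when its importance beats the best shadow, and after many rounds a binomial test against p = 0.5 confirms or rejects it. The sketch below is a simplification under stated assumptions: it uses absolute Pearson correlation as a stand-in for the random-forest importance that Boruta actually uses, so it runs with the standard library only; it does a single pass without dropping rejected features between rounds; and it applies no multiple-testing correction. For real analyses, the R `Boruta` package or the Python `boruta` package (`BorutaPy`) implement the full procedure.

```python
import math
import random


def abs_corr(xs, ys):
    """|Pearson r|, used as a stand-in for random-forest importance (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return abs(sxy) / math.sqrt(sxx * syy)


def binom_tail(n, lo, hi):
    """P(lo <= X <= hi) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(lo, hi + 1)) / 2 ** n


def boruta_sketch(features, y, n_iter=20, alpha=0.05, seed=0):
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: shuffled copies with no real relationship to y.
        shadow_max = 0.0
        for col in features.values():
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_corr(shadow, y))
        # A hit means the real feature beat the best shadow in this round.
        for name, col in features.items():
            if abs_corr(col, y) > shadow_max:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        if binom_tail(n_iter, h, n_iter) < alpha:  # significantly many hits
            decisions[name] = "confirmed"
        elif binom_tail(n_iter, 0, h) < alpha:  # significantly few hits
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


# Hypothetical data: y depends on x_signal; x_const is uninformative.
rng = random.Random(1)
x_signal = [rng.gauss(0, 1) for _ in range(200)]
y = [x + rng.gauss(0, 0.3) for x in x_signal]
print(boruta_sketch({"x_signal": x_signal, "x_const": [1.0] * 200}, y))
# {'x_signal': 'confirmed', 'x_const': 'rejected'}
```

Features whose hit counts are neither significantly high nor low stay "tentative", which mirrors how Boruta reports undecided variables.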

  • Article type: Journal Article
    Regression models are constructed to predict glucose and lactate concentrations in culture media from near-infrared spectra. Partial least-squares (PLS) regression is employed, and we investigate how much wavelength selection and transfer learning can improve the predictive ability of PLS models. We combine Boruta, a nonlinear variable selection method based on random forests, with the variable importance in projection (VIP) of PLS to produce the proposed variable selection method, VIP-Boruta. Furthermore, focusing on the situation where both culture medium samples and pseudo-culture medium samples are available, we apply transfer learning from pseudo media to culture media. Data analysis with an actual dataset of culture media and pseudo media confirms that VIP-Boruta effectively selects appropriate wavelengths and improves the predictive ability of PLS models, and that transfer learning with pseudo media further enhances predictive ability. Compared with the conventional PLS model, the proposed method reduced prediction errors by about 61% for glucose and about 16% for lactate.
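For reference, the VIP score that VIP-Boruta builds on is commonly defined as follows for a PLS model with a single response, A latent components, and p predictors (here, wavelengths). In this notation, w_{ja} is the weight of variable j on component a, t_a is the score vector of component a, and q_a is its y-loading. The abstract does not say how VIP-Boruta combines these scores with Boruta, so only the standard VIP definition is shown.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}{\sum_{a=1}^{A} \mathrm{SS}_a}},
\qquad \mathrm{SS}_a = q_a^2 \, \mathbf{t}_a^{\top} \mathbf{t}_a
```

Because the normalized weights of each component satisfy \sum_j (w_{ja}/\lVert \mathbf{w}_a \rVert)^2 = 1, the squared VIP scores sum to p, so their mean is 1. This is why VIP > 1 is a common heuristic cut-off for flagging important wavelengths.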

  • Article type: Journal Article
    Machine learning methods have been used to identify omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve the identification of alcohol-associated transcriptomic markers. In this study, we analysed array-based, whole-blood-derived expression data for 17 873 gene transcripts in 5508 Framingham Heart Study participants. Using the Boruta algorithm, a supervised random forest (RF)-based feature selection method, we selected twenty-five alcohol-associated transcripts. In a testing set (30 % of all study participants), the AUCs (areas under the receiver operating characteristic curve) of these twenty-five transcripts were 0·73, 0·69 and 0·66 for non-drinkers v. moderate drinkers, non-drinkers v. heavy drinkers and moderate drinkers v. heavy drinkers, respectively. The AUCs of the transcripts selected by the Boruta method were comparable to those of transcripts identified using conventional linear regression models; for example, the AUCs of the 1958 transcripts identified by conventional linear regression models (false discovery rate < 0·2) were 0·74, 0·66 and 0·65, respectively. With Bonferroni correction for the twenty-five Boruta-selected transcripts and three CVD risk factors (i.e. at P < 6·7e-4), we observed that thirteen transcripts were associated with obesity, three with type 2 diabetes and one with hypertension. For example, alcohol consumption was inversely associated with the expression of DOCK4, IL4R and SORT1; DOCK4 and SORT1 were positively associated with obesity; and IL4R was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.
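The significance cut-off P < 6·7e-4 quoted in this abstract is consistent with a Bonferroni correction over 25 transcripts × 3 CVD risk factors = 75 tests at a family-wise alpha of 0.05. The alpha of 0.05 is an assumption here, since the abstract states only the resulting threshold. A short worked check:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# 25 Boruta-selected transcripts x 3 CVD risk factors = 75 tests;
# a family-wise alpha of 0.05 is assumed (only the threshold is reported).
threshold = bonferroni_threshold(0.05, 25 * 3)
print(f"{threshold:.1e}")  # 6.7e-04
```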

  • Article type: Journal Article
    The rising demand for cocoa powder has driven up market prices, leading to adulteration practices aimed at economic gain. This study aimed to detect and quantify cocoa powder adulteration using visible and near-infrared spectroscopy (Vis-NIRS). The adulterants used in this study were powdered carob, cocoa shell, foxtail millet, soybean, and whole wheat. The NIRS data could not be resolved using Savitzky-Golay smoothing. Nevertheless, random forest (RF) and support vector machine classifiers classified the samples with 100% accuracy. Quantification of adulteration using partial least squares (PLS), Lasso, Ridge, Elastic Net, and RF regressions yielded R2 values higher than 0.96 and root mean square errors (RMSE) below 2.6. Coupling PLS with the Boruta algorithm produced the most reliable regression model (R2 = 1, RMSE = 0.0000). Finally, an online application was developed to facilitate the determination of adulterants in cocoa powder.
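Savitzky-Golay smoothing, mentioned above as a spectral preprocessing step, fits a low-order polynomial to each sliding window by least squares. This is equivalent to convolving the spectrum with fixed weights. The sketch below assumes a 5-point window with a quadratic fit, whose standard weights are (-3, 12, 17, 12, -3)/35. The window size and polynomial order used in the study are not stated, and leaving the two edge points at each end unsmoothed is a simplification made here. In practice `scipy.signal.savgol_filter(x, window_length, polyorder)` handles arbitrary windows and the edges.

```python
# Savitzky-Golay convolution weights for a 5-point window with a quadratic
# (equivalently cubic) fit: (-3, 12, 17, 12, -3) / 35.
SG5_QUAD = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def sg_smooth5(signal):
    """Smooth interior points; the two points at each end are returned unchanged."""
    out = list(signal)
    for i in range(2, len(signal) - 2):
        out[i] = sum(c * signal[i + k - 2] for k, c in enumerate(SG5_QUAD))
    return out


spectrum = [float(x * x) for x in range(10)]  # a purely quadratic "spectrum"
smoothed = sg_smooth5(spectrum)
print([round(v, 6) for v in smoothed])
```

Because the filter is an exact local quadratic fit, a quadratic input comes back unchanged up to floating-point rounding. Only noise that does not follow the local polynomial is smoothed away.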

  • Article type: Journal Article
    Interpretable machine learning models are instrumental in disease diagnosis and clinical decision-making because they shed light on the relevant features. In this study, Boruta, SHAP (SHapley Additive exPlanations), and BorutaShap were employed for feature selection, each contributing to the identification of crucial features. The selected features were then used to train six machine learning algorithms (LR, SVM, ETC, AdaBoost, RF, and LGBM) on diverse medical datasets obtained from public sources after rigorous preprocessing. The performance of each feature selection technique was evaluated across multiple ML models using accuracy, precision, recall, and F1-score. Among these techniques, SHAP performed best, achieving average accuracies of 80.17%, 85.13%, 90.00%, and 99.55% on the diabetes, cardiovascular, statlog, and thyroid disease datasets, respectively. The LGBM emerged as the most effective algorithm, with an average accuracy of 91.00% for most disease states. Moreover, SHAP enhanced the interpretability of the models, providing insight into the underlying mechanisms driving disease diagnosis. This study offers insights into feature selection techniques and machine learning algorithms for disease diagnosis that may benefit researchers and practitioners in the medical field. Further exploration of feature selection methods and algorithms holds promise for advancing disease diagnosis methodologies and paving the way for more accurate and interpretable diagnostic models.
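The four metrics used above to compare feature-selection pipelines (accuracy, precision, recall, and F1-score) follow the usual definitions based on confusion-matrix counts. The following is a minimal sketch for a binary task. The labels are hypothetical, and returning 0.0 when a ratio is undefined is a convention chosen here. `sklearn.metrics.precision_recall_fscore_support` computes the same quantities in practice.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are real
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of real positives that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels (1 = disease present), not taken from any of the datasets.
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```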

  • Article type: Journal Article
    BACKGROUND: In 2016, the UK had 4.5 million people with diabetes, predominantly type 2 diabetes mellitus (T2DM). The NHS allocates £10 billion (9% of its budget) to managing diabetes. Metformin is the primary treatment for T2DM, but 35% of patients do not benefit from it, which leads to complications. This study investigates metformin's efficacy using clinical, genomic, and proteomic data to uncover new biomarkers and to build a machine learning predictor for early detection of metformin response.
    METHODS: Here we report analysis from a T2DM dataset of individuals prescribed metformin monotherapy from the Diastrat cohort recruited at the Altnagelvin Area Hospital, Northern Ireland.
    RESULTS: In the clinical data analysis, comparing responders (those achieving HbA1c ≤ 48 mmol/mol) with non-responders (HbA1c > 48 mmol/mol), we found that creatinine levels and body weight were more negatively correlated with response than with non-response. In the genomic analysis, we identified statistically significant (p-value < 0.05) variants rs6551649 (LPHN3), rs6551654 (LPHN3), rs4495065 (LPHN3) and rs7940817 (TRPC6) that appear to differentiate responders from non-responders. In the proteomic analysis, we identified 15 statistically significant (p-value < 0.05, q-value < 0.05) proteomic markers that differentiate controls, responders, non-responders and treatment groups, of which the most significant were HAOX1, CCL17 and PAI, each with a fold change of ~2. A machine learning model was built; the best model predicted non-responders with 83% classification accuracy.
    CONCLUSIONS: Further testing in prospective validation cohorts is required to determine the clinical utility of the proposed model.
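The outcome label that the predictor learns comes from the responder definition given in the RESULTS: HbA1c ≤ 48 mmol/mol counts as a response, so the boundary value itself is a response. A minimal labelling sketch follows. The constant and function names are hypothetical, and the abstract does not specify at which time point after starting metformin the HbA1c is measured.

```python
RESPONSE_CUTOFF = 48.0  # HbA1c in mmol/mol, per the responder definition in the abstract


def label_response(hba1c_mmol_mol):
    """'responder' if HbA1c <= 48 mmol/mol on metformin, else 'non-responder'."""
    return "responder" if hba1c_mmol_mol <= RESPONSE_CUTOFF else "non-responder"


print([label_response(v) for v in (45.0, 48.0, 53.0)])
# ['responder', 'responder', 'non-responder']
```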

  • Article type: Journal Article
    This study used machine learning algorithms to identify important variables and predict postinduction hypotension (PIH) in patients undergoing colorectal tumor resection surgery.
    Data from 318 patients who underwent colorectal tumor resection under general anesthesia were analyzed. The training and test sets were split chronologically. The Boruta algorithm was used to screen relevant basic characteristic variables and to establish a model on the training set. Four models (regression tree, K-nearest neighbor, neural network, and random forest [RF]) were built using repeated cross-validation and hyperparameter optimization. The best model was selected, and a feature-importance ranking chart, univariate partial dependence profiles, and break-down profiles were drawn. R2, mean absolute error (MAE), mean squared error (MSE), and root MSE (RMSE) were used to assess the regression fitting curves for the training and test sets.
    The basic characteristic variables selected by Boruta screening were age, sex, body mass index, L3 skeletal muscle index, and HUAC. In the optimal RF model, R2 was 0.7708 and 0.7591, MAE was 0.0483 and 0.0408, MSE was 0.0038 and 0.0028, and RMSE was 0.0623 and 0.0534 for the training and test sets, respectively.
    A high-performance algorithm was established and validated to estimate the degree of blood pressure change after induction, which may help control the important characteristic variables and reduce the occurrence of PIH.
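The four regression metrics reported for the RF model (R2, MAE, MSE, and RMSE) can be computed from observed and predicted values as sketched below. R2 is computed here as 1 - SS_res/SS_tot. The abstract does not say which R2 definition was used, so this is an assumption, since some tools report the squared correlation instead. The values are hypothetical, not study data.

```python
import math


def regression_metrics(y_true, y_pred):
    """R2 (1 - SS_res/SS_tot), MAE, MSE and RMSE for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # RMSE is the square root of MSE
    }


# Hypothetical observed vs predicted values.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```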