gradient boosting

  • Article type: Journal Article
    Readmission to the coronary care unit (CCU) has significant implications for patient outcomes and healthcare expenditure, underscoring the urgency of accurately identifying patients at high readmission risk. This study aims to construct and externally validate a predictive model for CCU readmission using machine learning (ML) algorithms across multiple hospitals.
    Patient information, including demographics, medical history, and laboratory test results, was collected from the electronic health record system, yielding a total of 40 features. Five ML models (logistic regression, random forest, support vector machine, gradient boosting, and multilayer perceptron) were employed to estimate the readmission risk.
    The selected gradient boosting model demonstrated superior performance, with an area under the receiver operating characteristic curve (AUC) of 0.887 in the internal validation set. Further external validation in a hold-out test set and at three other medical centers upheld the model's robustness, with consistently high AUCs ranging from 0.852 to 0.879.
    The results endorse the integration of ML algorithms in healthcare to enhance patient risk stratification, potentially optimizing clinical interventions and diminishing the burden of CCU readmissions.
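    The AUC values quoted in this abstract can be read as the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted patient. The sketch below computes that quantity directly from pairwise comparisons. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readmitted, 0 = not readmitted.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

    The pairwise loop is quadratic in the number of patients, which is fine for a small illustration; for large cohorts a rank-based formulation gives the same value more cheaply.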

  • Article type: Journal Article
    Cervical cancer is one of the most dangerous malignancies in women. Prolonged survival times are made possible by breakthroughs in early recognition and efficient treatment of the disease. Existing methods lag in finding the important attributes for predicting the survival outcome. The main objective of this study is to identify individuals with cervical cancer who are at greater risk of death from recurrence by predicting survival. A novel aspect of the proposed technique is triangulating feature importance to find the important risk factors according to which treatment may be varied to improve the survival outcome. Five algorithms, support vector machine, naive Bayes, supervised logistic regression, decision tree, gradient boosting, and random forest, are used to build the concept. Conventional attribute selection methods such as information gain (IG), FCBF, and ReliefF are employed. The recommended classifier is evaluated for precision, recall, F1, Matthews correlation coefficient (MCC), classification accuracy (CA), and area under the curve (AUC) using various methods. The gradient boosting algorithm (CatBoost) attains the highest accuracy, 0.99, in predicting the survival outcome of patients with recurrent cervical cancer. The intended outcome of the research is to identify the important risk factors through which patients' survival outcomes can be improved.

  • Article type: Journal Article
    This study focuses on developing machine learning models to detect subtle alterations in hepatocyte chromatin organization due to exposure to iron(II,III) oxide nanoparticles (IONPs), hypothesizing that exposure will significantly alter chromatin texture. A total of 2000 hepatocyte nuclear regions of interest (ROIs) from mouse liver tissue were analyzed, and for each ROI, 5 different parameters were calculated: Long Run Emphasis, Short Run Emphasis, Run Length Nonuniformity, and 2 wavelet coefficient energies obtained after the discrete wavelet transform. These parameters served as input for supervised machine learning models, specifically random forest and gradient boosting classifiers. The models demonstrated relatively robust performance in distinguishing hepatocyte chromatin structures belonging to the group exposed to IONPs from the controls. The study's findings suggest that iron oxide nanoparticles induce substantial changes in hepatocyte chromatin distribution and underscore the potential of AI techniques in advancing hepatocyte evaluation in physiological and pathological conditions.
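    The three run-length parameters named in this abstract are standard gray-level run-length texture features. The sketch below computes them for a tiny patch, assuming runs are counted along rows only and that the image is already quantized to a few gray levels; the abstract states neither choice, and the patch values and function names are hypothetical.

```python
from collections import Counter


def horizontal_runs(image):
    """Count runs keyed by (gray level, run length) along each row."""
    runs = Counter()
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            # A run ends at the end of the row or when the gray level changes.
            if k == len(row) or row[k] != row[start]:
                runs[(row[start], k - start)] += 1
                start = k
    return runs


def run_length_features(image):
    runs = horizontal_runs(image)
    n_runs = sum(runs.values())
    # Short Run Emphasis weights each run by 1 / length^2.
    sre = sum(c / length ** 2 for (_, length), c in runs.items()) / n_runs
    # Long Run Emphasis weights each run by length^2.
    lre = sum(c * length ** 2 for (_, length), c in runs.items()) / n_runs
    # Run Length Nonuniformity: squared run counts per length, summed.
    per_length = Counter()
    for (_, length), c in runs.items():
        per_length[length] += c
    rln = sum(c ** 2 for c in per_length.values()) / n_runs
    return {"SRE": sre, "LRE": lre, "RLN": rln}


# Hypothetical 3x4 patch with gray levels 0-2.
patch = [
    [0, 0, 1, 1],
    [2, 2, 2, 2],
    [0, 1, 1, 1],
]
features = run_length_features(patch)
print(features)
```

    Feature vectors of this kind, one per ROI, are what a random forest or gradient boosting classifier would then take as input.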

  • Article type: Journal Article
    Significant developments are currently underway in the field of cancer research, particularly in Germany, regarding cancer registration and the use of medical information systems. The use of such systems contributes significantly to quality assurance and increased efficiency in data evaluation. The growing importance of artificial intelligence (AI) in cancer research is evident as these systems integrate AI for various purposes, i.e. to assist users in data analysis. This paper uses ensemble learning to classify the graphical user interface state of the medical information system CARESS. The results show that all ensemble learning models utilized achieved good performance. In particular, the gradient boosting algorithm performed the best with an accuracy of 97%. The results represent a starting point for further development of ensemble learning in medical data analysis, with the potential for integration into various applications such as recommender systems.

  • Article type: Journal Article
    Applying machine-learning techniques to imbalanced data sets presents a significant challenge in materials science, since the underrepresented characteristics of minority classes are often buried by the abundance of unrelated characteristics in the majority classes. Existing approaches to this problem focus on balancing the counts of each class using oversampling or synthetic data generation techniques. However, these methods can lead to loss of valuable information or overfitting. Here, we introduce a deep learning framework to predict minority-class materials, specifically within the realm of metal-insulator transition (MIT) materials. The proposed approach, termed boosting-CGCNN, combines the crystal graph convolutional neural network (CGCNN) model with a gradient-boosting algorithm. The model effectively handled extreme class imbalances in MIT material data by sequentially building a deeper neural network. The comparative evaluations demonstrated the superior performance of the proposed model compared to other approaches. Our approach is a promising solution for handling imbalanced data sets in materials science.

  • Article type: Journal Article
    Many individuals with hypertension remain undiagnosed. We aimed to develop a predictive model for hypertension using diagnostic codes from prevailing electronic medical records in Swedish primary care.
    This sex- and age-matched case-control (1:5) study included patients aged 30-65 years living in the Stockholm Region, Sweden, with a newly recorded diagnosis of hypertension during 2010-19 (cases) and individuals without a recorded hypertension diagnosis during 2010-19 (controls), in total 507,618 individuals. Patients with diagnoses of cardiovascular diseases or diabetes were excluded. A stochastic gradient boosting machine learning model was constructed using the 1,309 most frequently registered ICD-10 codes from primary care for the three years prior to the hypertension diagnosis.
    The model showed an area under the curve (95 % confidence interval) of 0.748 (0.742-0.753) for females and 0.745 (0.740-0.751) for males for predicting a diagnosis of hypertension within three years. The sensitivity was 63 % and 68 %, and the specificity 76 % and 73 %, for females and males, respectively. The 25 diagnoses that contributed the most to the model for females and males all exhibited a normalized relative influence >1 %. The codes contributing most to the model, all with an odds ratio of marginal effects >1 for both sexes, were dyslipidaemia, obesity, and encountering health services in other circumstances.
    This machine learning model, using prevailing recorded diagnoses within primary health care, may contribute to the identification of patients at risk of unrecognized hypertension. The added value of this predictive model beyond information on blood pressure warrants further study.
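    To make the terms "stochastic gradient boosting" and "normalized relative influence" in this abstract concrete, here is a minimal sketch in plain Python. It rests on several simplifying assumptions that are not taken from the study: depth-1 trees (stumps), log loss, leaf values set to the mean residual rather than a Newton step, row subsampling without replacement in each round, and relative influence defined as each feature's share of the total squared-error reduction. All names and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r, rows):
    """Least-squares stump on residuals r, fitted on the sampled rows only."""
    n = len(rows)
    total = sum(r[i] for i in rows)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for f in range(len(X[0])):
        order = sorted(rows, key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # no threshold separates equal values
            right_sum = total - left_sum
            # Reduction in squared error compared with a single leaf.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def fit_sgb(X, y, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    n = len(y)
    prior = sum(y) / n
    f0 = math.log(prior / (1 - prior))  # start from the base-rate log-odds
    scores = [f0] * n
    stumps = []
    influence = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the score.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        # The "stochastic" part: each round sees a random subset of rows.
        rows = rng.sample(range(n), max(2, int(subsample * n)))
        stump = fit_stump(X, residuals, rows)
        if stump is None:
            continue
        gain, f, thr, left, right = stump
        influence[f] += gain
        left, right = learning_rate * left, learning_rate * right
        stumps.append((f, thr, left, right))
        for i in range(n):
            scores[i] += left if X[i][f] <= thr else right
    total = sum(influence)
    nri = [100.0 * v / total for v in influence] if total > 0 else influence
    return f0, stumps, nri


def predict_proba(model, x):
    f0, stumps, _ = model
    return sigmoid(f0 + sum(l if x[f] <= thr else r for f, thr, l, r in stumps))


# Synthetic data: feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]
model = fit_sgb(X, y)
nri = model[2]
print([round(v, 1) for v in nri])
```

    In this toy setting the informative feature should take nearly all of the normalized influence, which mirrors how the study ranks diagnostic codes by their contribution to the model.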

  • Article type: Journal Article
    OBJECTIVE: To develop and validate machine learning models for predicting coronary artery disease (CAD) within a Taiwanese cohort, with an emphasis on identifying significant predictors and comparing the performance of various models.
    METHODS: This study involved a comprehensive analysis of clinical, demographic, and laboratory data from 8,495 subjects in Taiwan Biobank (TWB) after propensity score matching to address potential confounding factors. Key variables included age, gender, lipid profiles (T-CHO, HDL_C, LDL_C, TG), smoking and alcohol consumption habits, and renal and liver function markers. The performance of multiple machine learning models was evaluated.
    RESULTS: The cohort comprised 1,699 individuals with CAD identified through self-reported questionnaires. Significant differences were observed between CAD and non-CAD individuals regarding demographics and clinical features. Notably, the Gradient Boosting model emerged as the most accurate, achieving an AUC of 0.846 (95% confidence interval [CI], 0.819-0.873), a sensitivity of 0.776 (95% CI, 0.732-0.820), and a specificity of 0.759 (95% CI, 0.736-0.782). The accuracy was 0.762 (95% CI, 0.742-0.782). Age was identified as the most influential predictor of CAD risk within the studied dataset.
    CONCLUSIONS: The Gradient Boosting machine learning model demonstrated superior performance in predicting CAD within the Taiwanese cohort, with age being a critical predictor. These findings underscore the potential of machine learning models in enhancing the prediction accuracy of CAD, thereby supporting early detection and targeted intervention strategies.
    BACKGROUND: Not applicable.
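    Sensitivity, specificity, and accuracy with 95% confidence intervals, as reported in this abstract, can all be derived from a confusion matrix. The sketch below uses the Wald (normal-approximation) interval, which is an assumption: the abstract does not say how its intervals were obtained. The counts are hypothetical and not taken from the study.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Point estimate and Wald 95% interval for a proportion."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def classification_summary(tp, fp, tn, fn):
    return {
        "sensitivity": proportion_with_ci(tp, tp + fn),  # true positive rate
        "specificity": proportion_with_ci(tn, tn + fp),  # true negative rate
        "accuracy": proportion_with_ci(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical confusion-matrix counts.
summary = classification_summary(tp=80, fp=30, tn=70, fn=20)
for name, (p, lo, hi) in summary.items():
    print(f"{name}: {p:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

    Other interval methods, such as Wilson or bootstrap intervals, are often preferred for small samples or proportions near 0 or 1.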

  • Article type: Journal Article
    BACKGROUND: Takotsubo syndrome (TTS) is an acute heart failure syndrome with symptoms similar to acute myocardial infarction. TTS is often triggered by acute emotional or physical stress and is a significant cause of morbidity and mortality. Predictors of mortality in patients with TTS are not well understood, and there is a need to identify high-risk patients and tailor treatment accordingly. This study aimed to assess the importance of various clinical factors in predicting 30-day mortality in TTS patients using a machine learning algorithm.
    METHODS: We analyzed data from the nationwide Swedish Coronary Angiography and Angioplasty Registry (SCAAR) for all patients with TTS in Sweden between 2015 and 2022. Gradient boosting was used to assess the relative importance of variables in predicting 30-day mortality in TTS patients.
    RESULTS: Of 3,180 patients hospitalized with TTS, 76.0% were women. The median age was 71.0 years (interquartile range 62-77). The crude all-cause mortality rate was 3.2% at 30 days. The gradient boosting algorithm identified the treating hospital as the most important predictor of 30-day mortality. This factor was followed in importance by the clinical indication for angiography, creatinine level, Killip class, and age. Other, less important factors included weight, height, and certain medical conditions such as hyperlipidemia and smoking status.
    CONCLUSIONS: Using machine learning with gradient boosting, we analyzed all Swedish patients diagnosed with TTS over seven years and found that the treating hospital was the most significant predictor of 30-day mortality.

  • Article type: Journal Article
    OBJECTIVE: Early identification of diabetes is crucial. To create a predictive model utilizing machine learning (ML) to identify new cases of diabetes in primary health care (PHC).
    METHODS: A case-control study utilizing data on PHC visits for cases and sex-, age-, and PHC-matched controls. Stochastic gradient boosting was used to construct a model for predicting cases of diabetes based on diagnostic codes from PHC consultations during the year before the index (diagnosis) date and on the number of consultations. Variable importance was estimated using the normalized relative influence (NRI) score. Risks of having diabetes were calculated using odds ratios of marginal effects (ORME). Four groups by age and sex were studied: men and women aged 35-64 years and ≥ 65 years.
    RESULTS: The most important predictive factors were hypertension, with an NRI of 21.4-29.7 %, and obesity, with an NRI of 4.8-15.2 %. The NRI for the other top ten diagnoses and administrative codes generally ranged from 1.0 to 4.2 %.
    CONCLUSIONS: Our data confirm the known risk patterns for predicting a new diagnosis of diabetes, and the need to test blood glucose frequently. To assess the full potential of ML for risk prediction purposes in clinical practice, future studies could include clinical data on lifestyle patterns, laboratory tests, and prescribed medication.

  • Article type: Journal Article
    Most algorithms that are used to predict the effects of variants rely on evolutionary conservation. However, a majority of such techniques compute evolutionary conservation solely from the alignment of multiple sequences while overlooking the evolutionary context of substitution events. In a previous study, we introduced PHACT, a scoring-based pathogenicity predictor for missense mutations that can leverage phylogenetic trees. Building on this foundation, we now propose PHACTboost, a gradient boosting tree-based classifier that combines PHACT scores with information from multiple sequence alignments, phylogenetic trees, and ancestral reconstruction. By learning from data, PHACTboost outperforms PHACT. Furthermore, the results of comprehensive experiments on carefully constructed sets of variants demonstrated that PHACTboost can outperform 40 prevalent pathogenicity predictors reported in dbNSFP, including conventional tools, metapredictors, and deep learning-based approaches, as well as more recent tools such as AlphaMissense, EVE, and CPT-1. The superiority of PHACTboost over these methods was particularly evident in the case of hard variants, for which different pathogenicity predictors offered conflicting results. We provide predictions for 215 million amino acid alterations over 20,191 proteins. PHACTboost is available at https://github.com/CompGenomeLab/PHACTboost. PHACTboost can improve our understanding of genetic diseases and facilitate more accurate diagnoses.