CatBoost

Literature (48 articles)

1 Prediction of Protein-DNA Interface Hot Spots Based on Empirical Mode Decomposition and Machine Learning.

Impact index: 4.141
Published: May 23, 2024
Journal: Genes (Basel) PMID: 38927611

DOI: 10.3390/genes15060676
Article type: Journal Article

Protein-DNA complex interactions play a crucial role in biological activities such as gene expression, modification, replication and transcription. Understanding the physiological significance of protein-DNA binding interface hot spots, as well as the development of computational biology, depends on the precise identification of these regions. In this paper, a hot spot prediction method called EC-PDH is proposed. First, we extracted features of these hot spots' solid solvent-accessible surface area (ASA) and secondary structure. Then the mean, variance, energy and autocorrelation function values of the first three intrinsic mode functions (IMFs) of these conventional features were extracted as new features via the empirical mode decomposition (EMD) algorithm. A total of 218 feature dimensions were obtained. For feature selection, we used the maximum-relevance minimum-redundancy sequential forward selection method (mRMR-SFS) to obtain an optimal 11-dimensional feature subset. To address the issue of data imbalance, we used the SMOTE-Tomek algorithm to balance positive and negative samples, and finally used categorical boosting (CatBoost) to construct our hot spot prediction model for protein-DNA binding interfaces. Our method performs well on the test set, with AUC, MCC and F1 score values of 0.847, 0.543 and 0.772, respectively. In a comparative evaluation, EC-PDH outperforms the existing state-of-the-art methods in identifying hot spots.
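The four summary statistics named in the abstract (mean, variance, energy, autocorrelation) can be illustrated on a single sequence. This is a minimal sketch, not the authors' implementation: the function name `summary_features`, the choice of lag 1, the population-variance convention and the toy input are all assumptions, and the EMD step that would produce real IMFs is left out.

```python
def summary_features(component, lag=1):
    """Return mean, variance, energy and lag-k autocorrelation of one sequence."""
    n = len(component)
    mean = sum(component) / n
    centered = [x - mean for x in component]
    squared_dev = sum(c * c for c in centered)
    variance = squared_dev / n                      # population variance (assumed convention)
    energy = sum(x * x for x in component)          # sum of squared amplitudes
    # Normalized autocorrelation at the given lag; 0.0 for a constant sequence.
    autocorr = (
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / squared_dev
        if squared_dev
        else 0.0
    )
    return {"mean": mean, "variance": variance, "energy": energy, "autocorr": autocorr}


# Hypothetical stand-in for one IMF; a real pipeline would obtain IMFs from an EMD routine.
imf = [1.0, -1.0, 1.0, -1.0]
features = summary_features(imf)
```

Applying the function to each of the first three IMFs of each base feature would yield twelve new values per base feature, which matches the kind of feature expansion the abstract describes.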

2 Automatic text classification of drug-induced liver injury using document-term matrix and XGBoost.

Impact index: not available
Published: 2024
Journal: Front Artif Intell PMID: 38887604

DOI: 10.3389/frai.2024.1401810
Article type: Journal Article

Regulatory agencies generate a vast amount of textual data in the review process. For example, drug labeling serves as a valuable resource for regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources.
We utilized artificial intelligence in this study to classify drug-induced liver injury (DILI)-related content from drug labeling documents based on the FDA's DILIrank dataset. We employed text mining and XGBoost models and utilized the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining medical standard terms for FDA and EMA drug label datasets. Then, we constructed a document-term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word/term/token.
The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 for both drug labels from the FDA and EMA and literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA).
Moreover, the text mining and XGBoost functions demonstrated in this study can be applied to other text processing and classification tasks.
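To make the document-term matrix step concrete, the following sketch builds TF-IDF weights from already tokenized documents. It assumes the plain textbook weighting (term frequency times log of N over document frequency); the study's exact weighting, tokenizer and stop-word handling are not given in the abstract, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Build a TF-IDF weighted document-term matrix from non-empty tokenized documents."""
    vocabulary = sorted({token for doc in documents for token in doc})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(token for doc in documents for token in set(doc))
    matrix = []
    for doc in documents:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(doc)                 # relative term frequency
            idf = math.log(n_docs / doc_freq[term])      # unsmoothed inverse document frequency
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Toy, made-up label snippets (already tokenized).
docs = [
    ["hepatic", "injury", "reported"],
    ["injury", "rash"],
    ["hepatic", "failure", "injury"],
]
vocab, dtm = tfidf_matrix(docs)
```

A term that occurs in every document (here "injury") receives weight zero under this unsmoothed variant, which is why common words contribute little to the classifier's input. Each row of `dtm` could then be passed as a feature vector to a gradient boosting classifier.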

3 A multifactor hybrid model for carbon price interval prediction based on decomposition-integration framework.

Impact index: 8.91
Published: Jul 7, 2024
Journal: J Environ Manage PMID: 38850918

DOI: 10.1016/j.jenvman.2024.121273
Article type: Journal Article

Carbon price is a pivotal element in the carbon trading sector. Accurate estimation of carbon price can offer precise guidance for the carbon market participants. This study introduces a novel prediction model encompassing both point and interval prediction for the carbon price. Firstly, to distill the volatility traits inherent in carbon price, the successive variational mode decomposition is utilized to adaptively decompose the carbon price into regular sequences. Secondly, to obtain the optimal input variables, the partial autocorrelation function and random forest are employed to filter the influencing factors and historical carbon price. Then, to avoid single model constraint, a combination model of categorical boosting and kernel extreme learning machine optimized by the sparrow search algorithm is employed for the point prediction, and the shapley additive explanation is employed to elucidate the model prediction process. Finally, to provide more efficient information, the adaptive bandwidth kernel density estimation is applied to the interval prediction. The data from Hubei carbon market is adopted as a case study, and the results indicate that the mean absolute error, mean absolute percentage error, root mean square error and R2 of the proposed model are 0.1022, 0.0022, 0.1262 and 0.9921, respectively. The historical carbon price, Brent crude oil futures settlement price and European Union allowance futures carbon price have a positive impact on carbon price, and Hushen 300 has a negative impact on carbon price. Compared with the constant kernel density estimation, the proposed model achieves higher interval coverage probability and lower interval width. Thus, the application of the hybrid model can promote the operational efficiency of the carbon market and facilitate the implementation of carbon emission reduction policies.
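The two interval metrics mentioned at the end of the abstract, coverage probability and interval width, can be computed as in the sketch below. The function names and the toy prices are assumptions, and the width is left unnormalized because the abstract does not say whether a normalized variant was used.

```python
def interval_coverage_probability(actual, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    covered = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return covered / len(actual)


def mean_interval_width(lower, upper):
    """Average distance between the upper and lower interval bounds."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Made-up prices and interval bounds, for illustration only.
actual = [40.0, 41.5, 39.0, 42.0]
lower = [39.0, 40.0, 39.5, 41.0]
upper = [41.0, 42.0, 40.5, 43.0]

coverage = interval_coverage_probability(actual, lower, upper)
width = mean_interval_width(lower, upper)
```

The two metrics pull in opposite directions: widening every interval raises coverage trivially, so a method is only better if it raises coverage without a matching increase in width, which is the comparison the abstract reports.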

4 Incorporating preoperative frailty to assist in early prediction of postoperative pneumonia in elderly patients with hip fractures: an externally validated online interpretable machine learning model.

Impact index: 4.07
Published: May 30, 2024
Journal: BMC Geriatr PMID: 38816811

DOI: 10.1186/s12877-024-05050-w
Article type: Journal Article

BACKGROUND: This study aims to implement a validated prediction model and application medium for postoperative pneumonia (POP) in elderly patients with hip fractures in order to facilitate individualized intervention by clinicians.
METHODS: Employing clinical data from elderly patients with hip fractures, we derived and externally validated machine learning models for predicting POP. Model derivation utilized a registry from Nanjing First Hospital, and external validation was performed using data from patients at the Fourth Affiliated Hospital of Nanjing Medical University. The derivation cohort was divided into a training set and a testing set. The least absolute shrinkage and selection operator (LASSO) and multivariable logistic regression were used for feature screening. We compared the performance of the models to select the best one and introduced SHapley Additive exPlanations (SHAP) to interpret it.
RESULTS: The derivation and validation cohorts comprised 498 and 124 patients, with 14.3% and 10.5% POP rates, respectively. Among these models, categorical boosting (CatBoost) demonstrated superior discrimination ability. AUROC was 0.895 (95% CI: 0.841-0.949) and 0.835 (95% CI: 0.740-0.930) on the training and testing sets, respectively. At external validation, the AUROC was 0.894 (95% CI: 0.821-0.966). The SHAP method showed that CRP, the modified five-item frailty index (mFI-5), and ASA physical status were the top three predictors of POP.
CONCLUSIONS: Our model's good early prediction ability, combined with the implementation of a web-based risk calculator built on the CatBoost model, is expected to effectively distinguish groups at high risk of POP, facilitating timely intervention.
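The AUROC values reported above measure discrimination: the probability that a randomly chosen patient with POP receives a higher predicted risk than a randomly chosen patient without it. A minimal sketch of that definition follows; the function name and the toy labels and scores are assumptions, and the pairwise loop is meant for clarity, not for large cohorts.

```python
def auroc(labels, scores):
    """AUROC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up outcomes (1 = event) and predicted risks.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = auroc(labels, scores)
```

Here five of the six positive/negative pairs are ordered correctly, so `auc` equals 5/6. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.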

5 Predicting anti-trypanosome effect of carbazole-derived compounds by powerful SVM with novel kernel function and comprehensive learning PSO.

Impact index: 5.938
Published: May 29, 2024
Journal: Antimicrob Agents Chemother PMID: 38808999

DOI: 10.1128/aac.00265-24
Article type: Journal Article

In order to predict the anti-trypanosome effect of carbazole-derived compounds by quantitative structure-activity relationship, five models were established by the linear method, random forest, radial basis kernel function support vector machine, linear combination mix-kernel function support vector machine, and nonlinear combination mix-kernel function support vector machine (NLMIX-SVM). The heuristic method and optimized CatBoost were used to select two different key descriptor sets for building linear and nonlinear models, respectively. Hyperparameters in all nonlinear models were optimized by comprehensive learning particle swarm optimization with low complexity and fast convergence. Furthermore, the models' robustness and reliability underwent rigorous assessment using fivefold and leave-one-out cross-validation, y-randomization, and statistics including concordance correlation coefficient (CCC), [Formula: see text], [Formula: see text], and [Formula: see text]. Among all the models, the NLMIX-SVM model, which was established by support vector regression using a nonlinear combination of radial basis kernel function, sigmoid kernel function, and linear kernel function as a new kernel function, demonstrated excellent learning and generalization abilities as well as robustness: [Formula: see text] = 0.9581, mean square error (MSE) = 0.0199 for the training set and [Formula: see text] = 0.9528, MSE = 0.0174 for the test set. [Formula: see text], [Formula: see text], CCC, [Formula: see text], [Formula: see text], and [Formula: see text] are 0.9539, 0.8908, 0.9752, 0.9529, 0.9528, and 0.9633, respectively. The NLMIX-SVM method proved to be a promising approach for quantitative structure-activity relationship research. In addition, molecular docking experiments were conducted to analyze the properties of new derivatives, and a new potential candidate drug molecule was ultimately found. In summary, this study will provide help for the design and screening of novel anti-trypanosome drugs.
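Of the validation statistics listed, the concordance correlation coefficient (CCC) is the one named explicitly, and it can be sketched directly from Lin's definition: twice the covariance divided by the sum of the two variances and the squared difference of the means. The function name and toy values below are assumptions, and population (1/n) moments are used.

```python
def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Made-up activity values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
ccc_perfect = concordance_correlation(observed, observed)
# A constant offset keeps the Pearson correlation at 1 but lowers the CCC.
ccc_shifted = concordance_correlation(observed, [2.0, 3.0, 4.0, 5.0])
```

Unlike the Pearson correlation, the CCC penalizes systematic bias, so a model whose predictions are all shifted by a constant scores below 1 even though its predictions are perfectly correlated with the observations.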

6 Machine learning model for predicting stroke recurrence in adult stroke patients with moyamoya disease and factors of stroke recurrence.

Impact index: 1.885
Published: Jul 29, 2024
Journal: Clin Neurol Neurosurg PMID: 38733759

DOI: 10.1016/j.clineuro.2024.108308
Article type: Journal Article

OBJECTIVE: The aim of this study was to build an effective machine learning model for predicting stroke recurrence in adult stroke patients with moyamoya disease (MMD) and to analyze the factors associated with stroke recurrence.
METHODS: The data of this retrospective study originated from the database of the JiangXi Province Medical Big Data Engineering & Technology Research Center. The information of MMD patients admitted to the Second Affiliated Hospital of Nanchang University from January 1st, 2007 to December 31st, 2019 was acquired. A total of 661 patients admitted from January 1st, 2007 to February 28th, 2017 were included in the training set, while the external validation set comprised 284 patients admitted from March 1st, 2017 to December 31st, 2019. First, the information regarding all the subjects was compared between the training set and the external validation set. The key influencing variables were screened out using the Lasso regression algorithm. Furthermore, models for predicting stroke recurrence within 1, 2, and 3 years after the initial stroke were built based on five different machine learning algorithms, and all models were externally validated and then compared. Lastly, the CatBoost model, which had the best performance, was explained using the SHapley Additive exPlanations (SHAP) interpretation model.
RESULTS: In total, 945 patients with MMD were recruited, and the recurrence rate of acute stroke within 1, 2, and 3 years after the initial stroke reached 11.43% (108/945), 18.94% (179/945), and 23.17% (219/945), respectively. The CatBoost models exhibited the best prediction performance among all models; the area under the curve (AUC) of these models for predicting stroke recurrence within 1, 2, and 3 years was 0.794 (0.787, 0.801), 0.813 (0.807, 0.818), and 0.789 (0.783, 0.795), respectively. As indicated by the results of the SHAP interpretation model, high Suzuki stage, young adulthood (aged 18-44), no surgical treatment, and the presence of an aneurysm were likely to show significant correlations with the recurrence of stroke in adult stroke patients with MMD.
CONCLUSION: In adult stroke patients with MMD, the CatBoost model was confirmed to be effective in stroke recurrence prediction, yielding accurate and reliable prediction outcomes. High Suzuki stage, young adulthood (aged 18-44 years), no surgical treatment, and the presence of an aneurysm are likely to be significantly correlated with the recurrence of stroke in adult stroke patients with MMD.
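The training and external validation sets in this study are separated by admission date, not at random. The sketch below shows such a temporal split using the cut-off stated in the abstract and re-derives the reported recurrence percentages from the reported counts. The record layout, the field name `admitted` and the helper names are assumptions.

```python
from datetime import date

# Cut-off taken from the abstract: admissions up to this date form the training set.
TRAIN_END = date(2017, 2, 28)


def temporal_split(records):
    """Split records into training and external validation sets by admission date."""
    training = [r for r in records if r["admitted"] <= TRAIN_END]
    validation = [r for r in records if r["admitted"] > TRAIN_END]
    return training, validation


def rate_percent(events, total):
    """Event rate as a percentage rounded to two decimals."""
    return round(100 * events / total, 2)


# Hypothetical patient records; only the admission date matters for the split.
records = [
    {"patient_id": "A", "admitted": date(2007, 1, 1)},
    {"patient_id": "B", "admitted": date(2017, 2, 28)},
    {"patient_id": "C", "admitted": date(2017, 3, 1)},
    {"patient_id": "D", "admitted": date(2019, 12, 31)},
]
training, validation = temporal_split(records)

# Counts reported in the abstract: 108, 179 and 219 recurrences among 945 patients.
rates = [rate_percent(k, 945) for k in (108, 179, 219)]
```

Splitting by time means the validation patients are all admitted later than the training patients, which is closer to how a deployed model would be used than a random split.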

7 Prediction and analysis of risk factors for diabetic retinopathy based on machine learning and interpretable models.

Impact index: 3.776
Published: May 15, 2024
Journal: Heliyon PMID: 38699007

DOI: 10.1016/j.heliyon.2024.e29497
Article type: Journal Article

Diabetic retinopathy is one of the major complications of diabetes. In this study, a diabetic retinopathy risk prediction model integrating machine learning models and SHAP was established to increase the accuracy of risk prediction for diabetic retinopathy, explain the rationality of the findings from model prediction and improve the reliability of prediction results.
Data were preprocessed for missing values and outliers, features were selected through information gain, a diabetic retinopathy risk prediction model was established using CatBoost, and the outputs of the model were interpreted using SHAP.
One thousand early warning records of diabetes complications, drawn from the diabetes complication early warning dataset of the National Clinical Medical Sciences Data Center, were used in this study. The CatBoost-based model for diabetic retinopathy prediction performed the best in the comparative model test. ALB_CR, HbA1c, UPR_24, NEPHROPATHY and SCR were positively correlated with diabetic retinopathy, while CP, HB, ALB, DBILI and CRP were negatively correlated with diabetic retinopathy. The relationships between HEIGHT, WEIGHT and ESR and diabetic retinopathy were not significant.
The risk factors for diabetic retinopathy include poor renal function, elevated blood glucose level, liver disease, hematonosis and dysarteriotony, among others. Diabetic retinopathy can be prevented by monitoring and effectively controlling relevant indices. In this study, the influence relationships between the features were also analyzed to further explore the potential factors of diabetic retinopathy, which can provide new methods and new ideas for the early prevention and clinical diagnosis of subsequent diabetic retinopathy.
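Feature selection by information gain, as named in the methods, scores each feature by how much knowing its value reduces the entropy of the outcome. A minimal sketch for categorical features follows. It assumes that continuous laboratory values such as HbA1c have already been binned (the abstract does not describe the discretization), and all names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: 1 = retinopathy, 0 = no retinopathy.
labels = [1, 1, 0, 0]
informative = ["high", "high", "low", "low"]   # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]           # carries no information about the label

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

Ranking features by this score and keeping the top ones gives a simple filter step that runs before the boosting model is trained.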

8 Effects of non-landslide sampling strategies on machine learning models in landslide susceptibility mapping.

Impact index: 4.996
Published: Mar 26, 2024
Journal: Sci Rep PMID: 38532140

DOI: 10.1038/s41598-024-57964-5
Article type: Journal Article

This study aims to explore the effects of different non-landslide sampling strategies on machine learning models in landslide susceptibility mapping. Non-landslide samples are inherently uncertain, and the selection of non-landslide samples may suffer from issues such as noisy or insufficient regional representations, which can affect the accuracy of the results. In this study, a positive-unlabeled (PU) bagging semi-supervised learning method was introduced for non-landslide sample selection. In addition, buffer control sampling (BCS) and K-means (KM) clustering were applied for comparative analysis. Based on landslide data from Qiaojia County, Yunnan Province, China, collected in 2014, three machine learning models, namely, random forest, support vector machine, and CatBoost, were used for landslide susceptibility mapping. The results show that the quality of samples selected using different non-landslide sampling strategies varies significantly. Overall, the quality of non-landslide samples selected using the PU bagging method is superior, and this method performs best when combined with CatBoost for predicting (AUC = 0.897) landslides in very high and high susceptibility zones (82.14%). Additionally, the KM results indicated overfitting, displaying high accuracy for validation but poor statistical outcomes for zoning. The BCS results were the worst.
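The abstract does not define buffer control sampling. The sketch below assumes one common reading of the term: candidate non-landslide points are kept only if they lie farther than a chosen buffer distance from every known landslide point. The function name, coordinates and buffer value are all hypothetical, and planar Euclidean distance is used for simplicity.

```python
import math


def outside_buffer(candidates, landslides, buffer_distance):
    """Keep candidate points farther than buffer_distance from every landslide point."""
    return [
        c
        for c in candidates
        if all(math.dist(c, site) > buffer_distance for site in landslides)
    ]


# Made-up planar coordinates (arbitrary units).
landslides = [(0.0, 0.0), (10.0, 0.0)]
candidates = [(1.0, 1.0), (5.0, 0.0), (5.0, 5.0), (10.0, 0.5)]

non_landslide = outside_buffer(candidates, landslides, buffer_distance=3.0)
```

Under this reading the buffer only guarantees distance from known landslides. It says nothing about whether the retained points are truly stable, which is consistent with the abstract's concern that non-landslide samples are inherently uncertain.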

9 Estimating the Heavy Metal Contents in Entisols from a Mining Area Based on Improved Spectral Indices and Catboost.

Impact index: 3.847
Published: Feb 25, 2024
Journal: Sensors (Basel) PMID: 38475028

DOI: 10.3390/s24051492
Article type: Journal Article

In the study of the inversion of soil multi-species heavy metal element concentrations using hyperspectral techniques, the selection of feature bands is very important. However, interactions among soil elements can lead to redundancy and instability of spectral features. In this study, heavy metal elements (Pb, Zn, Mn, and As) in entisols around a mining area in Harbin, Heilongjiang Province, China, were studied. To optimise the combination of spectral indices and their weights, radar plots of characteristic-band Pearson coefficients (RCBP) were used to screen three-band spectral index combinations of Pb, Zn, Mn, and As elements, while the Catboost algorithm was used to invert the concentrations of each element. The correlations of Fe with the four heavy metals were analysed from both concentration and characteristic band perspectives, while the effect of spectral inversion was further evaluated via spatial analysis. It was found that the regression model for the inversion of the Zn elemental concentration based on the optimised spectral index combinations had the best fit, with R2 = 0.8786 for the test set, followed by Mn (R2 = 0.8576), As (R2 = 0.7916), and Pb (R2 = 0.6022). As far as the characteristic bands are concerned, the best correlations of Fe with the Pb, Zn, Mn and As elements were 0.837, 0.711, 0.542 and 0.303, respectively. The spatial distribution and correlation of the spectral inversion concentrations of the As and Mn elements with the measured concentrations were consistent, and there were some differences in the results for Zn and Pb. Therefore, hyperspectral techniques and analysis of Fe elements have potential applications in the inversion of entisols heavy metal concentrations and can improve the quality monitoring efficiency of these soils.
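Screening characteristic bands by their Pearson coefficients can be illustrated by ranking candidate bands on the absolute correlation between reflectance and measured concentration. This is only a sketch of that screening idea: the band names, reflectance values and concentrations are made up, and the radar-plot and three-band index construction described in the abstract are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_bands(band_values, concentrations):
    """Order band names by |r| with the measured concentration, strongest first."""
    strength = {
        band: abs(pearson(values, concentrations))
        for band, values in band_values.items()
    }
    return sorted(strength, key=strength.get, reverse=True)


# Made-up concentrations and reflectances for three hypothetical bands.
concentrations = [10.0, 20.0, 30.0, 40.0]
band_values = {
    "band_a": [0.1, 0.2, 0.3, 0.4],
    "band_b": [0.4, 0.1, 0.3, 0.2],
    "band_c": [0.8, 0.7, 0.3, 0.2],
}
ranking = rank_bands(band_values, concentrations)
```

The absolute value is used because a strongly negative correlation is as informative for inversion as a strongly positive one.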

10 Diurnal Pain Classification in Critically Ill Patients using Machine Learning on Accelerometry and Analgesic Data.

Impact index: not available
Published: Dec 2023
Journal: IEEE Int Conf Bioinform Biomed Workshops PMID: 38463539

DOI: 10.1109/bibm58861.2023.10385764
Article type: Journal Article

Quantifying pain in patients admitted to intensive care units (ICUs) is challenging due to the increased prevalence of communication barriers in this patient population. Previous research has posited a positive correlation between pain and physical activity in critically ill patients. In this study, we advance this hypothesis by building machine learning classifiers to examine the ability of accelerometer data collected from daily wearables to predict self-reported pain levels experienced by patients in the ICU. We trained multiple machine learning (ML) models, including logistic regression, CatBoost, and XGBoost, on statistical features extracted from the accelerometer data combined with previous pain measurements and patient demographics. Following previous studies that showed a change in pain sensitivity in ICU patients at night, we performed the task of pain classification separately for daytime and nighttime pain reports. In the pain versus no-pain classification setting, logistic regression gave the best classifier in daytime (AUC: 0.72, F1-score: 0.72), and CatBoost gave the best classifier at nighttime (AUC: 0.82, F1-score: 0.82). Performance of logistic regression dropped to 0.61 AUC, 0.62 F1-score (mild vs. moderate pain, nighttime), and CatBoost's performance was similarly affected with 0.61 AUC, 0.60 F1-score (moderate vs. severe pain, daytime). The inclusion of analgesic information benefited the classification between moderate and severe pain. SHAP analysis was conducted to find the most significant features in each setting. It assigned the highest importance to accelerometer-related features in all evaluated settings but also showed the contribution of other features, such as age and medications, in specific contexts. In conclusion, accelerometer data combined with patient demographics and previous pain measurements can be used to screen painful from painless episodes in the ICU and can be combined with analgesic information to provide moderate classification between painful episodes of different severities.
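Because the study evaluates daytime and nighttime pain reports separately, its F1-scores are computed per period. The sketch below shows that bookkeeping on made-up data. The tuple layout `(period, label, prediction)` and the function names are assumptions, and the classifier itself is not modeled.

```python
def f1_score(labels, predictions):
    """F1-score for the positive class (1) from binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_period(reports):
    """Group (period, label, prediction) tuples by period and score each group."""
    grouped = {}
    for period, label, prediction in reports:
        labels, predictions = grouped.setdefault(period, ([], []))
        labels.append(label)
        predictions.append(prediction)
    return {period: f1_score(l, p) for period, (l, p) in grouped.items()}


# Made-up pain reports: 1 = pain, 0 = no pain.
reports = [
    ("day", 1, 1), ("day", 0, 1), ("day", 1, 0), ("day", 0, 0),
    ("night", 1, 1), ("night", 1, 1), ("night", 0, 0), ("night", 0, 1),
]
scores_by_period = f1_by_period(reports)
```

Scoring the periods separately makes it possible for different models to win in each, as the abstract reports for logistic regression in daytime and CatBoost at nighttime.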
