CatBoost

  • Article type: Journal Article
    Protein-DNA complex interactions play a crucial role in biological activities such as gene expression, modification, replication and transcription. Understanding the physiological significance of protein-DNA binding interface hot spots, as well as the development of computational biology, depends on the precise identification of these regions. In this paper, a hot spot prediction method called EC-PDH is proposed. First, we extracted features of these hot spots' solid solvent-accessible surface area (ASA) and secondary structure; then the mean, variance, energy and autocorrelation function values of the first three intrinsic mode functions (IMFs) of these conventional features were extracted as new features via the empirical mode decomposition (EMD) algorithm. In total, 218 feature dimensions were obtained. For feature selection, we used the minimum-redundancy maximum-relevance sequential forward selection method (mRMR-SFS) to obtain an optimal 11-dimensional feature subset. To address data imbalance, we used the SMOTE-Tomek algorithm to balance positive and negative samples and finally used categorical boosting (CatBoost) to construct our hot spot prediction model for protein-DNA binding interfaces. Our method performs well on the test set, with AUC, MCC and F1 score values of 0.847, 0.543 and 0.772, respectively. In a comparative evaluation, EC-PDH outperforms existing state-of-the-art methods in identifying hot spots.
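
The MCC and F1 score reported above are both functions of the confusion-matrix counts. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, not taken from the paper.
example_f1 = f1_score(tp=40, fp=5, fn=10)
example_mcc = mcc(tp=40, tn=45, fp=5, fn=10)
```

Unlike F1, MCC also uses the true-negative count, which is one reason it is often reported alongside F1 on imbalanced data.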

  • Article type: Journal Article
    Regulatory agencies generate a vast amount of textual data in the review process. For example, drug labeling serves as a valuable resource for regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources.
    We utilized artificial intelligence in this study to classify drug-induced liver injury (DILI)-related content from drug labeling documents based on the FDA's DILIrank dataset. We employed text mining and XGBoost models and utilized the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining medical standard terms for FDA and EMA drug label datasets. Then, we constructed a document-term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word/term/token.
    The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 for both drug labels from the FDA and EMA and literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA).
    Moreover, the text mining and XGBoost functions demonstrated in this study can be applied to other text processing and classification tasks.
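
The document-term matrix described above can be illustrated with a small sketch. The abstract does not say which TF-IDF variant was used, so this assumes the plain textbook form (relative term frequency times log(N/df)); the function name and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a TF-IDF document-term matrix from pre-tokenized documents.

    docs: list of token lists. Returns (vocab, matrix), where matrix[i][j]
    is the weight of vocab[j] in document i.
    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        matrix.append(
            [(counts[tok] / length) * math.log(n_docs / df[tok]) for tok in vocab]
        )
    return vocab, matrix


# Toy corpus, purely illustrative.
toy_docs = [["liver", "injury", "reported"], ["no", "liver", "signal"]]
toy_vocab, toy_matrix = tfidf_matrix(toy_docs)
```

A term that occurs in every document ("liver" here) gets weight zero under this unsmoothed form; library implementations often smooth the IDF term, so their weights will differ.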

  • Article type: Journal Article
    BACKGROUND: This study aims to implement a validated prediction model and application medium for postoperative pneumonia (POP) in elderly patients with hip fractures in order to facilitate individualized intervention by clinicians.
    METHODS: Employing clinical data from elderly patients with hip fractures, we derived and externally validated machine learning models for predicting POP. Model derivation utilized a registry from Nanjing First Hospital, and external validation was performed using data from patients at the Fourth Affiliated Hospital of Nanjing Medical University. The derivation cohort was divided into a training set and a testing set. The least absolute shrinkage and selection operator (LASSO) and multivariable logistic regression were used for feature screening. We compared the performance of the models to select the best one and introduced SHapley Additive exPlanations (SHAP) to interpret it.
    RESULTS: The derivation and validation cohorts comprised 498 and 124 patients, with POP rates of 14.3% and 10.5%, respectively. Among these models, categorical boosting (CatBoost) demonstrated superior discrimination ability. The AUROC was 0.895 (95% CI: 0.841-0.949) and 0.835 (95% CI: 0.740-0.930) on the training and testing sets, respectively. At external validation, the AUROC was 0.894 (95% CI: 0.821-0.966). The SHAP method showed that CRP, the modified five-item frailty index (mFI-5), and ASA physical status were the three most important predictors of POP.
    CONCLUSIONS: Our model's good early prediction ability, combined with the implementation of a network risk calculator based on the CatBoost model, is expected to effectively distinguish high-risk POP groups, facilitating timely intervention.

  • Article type: Journal Article
    In order to predict the anti-trypanosome effect of carbazole-derived compounds by quantitative structure-activity relationship, five models were established by the linear method, random forest, radial basis kernel function support vector machine, linear combination mix-kernel function support vector machine, and nonlinear combination mix-kernel function support vector machine (NLMIX-SVM). The heuristic method and optimized CatBoost were used to select two different key descriptor sets for building linear and nonlinear models, respectively. Hyperparameters in all nonlinear models were optimized by comprehensive learning particle swarm optimization, which has low complexity and fast convergence. Furthermore, the models' robustness and reliability underwent rigorous assessment using fivefold and leave-one-out cross-validation, y-randomization, and statistics including the concordance correlation coefficient (CCC), [Formula: see text], [Formula: see text], and [Formula: see text]. Among all the models, the NLMIX-SVM model, which was established by support vector regression using a nonlinear combination of the radial basis kernel function, sigmoid kernel function, and linear kernel function as a new kernel function, demonstrated excellent learning and generalization abilities as well as robustness: [Formula: see text] = 0.9581, mean square error (MSE) = 0.0199 for the training set and [Formula: see text] = 0.9528, MSE = 0.0174 for the test set. [Formula: see text], [Formula: see text], CCC, [Formula: see text], [Formula: see text], and [Formula: see text] are 0.9539, 0.8908, 0.9752, 0.9529, 0.9528, and 0.9633, respectively. The NLMIX-SVM method proved to be a promising approach for quantitative structure-activity relationship research. In addition, molecular docking experiments were conducted to analyze the properties of new derivatives, and a new potential candidate drug molecule was ultimately found. In summary, this study will help in the design and screening of novel anti-trypanosome drugs.

  • Article type: Journal Article
    Diabetic retinopathy is one of the major complications of diabetes. In this study, a diabetic retinopathy risk prediction model integrating machine learning models and SHAP was established to increase the accuracy of risk prediction for diabetic retinopathy, explain the rationality of the model's predictions and improve the reliability of the prediction results.
    Data were preprocessed for missing values and outliers, features were selected through information gain, a diabetic retinopathy risk prediction model was established using CatBoost, and the outputs of the model were interpreted using SHAP.
    One thousand early warning records of diabetes complications, drawn from the diabetes complication early warning dataset of the National Clinical Medical Sciences Data Center, were used in this study. The CatBoost-based model for diabetic retinopathy prediction performed the best in the comparative model test. ALB_CR, HbA1c, UPR_24, NEPHROPATHY and SCR were positively correlated with diabetic retinopathy, while CP, HB, ALB, DBILI and CRP were negatively correlated with diabetic retinopathy. The relationships between HEIGHT, WEIGHT and ESR and diabetic retinopathy were not significant.
    The risk factors for diabetic retinopathy include poor renal function, elevated blood glucose level, liver disease, hematonosis and dysarteriotony, among others. Diabetic retinopathy can be prevented by monitoring and effectively controlling relevant indices. In this study, the influence relationships between the features were also analyzed to further explore the potential factors of diabetic retinopathy, which can provide new methods and new ideas for the early prevention and clinical diagnosis of subsequent diabetic retinopathy.

  • Article type: Journal Article
    This study aims to explore the effects of different non-landslide sampling strategies on machine learning models in landslide susceptibility mapping. Non-landslide samples are inherently uncertain, and the selection of non-landslide samples may suffer from issues such as noisy or insufficient regional representations, which can affect the accuracy of the results. In this study, a positive-unlabeled (PU) bagging semi-supervised learning method was introduced for non-landslide sample selection. In addition, buffer control sampling (BCS) and K-means (KM) clustering were applied for comparative analysis. Based on landslide data from Qiaojia County, Yunnan Province, China, collected in 2014, three machine learning models, namely, random forest, support vector machine, and CatBoost, were used for landslide susceptibility mapping. The results show that the quality of samples selected using different non-landslide sampling strategies varies significantly. Overall, the quality of non-landslide samples selected using the PU bagging method is superior, and this method performs best when combined with CatBoost for predicting (AUC = 0.897) landslides in very high and high susceptibility zones (82.14%). Additionally, the KM results indicated overfitting, displaying high accuracy for validation but poor statistical outcomes for zoning. The BCS results were the worst.
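
The AUC used above to compare sampling strategies has a simple rank interpretation: it is the probability that a randomly chosen positive (landslide) sample receives a higher score than a randomly chosen negative (non-landslide) sample. The sketch below computes it by direct pair counting; the function name and the toy labels and scores are hypothetical, and this is not the study's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 values; scores: predicted susceptibility scores.
    Ties count as half a correct pair. O(P * N) time, which is fine for
    small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data, purely illustrative.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because the value depends on which samples are labelled negative, the choice of non-landslide sampling strategy changes the negatives the score is computed against, not just the trained model.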

  • Article type: Journal Article
    In the study of the inversion of soil multi-species heavy metal element concentrations using hyperspectral techniques, the selection of feature bands is very important. However, interactions among soil elements can lead to redundancy and instability of spectral features. In this study, heavy metal elements (Pb, Zn, Mn, and As) in entisols around a mining area in Harbin, Heilongjiang Province, China, were studied. To optimise the combination of spectral indices and their weights, radar plots of characteristic-band Pearson coefficients (RCBP) were used to screen three-band spectral index combinations of Pb, Zn, Mn, and As elements, while the Catboost algorithm was used to invert the concentrations of each element. The correlations of Fe with the four heavy metals were analysed from both concentration and characteristic band perspectives, while the effect of spectral inversion was further evaluated via spatial analysis. It was found that the regression model for the inversion of the Zn elemental concentration based on the optimised spectral index combinations had the best fit, with R2 = 0.8786 for the test set, followed by Mn (R2 = 0.8576), As (R2 = 0.7916), and Pb (R2 = 0.6022). As far as the characteristic bands are concerned, the best correlations of Fe with the Pb, Zn, Mn and As elements were 0.837, 0.711, 0.542 and 0.303, respectively. The spatial distribution and correlation of the spectral inversion concentrations of the As and Mn elements with the measured concentrations were consistent, and there were some differences in the results for Zn and Pb. Therefore, hyperspectral techniques and analysis of Fe elements have potential applications in the inversion of entisols heavy metal concentrations and can improve the quality monitoring efficiency of these soils.

  • Article type: Journal Article
    Quantifying pain in patients admitted to intensive care units (ICUs) is challenging due to the increased prevalence of communication barriers in this patient population. Previous research has posited a positive correlation between pain and physical activity in critically ill patients. In this study, we advance this hypothesis by building machine learning classifiers to examine the ability of accelerometer data collected from daily wearables to predict self-reported pain levels experienced by patients in the ICU. We trained multiple machine learning (ML) models, including logistic regression, CatBoost, and XGBoost, on statistical features extracted from the accelerometer data combined with previous pain measurements and patient demographics. Following previous studies that showed a change in pain sensitivity in ICU patients at night, we performed pain classification separately for daytime and nighttime pain reports. In the pain versus no-pain classification setting, logistic regression gave the best classifier in the daytime (AUC: 0.72, F1-score: 0.72), and CatBoost gave the best classifier at nighttime (AUC: 0.82, F1-score: 0.82). The performance of logistic regression dropped to 0.61 AUC and 0.62 F1-score (mild vs. moderate pain, nighttime), and CatBoost's performance was similarly affected, with 0.61 AUC and 0.60 F1-score (moderate vs. severe pain, daytime). The inclusion of analgesic information benefited the classification between moderate and severe pain. SHAP analysis was conducted to find the most significant features in each setting. It assigned the highest importance to accelerometer-related features in all evaluated settings but also showed the contribution of other features, such as age and medications, in specific contexts. In conclusion, accelerometer data combined with patient demographics and previous pain measurements can be used to screen painful from painless episodes in the ICU and can be combined with analgesic information to provide moderate classification between painful episodes of different severities.

  • Article type: Journal Article
    Superconductivity is a remarkable phenomenon in condensed matter physics, which comprises a fascinating array of properties expected to revolutionize energy-related technologies and pertinent fundamental research. However, the field faces the challenge of achieving superconductivity at room temperature. In recent years, Artificial Intelligence (AI) approaches have emerged as a promising tool for predicting such properties as transition temperature (Tc) to enable the rapid screening of large databases to discover new superconducting materials. This study employs the SuperCon dataset as the largest superconducting materials dataset. Then, we perform various data pre-processing steps to derive the clean DataG dataset, containing 13,022 compounds. In another stage of the study, we apply the novel CatBoost algorithm to predict the transition temperatures of novel superconducting materials. In addition, we developed a package called Jabir, which generates 322 atomic descriptors. We also designed an innovative hybrid method called the Soraya package to select the most critical features from the feature space. These yield R2 and RMSE values (0.952 and 6.45 K, respectively) superior to those previously reported in the literature. Finally, as a novel contribution to the field, a web application was designed for predicting and determining the Tc values of superconducting materials.

  • Article type: Journal Article
    Human-machine interface technology is fundamentally constrained by the dexterity of motion decoding. Simultaneous and proportional control can greatly improve the flexibility and dexterity of smart prostheses. In this research, a new model using ensemble learning to solve the angle decoding problem is proposed. In total, seven models for angle decoding from surface electromyography (sEMG) signals are designed. The kinematics of five angles of the metacarpophalangeal (MCP) joints are estimated using the sEMG recorded during functional tasks. The estimation performance was evaluated through the Pearson correlation coefficient (CC). The comprehensive model, which combines CatBoost and LightGBM, is the best model for this task, with an average CC and RMSE of 0.897 and 7.09, respectively. The mean CC and mean RMSE across all test scenarios of the subjects' dataset outperform the results of the Gaussian process model, with significant differences. Moreover, the research proposes a complete pipeline that uses ensemble learning to build a high-performance angle decoding system for the hand motion recognition task. Researchers or engineers in this field can use this process to quickly find the most suitable ensemble learning model for angle decoding, with fewer parameters and lower training data requirements than traditional deep learning models. In conclusion, the proposed ensemble learning approach has potential for simultaneous and proportional control (SPC) of future hand prostheses.
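
The two evaluation measures named above, the Pearson correlation coefficient (CC) and RMSE, can be computed between a measured and an estimated joint-angle trace as sketched below. This is a minimal illustration of the standard definitions under the assumption of equal-length sequences; the function names and the toy angle values are hypothetical and not taken from the study.

```python
import math


def pearson_cc(measured, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_m * var_e)


def rmse(measured, estimated):
    """Root-mean-square error, in the same unit as the inputs."""
    squared = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    return math.sqrt(squared / len(measured))


# Toy joint-angle traces, purely illustrative.
toy_measured = [10.0, 20.0, 30.0, 40.0]
toy_estimated = [12.0, 18.0, 33.0, 37.0]
toy_cc = pearson_cc(toy_measured, toy_estimated)
toy_rmse = rmse(toy_measured, toy_estimated)
```

The two measures are complementary: CC captures how well the estimate tracks the shape of the movement regardless of scale or offset, while RMSE captures the absolute size of the error.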