random forest classifier

  • Article type: Journal Article
    BACKGROUND: Health care-associated infections due to multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA) and Clostridioides difficile (CDI), place a significant burden on our health care infrastructure.
    OBJECTIVE: Screening for MDROs is an important mechanism for preventing spread but is resource intensive. The objective of this study was to develop automated tools that can predict colonization or infection risk using electronic health record (EHR) data, provide useful information to aid infection control, and guide empiric antibiotic coverage.
    METHODS: We retrospectively developed a machine learning model to detect MRSA colonization and infection in undifferentiated patients at the time of sample collection from hospitalized patients at the University of Virginia Hospital. We used clinical and nonclinical features derived from on-admission and throughout-stay information from the patient's EHR data to build the model. In addition, we used a class of features derived from contact networks in EHR data; these network features can capture patients' contacts with providers and other patients, improving model interpretability and accuracy for predicting the outcome of surveillance tests for MRSA. Finally, we explored whether heterogeneous models for different patient subpopulations (for example, those admitted to an intensive care unit or emergency department, or those with specific testing histories) perform better.
    RESULTS: We found that penalized logistic regression performs better than other methods, and this model's performance, measured by its receiver operating characteristic area under the curve (ROC-AUC) score, improves by nearly 11% when we use a polynomial (second-degree) transformation of the features. Some significant features in predicting MDRO risk include antibiotic use, surgery, use of devices, dialysis, the patient's comorbidity conditions, and network features. Among these, network features add the most value and improve the model's performance by at least 15%. The penalized logistic regression model with the same transformation of features also performs better than other models for specific patient subpopulations.
    CONCLUSIONS: Our study shows that MRSA risk prediction can be conducted quite effectively by machine learning methods using clinical and nonclinical features derived from EHR data. Network features are the most predictive and provide significant improvement over prior methods. Furthermore, heterogeneous prediction models for different patient subpopulations enhance the model's performance.
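The "polynomial (second-degree) transformation of the features" mentioned in the results can be illustrated with a minimal sketch. This is not the authors' code: the function name `polynomial_features` and the two-feature example record are hypothetical, and the sketch only shows how a degree-2 expansion adds squares and pairwise products to the original features before a linear model such as penalized logistic regression is fitted.

```python
from itertools import combinations_with_replacement


def polynomial_features(row, degree=2):
    """Expand one feature vector with all monomials up to `degree`.

    The constant (bias) term is omitted. For degree 2 the output holds the
    original features, followed by every square and pairwise product.
    """
    expanded = []
    for d in range(1, degree + 1):
        # Each index tuple picks the features multiplied together in one monomial.
        for indices in combinations_with_replacement(range(len(row)), d):
            term = 1.0
            for i in indices:
                term *= row[i]
            expanded.append(term)
    return expanded


# Hypothetical record with two features (illustrative values, not study data).
print(polynomial_features([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

With n input features, the degree-2 expansion yields n + n(n + 1)/2 columns, so regularization (the "penalized" part of the model) becomes more important as n grows.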

  • Article type: Journal Article
    Forest canopy cover (FCC) is essential in forest assessment and management, affecting ecosystem services such as carbon sequestration, wildlife habitat, and water regulation. Ongoing advancements in techniques for accurately and efficiently mapping and extracting FCC information require a thorough evaluation of their validity and reliability. The primary objectives of this study are to: (1) create a large-scale FCC dataset with a 1-meter spatial resolution, (2) assess the spatial distribution of FCC at a regional scale, and (3) investigate differences in FCC areas among the Global Forest Change (Hansen et al., 2013) and U.S. Forest Service Tree Canopy Cover products at various spatial scales in Arkansas (i.e., county and city levels). This study utilized high-resolution aerial imagery and a machine learning algorithm, processed and analyzed using the Google Earth Engine cloud computing platform, to produce the FCC dataset. The accuracy of this dataset was validated using one-third of the reference locations obtained from the Global Forest Change (Hansen et al., 2013) dataset and the National Agriculture Imagery Program (NAIP) aerial imagery with a 0.6-m spatial resolution. The results showed that the dataset successfully identified FCC at a 1-m resolution in the study area, with overall accuracy ranging between 83.31% and 94.35% per county. Spatial comparison results between the produced FCC dataset and the Hansen et al. (2013) and USFS products indicated a strong positive correlation, with R² values ranging between 0.94 and 0.98 for county and city levels. This dataset provides valuable information for monitoring, forecasting, and managing forest resources in Arkansas and beyond. The methodology followed in this study enhances efficiency, cost-effectiveness, and scalability, as it enables the processing of large-scale datasets with high computational demands in a cloud-based environment. It also demonstrates that machine learning and cloud computing technologies can generate high-resolution forest cover datasets, which might be helpful in other regions of the world.

  • Article type: Journal Article
    Children undergoing allogeneic hematopoietic stem cell transplantation (HSCT) are prone to developing acute kidney injury (AKI). Markers of kidney damage: kidney injury molecule (KIM)-1, interleukin (IL)-18, and neutrophil gelatinase-associated lipocalin (NGAL) may ease early diagnosis of AKI. The aim of this study was to assess serum concentrations of KIM-1, IL-18, and NGAL in children undergoing HSCT in relation to classical markers of kidney function (creatinine, cystatin C, estimated glomerular filtration rate (eGFR)) and to analyze their usefulness as predictors of kidney damage with the use of artificial intelligence tools. Serum concentrations of KIM-1, IL-18, NGAL, and cystatin C were assessed by ELISA in 27 children undergoing HSCT before transplantation and up to 4 weeks after the procedure. The data was used to build a Random Forest Classifier (RFC) model of renal injury prediction. The RFC model established on the basis of 3 input variables, KIM-1, IL-18, and NGAL concentrations in the serum of children before HSCT, was able to effectively assess the rate of patients with hyperfiltration, a surrogate marker of kidney injury 4 weeks after the procedure. With the use of the RFC model, serum KIM-1, IL-18, and NGAL may serve as markers of incipient renal dysfunction in children after HSCT.

  • Article type: Journal Article
    OBJECTIVE: Regular users of hormonal contraceptive pills show marked heterogeneity in metabolic effects with variations in pill composition. This might be due to the choice of outcome variables for comparison. Discordance between the total cholesterol to high-density lipoprotein ratio (TC/HDL) and low-density lipoprotein cholesterol (LDL-C) has now become an established marker of future risk for atherosclerotic cardiovascular disease and is stable to variation between users.
    METHODS: The present study was a randomized controlled trial to compare the prevalence of TC/HDL and LDL discordance among non-obese women with polycystic ovarian syndrome (PCOS) treated with hormonal pills. Women were randomized into three arms: two arms received ultra-low-dose pills (ethinylestradiol [EE] 20 μg with drospirenone 3 mg or EE 15 μg with gestodene 60 μg) and one arm received a low-dose pill (EE 30 μg with desogestrel 150 μg). The role of baseline participant features and pill composition in discordance was determined.
    RESULTS: Discordance was observed in more than a quarter of the participants before intervention. After 1 year of treatment, less than a fifth of the participants were discordant. Ultra-low-dose pill users had lower discordance, LDL, and TC than low-dose pill users after 1 year of treatment. The random forest, a non-linear classifier, showed the highest accuracy in predicting discordance. The baseline parameters with the maximal impact on the occurrence of discordance were triglyceride, homeostatic model assessment for insulin resistance, body mass index, and high-density lipoprotein.
    CONCLUSIONS: Non-obese PCOS women on ultra-low-dose pills have a lower risk of acquiring future atherosclerotic cardiovascular disease.

  • Article type: Journal Article
    Production plantation forestry has many economic benefits but can also have negative environmental impacts such as the spread of invasive pines into native forest habitats. Monitoring forests for the presence of invasive pines helps with the management of this issue. However, detecting vegetation change over a long time period is difficult due to changes in image quality and sensor types, the spectral similarity of evergreen species, and frequent cloud cover in the study area. The costs of high-resolution images are also prohibitive for routine monitoring in resource-constrained countries. This research investigated the use of remote sensing to identify the spread of Pinus caribaea over a 21-year period (2000 to 2021) in Belihuloya, Sri Lanka, using Landsat images. It applied a range of techniques to produce cloud-free images, extract vegetation features, and improve vegetation classification accuracy, followed by the use of a Geographical Information System to spatially analyze the spread of invasive pines. The results showed that most invading pines were found within 100 m of the pine plantations' borders, where broadleaved forests and grasslands are vulnerable to invasion. However, the extent of invasive pine had an overall decline of 4 ha over the 21 years. The study confirmed that remote sensing combined with spatial analysis is an effective tool for monitoring invasive pines in countries with limited resources. This study also provides information to conservationists and forest managers to conduct strategic planning for sustainable forest management and conservation in Sri Lanka.

  • Article type: Journal Article
    Despite being composed of highly plastic neurons with extensive positive feedback, the nervous system maintains stable overall function. To keep activity within bounds, it relies on a set of negative feedback mechanisms that can induce stabilizing adjustments and that are collectively termed "homeostatic plasticity." Recently, a highly excitable microdomain located at the proximal end of the axon, the axon initial segment (AIS), was found to exhibit structural modifications in response to activity perturbations. Though AIS plasticity appears to serve a homeostatic purpose, many aspects governing its expression and its functional role in regulating neuronal excitability remain elusive. A central challenge in studying the phenomenon is the rich heterogeneity of its expression (distal/proximal relocation, shortening, lengthening) and the variability of its functional role. A potential solution is to track AISs of a large number of neurons over time and attempt to induce structural plasticity in them. To this end, a promising approach is to use extracellular electrophysiological readouts to track a large number of neurons at high spatiotemporal resolution by means of high-density microelectrode arrays (HD-MEAs). However, an analysis framework that reliably identifies specific activity signatures that uniquely map on to underlying microstructural changes is missing. In this study, we assessed the feasibility of such a task and used the distal relocation of the AIS as an exemplary problem. We used sophisticated computational models to systematically explore the relationship between incremental changes in AIS positions and the specific consequences observed in simulated extracellular field potentials. An ensemble of feature changes in the extracellular fields that reliably characterize AIS plasticity was identified. We trained models that could detect these signatures with remarkable accuracy. Based on these findings, we propose a hybrid analysis framework that could potentially enable high-throughput experimental studies of activity-dependent AIS plasticity using HD-MEAs.

  • Article type: Journal Article
    Frailty and falls are two adverse characteristics of aging that impair the quality of life of senior people and increase the burden on the healthcare system. Various methods exist to evaluate frailty, but none of them are considered the gold standard. Technological methods have also been proposed to assess the risk of falling in seniors. This study aims to propose an objective method for complementing existing methods used to identify the frail state and risk of falling in older adults.
    A total of 712 subjects (age: 71.3 ± 8.2 years, including 505 women and 207 men) were recruited from two Japanese cities. Two hundred and three people were classified as frail according to the Kihon Checklist. One hundred and forty-two people presented with a history of falling during the previous 12 months. The subjects performed a 45 s standing balance test and a 20 m round walking trial. The plantar pressure data were collected using a 7-sensor insole. One hundred and eighty-four data features were extracted. Automatic learning random forest algorithms were used to build the frailty and faller classifiers. The discrimination capabilities of the features in the classification models were explored.
    The overall balanced accuracy for the recognition of frail subjects was 0.75 ± 0.04 (F1-score: 0.77 ± 0.03). One sub-analysis using data collected for men aged > 65 years only revealed accuracies as high as 0.78 ± 0.07 (F1-score: 0.79 ± 0.05). The overall balanced accuracy for classifying subjects with a recent history of falling was 0.57 ± 0.05 (F1-score: 0.62 ± 0.04). The classification of subjects relative to their frailty state primarily relied on features extracted from the plantar pressure series collected during the walking test.
    In the future, plantar pressures measured with smart insoles inserted in the shoes of senior people may be used to evaluate aspects of frailty related to the physical dimension (e.g., gait and balance alterations), thus assisting clinicians in the early identification of frail individuals.
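The abstract above reports balanced accuracy and F1-score for binary classifiers (frail vs. not frail, faller vs. non-faller). As a rough sketch of how these two metrics are computed from predictions, assuming the standard definitions: the function name `binary_metrics` and the label vectors below are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return balanced accuracy and F1-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Balanced accuracy averages the per-class recalls, so a majority-class
    # guesser scores 0.5 no matter how imbalanced the classes are.
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"balanced_accuracy": balanced_accuracy, "f1": f1}


# Hypothetical labels: 4 positive and 6 negative subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because balanced accuracy weights both classes equally, a value such as the 0.57 reported for fall history is only modestly above the 0.5 chance level.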

  • Article type: Journal Article
    Atrial fibrillation (AF) is one of the most common cardiovascular problems, and its asymptomatic tendency makes AF detection challenging. Machine and deep learning methods are commonly used in AF detection.
    The purpose of this study was to evaluate the information provided by convolutional neural network (CNN) and random forest (RF) machine learning models for AF classification.
    We manually extracted 166 time-frequency domain, linear, and nonlinear features to classify single-lead electrocardiograms (ECGs) as normal, AF, other, or noisy sinus rhythms. Using a genetic algorithm, we selected a subset of 56 robust features for use in the RF model. In a separate study, a 1-dimensional, 12-layer CNN was designed on the raw ECG rhythms. Four features from the output layer and 128 features from the fully connected layer of the CNN were explored independently for classification. The models were trained and internally validated on 8,528 ECGs and externally validated on a hidden dataset containing 3,658 ECGs. Next, we analyzed the correlation between engineered and CNN-learned features.
    An RF classifier trained with 56 engineered features resulted in F1 scores of 0.91, 0.78, and 0.72 for normal, AF, and other rhythms, respectively. However, an ensemble of a support vector machine and the CNN model resulted in F1 scores of 0.92, 0.87, and 0.80, respectively.
    We explored various features and machine learning models to identify AF rhythms using short (9-61 seconds) single-lead ECG recordings. Our results showed that the proposed CNN model abstracted distinctive features for AF classification.

  • Article type: Journal Article
    One major challenge limiting the use of dexterous robotic hand prostheses controlled via electromyography and pattern recognition is the considerable effort required to train complex models from scratch. To overcome this problem, several studies in recent years proposed using transfer learning, combining pre-trained models (obtained from prior subjects) with training sessions performed on a specific user. Although a few promising results were reported in the past, it was recently shown that conventional transfer learning algorithms do not increase performance if proper hyperparameter optimization is performed on the standard approach that does not exploit transfer learning. The objective of this paper is to introduce novel analyses on this topic by using a random forest classifier without hyperparameter optimization and to extend them with experiments performed on data recorded from the same patient but in different data acquisition sessions. Two domain adaptation techniques were tested on the random forest classifier, allowing us to conduct experiments on healthy subjects and amputees. Unlike several previous papers, our results show no appreciable improvement in accuracy, regardless of the transfer learning technique tested. The lack of adaptive learning is also demonstrated for the first time in an intra-subject experimental setting, using as a source ten data acquisitions recorded from the same subject on five different days.

  • Article type: Journal Article
    Diabetes mellitus (DM) now ranks third among chronic non-communicable diseases affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and has become one of the major public health problems in the world. Therefore, it is of great importance to identify individuals at high risk for DM in order to establish prevention strategies.
    To address the high-dimensional feature space and high feature redundancy of medical data, as well as the data imbalance often encountered, this study explored different supervised classifiers combined with SVM-SMOTE and two feature dimensionality reduction methods (logistic stepwise regression and LASSO) to classify diabetes survey sample data with unbalanced categories and complex related factors. The classification results of 4 supervised classifiers based on 4 data processing methods were analyzed and discussed. Five indicators (accuracy, precision, recall, F1-score, and AUC) were selected as the key indicators for evaluating the performance of the classification model.
    According to the results, the random forest classifier combining the SVM-SMOTE resampling technique and the LASSO feature screening method (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) proved the best way to identify those at high risk of DM. The combined algorithm also helps enhance classification performance when predicting people at high risk of DM. In addition, age, region, heart rate, hypertension, hyperlipidemia, and BMI are the six most critical characteristic variables affecting diabetes.
    The random forest classifier combined with SVM-SMOTE and the LASSO feature reduction method performs best in identifying individuals at high risk of DM. The combined method proposed in this study would be a good tool for early screening of DM.