synthetic minority oversampling technique

  • Article type: Journal Article
    BACKGROUND: Postoperative delirium, particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management.
    OBJECTIVE: To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients.
    METHODS: In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 days after surgery. Patients were divided into delirium and non-delirium groups according to whether postoperative delirium occurred. A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium. The SMOTE technique was applied to enhance the model by oversampling the delirium cases. The model's predictive accuracy was then validated.
    RESULTS: In our study involving 611 elderly patients with abdominal malignant tumors, multivariate logistic regression analysis identified significant risk factors for postoperative delirium. These included the Charlson comorbidity index, American Society of Anesthesiologists classification, history of cerebrovascular disease, surgical duration, perioperative blood transfusion, and postoperative pain score. The incidence rate of postoperative delirium in our study was 22.91%. The original predictive model (P1) exhibited an area under the receiver operating characteristic curve of 0.862. In comparison, the SMOTE-based logistic early warning model (P2), which utilized the SMOTE oversampling algorithm, showed a slightly lower but comparable area under the curve of 0.856, suggesting no significant difference in performance between the two predictive approaches.
    CONCLUSIONS: This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods, effectively addressing data imbalance.
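
The oversampling step named in the abstract above can be sketched roughly as follows. This is a minimal illustration of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the study's actual pipeline. The function name `smote`, its parameters, and the toy feature values are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made-up numbers): 3 "delirium" cases vs 9 "non-delirium" cases.
delirium = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
non_delirium = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(9)]
new_cases = smote(delirium, n_new=len(non_delirium) - len(delirium), k=2)
balanced_minority = delirium + new_cases
print(len(balanced_minority), len(non_delirium))  # prints: 9 9
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic cases do not leak into the data used for validation.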

  • Article type: Journal Article
    OBJECTIVE: Time-consuming data labeling in brain-computer interfaces (BCIs) raises many problems, such as mental fatigue, and is one key factor that hinders the real-world adoption of motor imagery (MI)-based BCIs. An alternative approach is to integrate readily available and informative unlabeled data online, but this approach has received less attention.
    METHODS: We proposed an online semi-supervised learning scheme to improve the classification performance of MI-based BCIs. This scheme uses a regularized weighted online sequential extreme learning machine (RWOS-ELM) as the base classifier and updates its model parameters with incoming balanced data chunk by chunk. In the initial stage, we designed a technique that combines synthetic minority oversampling with the edited nearest neighbor rule for data augmentation, in order to construct more discriminative initial classifiers. When used online, each incoming chunk of data is first pseudo-labeled by RWOS-ELM together with an auxiliary classifier, and then balanced again by the above-mentioned technique. The initial classifiers are further updated based on these class-balanced data.
    RESULTS: Offline experimental results on two publicly available MI datasets demonstrate the superiority of the proposed scheme over its counterparts. Further online experiments on six subjects show that their BCI performance gradually improved by learning from incoming unlabeled data.
    CONCLUSIONS: Our proposed online semi-supervised learning scheme has higher computation and memory usage efficiency, which is promising for online MI-based BCIs, especially when labeled training data are insufficient.
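
The edited nearest neighbor rule that the abstract pairs with oversampling can be sketched as below. This is a minimal illustration only, assuming the common formulation in which a sample is dropped when its label disagrees with the majority label of its k nearest neighbours. The function name `edited_nearest_neighbours` and the toy data are hypothetical, and the oversampling step that would normally precede this cleaning is omitted.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority label of their k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Indices of the k closest other samples.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        majority_label, _ = Counter(y[j] for j in neighbours).most_common(1)[0]
        if majority_label == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


# Toy 2-D features (made up): two clusters plus one class-1 point inside the class-0 cluster.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],
     [0.05, 0.05]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
print(len(X), "->", len(X_clean))  # prints: 9 -> 8
```

The point sitting inside the opposite cluster is removed, which is the kind of boundary cleaning that can follow synthetic oversampling.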

  • Article type: Journal Article
    Deep learning-based fault diagnosis usually requires a rich supply of data, but fault samples are scarce in practice, which makes it difficult for existing diagnosis approaches to achieve highly accurate fault detection in real applications. This paper proposes an imbalanced fault diagnosis method for rotating machinery that combines time-frequency feature oversampling (TFFO) with a convolutional neural network (CNN). First, a sliding segmentation sampling method is used to increase the number of fault samples in the form of one-dimensional signals. The signals are then converted into two-dimensional time-frequency feature maps by continuous wavelet transform (CWT). Subsequently, the minority samples are expanded again using the synthetic minority oversampling technique (SMOTE) to realize TFFO. After this two-fold data expansion, a balanced dataset is obtained and fed to an improved 2D CNN based on LeNet-5 to perform fault diagnosis. To verify the proposed method, two experiments involving single and compound faults are conducted on locomotive wheel-set bearings and a gearbox, yielding several datasets with different degrees of imbalance and various signal-to-noise ratios. The results demonstrate the advantages of the proposed method in classification accuracy, stability, and noise robustness in imbalanced fault diagnosis, with a fault classification accuracy of over 97%.
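
The first expansion step described above, sliding segmentation sampling, can be sketched as follows. This is an illustrative guess at the general idea (cutting overlapping windows from a one-dimensional recording), not the paper's implementation. The function name `sliding_segments`, the window length, and the stride are assumptions.

```python
def sliding_segments(signal, window, stride):
    """Cut a 1-D signal into windows of length `window`, advancing by `stride` samples."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, stride)]


# Hypothetical fault recording of 1000 points.
signal = list(range(1000))
non_overlapping = sliding_segments(signal, window=200, stride=200)
overlapping = sliding_segments(signal, window=200, stride=50)
print(len(non_overlapping), len(overlapping))  # prints: 5 17
```

A stride smaller than the window makes consecutive segments overlap, so the same recording yields more samples. Each segment would then be passed to the time-frequency transform.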

  • Article type: Journal Article
    BACKGROUND: In pulse signal analysis and identification, time-domain and time-frequency-domain analysis methods can obtain interpretable structured data and build classification models using traditional machine learning methods. Unstructured data, such as pulse signals, contain rich information about the state of the cardiovascular system, and local features of unstructured data can be extracted and classified using deep learning.
    OBJECTIVE: The objective of this paper was to comprehensively use machine learning and deep learning classification methods to fully exploit the information about pulse signals.
    METHODS: Structured data were obtained by using time-domain and time-frequency-domain analysis methods. A classification model was built using a support vector machine (SVM), a deep convolutional neural network (DCNN) kernel was used to extract local features of the unstructured data, and the stacking method was used to fuse the above classification results for decision making.
    RESULTS: The highest average accuracy of 0.7914 was obtained using only a single classifier, while the average accuracy obtained using the ensemble learning approach was 0.8330.
    CONCLUSIONS: Ensemble learning can effectively use information from structured and unstructured data to improve classification accuracy through decision-level fusion. This study provides a new idea and method for pulse signal classification, which is of practical value for pulse diagnosis objectification.
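
Decision-level fusion by stacking, as named in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: the base-model probabilities are made-up numbers standing in for held-out SVM and DCNN outputs, the meta-learner is a plain logistic regression trained by gradient descent, and the names `train_meta_learner` and `fuse` are hypothetical. The abstract does not specify which meta-learner the study used.

```python
import math


def train_meta_learner(meta_features, targets, lr=0.5, epochs=10000):
    """Fit a logistic-regression meta-learner on base-classifier outputs (batch gradient descent)."""
    n = len(targets)
    weights, bias = [0.0] * len(meta_features[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, target in zip(meta_features, targets):
            z = bias + sum(w * p for w, p in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - target  # predicted probability minus label
            grad_b += error
            grad_w = [g + error * p for g, p in zip(grad_w, row)]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(weights, bias, row):
    """Final decision: 1 if the meta-learner's score is positive, else 0."""
    return 1 if bias + sum(w * p for w, p in zip(weights, row)) > 0 else 0


# Made-up held-out probabilities of class 1 from two base models: [svm_prob, dcnn_prob].
base_probs = [[0.9, 0.8], [0.8, 0.45], [0.4, 0.9],
              [0.2, 0.1], [0.6, 0.2], [0.3, 0.55]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_meta_learner(base_probs, labels)
fused = [fuse(weights, bias, row) for row in base_probs]
print(fused)
```

In this toy set each base model alone misclassifies two samples at a 0.5 threshold, while the fused decision can separate all six, which is the intuition behind combining the two sources at the decision level.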

  • Article type: Journal Article
    Predominantly occurring on cytosine, DNA methylation is a process by which cells can modify their DNA to change the expression of gene products. It plays very important roles not only in development but also in the formation of nearly all types of cancer. Therefore, knowledge of DNA methylation sites is significant for both basic research and drug development. Given an uncharacterized DNA sequence containing many cytosine residues, which ones can be methylated and which cannot? With the avalanche of DNA sequences generated in the postgenomic age, it is highly desirable to develop computational methods for accurately identifying the methylation sites in DNA. Using the trinucleotide composition, pseudo amino acid components, and a dataset-optimizing technique, we have developed a new predictor called "iDNA-Methyl" that has achieved remarkably higher success rates in identifying DNA methylation sites than the existing predictors. A user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/iDNA-Methyl, where users can easily get their desired results. We anticipate that the web-server predictor will become a very useful high-throughput tool for basic research and drug development and that the novel approach and technique can also be used to investigate many other DNA-related problems and genome analysis.
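
As a rough illustration of the first feature mentioned above, the trinucleotide composition of a DNA sequence can be computed as the normalized frequency of each of the 64 possible three-base words. The sketch below shows only that generic calculation. It is not the iDNA-Methyl feature pipeline, and the function name `trinucleotide_composition` and the example sequence are assumptions.

```python
from itertools import product


def trinucleotide_composition(sequence):
    """Return the normalized frequency of each of the 64 trinucleotides, in a fixed order."""
    sequence = sequence.upper()
    trimers = ["".join(p) for p in product("ACGT", repeat=3)]
    counts = dict.fromkeys(trimers, 0)
    total = 0
    for i in range(len(sequence) - 2):
        trimer = sequence[i:i + 3]
        if trimer in counts:  # skip windows containing ambiguous bases such as N
            counts[trimer] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in trimers]


features = trinucleotide_composition("ACGTACGCGT")
print(len(features), round(sum(features), 6))  # prints: 64 1.0
```

A fixed-length vector like this lets sequences of different lengths be fed to the same classifier.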