synthetic minority oversampling technique

  • Article type: Journal Article
    BACKGROUND: Postoperative delirium, particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management.
    OBJECTIVE: To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients.
    METHODS: In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 days after surgery. Patients were divided into delirium and non-delirium groups according to whether postoperative delirium occurred. A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium. The SMOTE technique was applied to enhance the model by oversampling the delirium cases. The model's predictive accuracy was then validated.
    RESULTS: In our study involving 611 elderly patients with abdominal malignant tumors, multivariate logistic regression analysis identified significant risk factors for postoperative delirium. These included the Charlson comorbidity index, American Society of Anesthesiologists classification, history of cerebrovascular disease, surgical duration, perioperative blood transfusion, and postoperative pain score. The incidence rate of postoperative delirium in our study was 22.91%. The original predictive model (P1) exhibited an area under the receiver operating characteristic curve of 0.862. In comparison, the SMOTE-based logistic early warning model (P2), which utilized the SMOTE oversampling algorithm, showed a slightly lower but comparable area under the curve of 0.856, suggesting no significant difference in performance between the two predictive approaches.
    CONCLUSIONS: This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods, effectively addressing data imbalance.
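
    The oversampling step named in the METHODS can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is interpolated between a minority sample and one of its nearest minority-class neighbours. This is not the study's implementation; the function name `smote`, the parameter defaults, and the toy rows are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 3 minority rows versus 9 majority rows, two features each.
minority_rows = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5)]
majority_count = 9
new_rows = smote(minority_rows, majority_count - len(minority_rows), k=2)
balanced_minority = minority_rows + new_rows
```

    Because every synthetic row is a convex combination of two real minority rows, it stays inside the range already covered by the minority class. In practice the oversampling would be applied to the training split only, so that validation data contain no synthetic cases.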






  • Article type: Journal Article
    Time-consuming data labeling in brain-computer interfaces (BCIs) raises problems such as mental fatigue and is one key factor hindering the real-world adoption of motor imagery (MI)-based BCIs. An alternative is to integrate readily available and informative unlabeled data online, but this approach has received less attention.
    We proposed an online semi-supervised learning scheme to improve the classification performance of MI-based BCI. This scheme uses regularized weighted online sequential extreme learning machine (RWOS-ELM) as the base classifier and updates its model parameters with incoming balanced data chunk-by-chunk. In the initial stage, we designed a technique that combines the synthetic minority oversampling with the edited nearest neighbor rule for data augmentation to construct more discriminative initial classifiers. When used online, the incoming chunk of data is first pseudo-labeled by RWOS-ELM as well as an auxiliary classifier, and then balanced again by the above-mentioned technique. Initial classifiers are further updated based on these class-balanced data.
    Offline experimental results on two publicly available MI datasets demonstrate the superiority of the proposed scheme over its counterparts. Further online experiments on six subjects show that their BCI performance gradually improved by learning from incoming unlabeled data.
    Our proposed online semi-supervised learning scheme has higher computation and memory usage efficiency, which is promising for online MI-based BCIs, especially in the case of insufficient labeled training data.
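
    The edited nearest neighbor rule that the scheme pairs with synthetic minority oversampling can be sketched as a cleaning pass: a sample is dropped when its label disagrees with the majority label of its k nearest neighbours. The sketch below shows only that cleaning step under assumed settings (k = 3, Euclidean distance, cleaning applied to every class); the function name and the toy points are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def edited_nearest_neighbours(points, labels, k=3):
    """Drop every sample whose label disagrees with the majority of its k neighbours."""
    kept_points, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Sort the other samples by distance to p and keep the k closest.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        majority_label, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        if majority_label == y:
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


# Hypothetical toy data: two well-separated clusters plus one class-1 sample
# that sits inside the class-0 cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

clean_points, clean_labels = edited_nearest_neighbours(points, labels, k=3)
```

    Here the stray class-1 point is removed while both clusters survive, which is the kind of boundary cleaning that can follow an oversampling step.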






  • Article type: Journal Article
    Deep learning-based fault diagnosis usually requires a rich supply of data, but fault samples are scarce in practice, which makes it difficult for existing diagnosis approaches to achieve highly accurate fault detection in real applications. This paper proposes an imbalanced fault diagnosis method for rotating machinery that combines time-frequency feature oversampling (TFFO) with a convolutional neural network (CNN). First, a sliding segmentation sampling method is employed to preliminarily increase the number of fault samples in the form of one-dimensional signals. The signals are then converted into two-dimensional time-frequency feature maps by continuous wavelet transform (CWT). Subsequently, the minority samples are expanded again using the synthetic minority oversampling technique (SMOTE) to realize TFFO. After this two-fold data expansion, a balanced dataset is obtained and fed to an improved 2D CNN based on LeNet-5 to perform fault diagnosis. To verify the proposed method, two experiments involving single and compound faults are conducted on locomotive wheel-set bearings and a gearbox, yielding several datasets with different degrees of imbalance and various signal-to-noise ratios. The results demonstrate the advantages of the proposed method in terms of classification accuracy, stability, and noise robustness in imbalanced fault diagnosis, with a fault classification accuracy above 97%.
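
    The first expansion step, sliding segmentation sampling, can be sketched as cutting one long record into overlapping windows, so that a smaller step yields more samples from the same signal. The window length, step sizes, and record below are hypothetical values chosen for illustration; the abstract does not state the ones used in the paper.

```python
def sliding_segments(signal, window, step):
    """Cut a 1-D signal into windows of length `window`, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        signal[start:start + window]
        for start in range(0, len(signal) - window + 1, step)
    ]


# Hypothetical numbers: a 1000-point fault record and 256-point windows.
record = list(range(1000))
non_overlapping = sliding_segments(record, window=256, step=256)
overlapping = sliding_segments(record, window=256, step=64)
```

    With these assumed numbers, the non-overlapping split gives 3 segments while the 64-point step gives 12, a fourfold increase in fault samples before any time-frequency transform or SMOTE step is applied.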






  • Article type: Journal Article
    BACKGROUND: In pulse signal analysis and identification, time domain and time frequency domain analysis methods can obtain interpretable structured data and build classification models using traditional machine learning methods. Unstructured data, such as pulse signals, contain rich information about the state of the cardiovascular system, and local features of unstructured data can be extracted and classified using deep learning.
    OBJECTIVE: The objective of this paper was to comprehensively use machine learning and deep learning classification methods to fully exploit the information about pulse signals.
    METHODS: Structured data were obtained by using time domain and time frequency domain analysis methods. A classification model was built using a support vector machine (SVM), a deep convolutional neural network (DCNN) kernel was used to extract local features of the unstructured data, and the stacking method was used to fuse the above classification results for decision making.
    RESULTS: The highest average accuracy of 0.7914 was obtained using only a single classifier, while the average accuracy obtained using the ensemble learning approach was 0.8330.
    CONCLUSIONS: Ensemble learning can effectively use information from structured and unstructured data to improve classification accuracy through decision-level fusion. This study provides a new idea and method for pulse signal classification, which is of practical value for pulse diagnosis objectification.
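
    Decision-level fusion by stacking can be sketched as a small meta-learner trained on the outputs of the base classifiers. The sketch below assumes the base models are already trained and uses a logistic regression fitted by gradient descent as the meta-learner; the scores, the meta-learner choice, and all function names are hypothetical and do not reproduce the study's pipeline.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_scores, labels, lr=1.0, epochs=5000):
    """Fit logistic regression on base-classifier scores by batch gradient descent."""
    n_features = len(base_scores[0])
    weights, bias = [0.0] * n_features, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for scores, y in zip(base_scores, labels):
            error = sigmoid(bias + sum(w * s for w, s in zip(weights, scores))) - y
            grad_b += error
            for j, s in enumerate(scores):
                grad_w[j] += error * s
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(weights, bias, scores):
    return int(bias + sum(w * s for w, s in zip(weights, scores)) > 0)


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical scores: column 0 from an SVM on structured features,
# column 1 from a DCNN on the raw pulse signal.
base_scores = [(0.9, 0.8), (0.4, 0.9), (0.8, 0.4), (0.7, 0.7),
               (0.1, 0.2), (0.6, 0.1), (0.2, 0.6), (0.3, 0.3)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_meta_learner(base_scores, labels)
svm_acc = accuracy([int(s[0] > 0.5) for s in base_scores], labels)
dcnn_acc = accuracy([int(s[1] > 0.5) for s in base_scores], labels)
stacked_acc = accuracy([predict(weights, bias, s) for s in base_scores], labels)
```

    On this toy set each base classifier misses two different cases, and the fused decision recovers them because the two scores carry complementary information. The accuracies here are measured on the training rows; a real stacking setup would fit the meta-learner on out-of-fold base predictions and evaluate on held-out data.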






  • Article type: Journal Article
    Predominantly occurring on cytosine, DNA methylation is a process by which cells can modify their DNAs to change the expression of gene products. It plays very important roles in life development but also in forming nearly all types of cancer. Therefore, knowledge of DNA methylation sites is significant for both basic research and drug development. Given an uncharacterized DNA sequence containing many cytosine residues, which one can be methylated and which one cannot? With the avalanche of DNA sequences generated during the postgenomic age, it is highly desired to develop computational methods for accurately identifying the methylation sites in DNA. Using the trinucleotide composition, pseudo amino acid components, and a dataset-optimizing technique, we have developed a new predictor called "iDNA-Methyl" that has achieved remarkably higher success rates in identifying the DNA methylation sites than the existing predictors. A user-friendly web-server for the new predictor has been established at, where users can easily get their desired results. We anticipate that the web-server predictor will become a very useful high-throughput tool for basic research and drug development and that the novel approach and technique can also be used to investigate many other DNA-related problems and genome analysis.
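
    The trinucleotide composition mentioned above can be sketched as a 64-dimensional vector of normalized frequencies, one per possible triplet over A, C, G, and T, counted with a window that slides one base at a time. This shows only the plain composition; the pseudo amino acid components and the dataset-optimizing technique of the predictor are not reproduced, and the function name and example sequence are assumptions.

```python
from itertools import product

# All 64 triplets in a fixed order, so every sequence maps to the same layout.
TRINUCLEOTIDES = ["".join(t) for t in product("ACGT", repeat=3)]


def trinucleotide_composition(sequence):
    """Return the 64 normalized trinucleotide frequencies of a DNA sequence."""
    sequence = sequence.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(sequence) - 2):
        triplet = sequence[i:i + 3]
        if triplet in counts:  # skip windows containing N or other symbols
            counts[triplet] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


# Hypothetical 8-base fragment: its six windows are ACG, CGT, GTA, TAC, ACG, CGT.
features = trinucleotide_composition("ACGTACGT")
```

    A fixed-length vector of this kind lets sequences of different lengths be passed to a standard classifier.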





