• Article type: Journal Article
    Sleep is a vital physiological process for human health, and accurately detecting sleep states is crucial for diagnosing sleep disorders. This study presents a novel algorithm for identifying sleep stages from EEG signals that is more efficient and accurate than state-of-the-art methods. The key innovation lies in employing a piecewise linear data reduction technique, the Halfwave method, in the time domain. This method simplifies EEG signals into a piecewise linear form with reduced complexity while preserving sleep stage characteristics. A feature vector of six statistical features is then built from the parameters of the reduced piecewise linear function. We tested the proposed method on the MIT-BIH Polysomnographic Database, which includes more than 80 h of recordings of different biomedical signals with six main sleep classes. We compared several classifiers and found that the K-Nearest Neighbor classifier performed best with the proposed method. In experiments on eight records of the Polysomnographic Database, the average sensitivity, specificity, and accuracy of the proposed algorithm were 94.82%, 96.65%, and 95.73%, respectively. Furthermore, the algorithm is computationally efficient, which makes it suitable for real-time applications such as sleep monitoring devices. Its robust performance across sleep classes suggests potential for widespread clinical adoption and for advancing the understanding, detection, and management of sleep problems.
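
    The reduction step described above can be illustrated with a minimal sketch. It assumes that a half-wave is the segment between two consecutive local extrema of the signal; the six statistics below are placeholder choices, since the abstract does not name the actual features, and both function names are hypothetical.

```python
from statistics import mean, pstdev


def halfwave_breakpoints(signal):
    """Return the indices of both endpoints and of every local extremum."""
    idx = [0]
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:  # slope changes sign: local extremum
            idx.append(i)
    idx.append(len(signal) - 1)
    return idx


def halfwave_features(signal):
    """Six illustrative statistics of half-wave durations and amplitudes.

    Assumption: these six features are placeholders, not the paper's set.
    """
    idx = halfwave_breakpoints(signal)
    pairs = list(zip(idx, idx[1:]))
    durations = [b - a for a, b in pairs]
    amplitudes = [abs(signal[b] - signal[a]) for a, b in pairs]
    return [
        mean(durations), pstdev(durations), max(durations),
        mean(amplitudes), pstdev(amplitudes), max(amplitudes),
    ]


epoch = [0, 2, 4, 1, -2, 0, 3]  # toy samples, not real EEG
print(halfwave_breakpoints(epoch))  # [0, 2, 4, 6]
```

    Only the breakpoints are kept, so the signal between them is treated as a straight line. Flat plateaus are ignored in this sketch because a zero slope does not count as a sign change.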






  • Article type: Journal Article
    Medical imaging is a critical component in diagnosing many diseases, yet traditional methods often rely on manual interpretation and conventional machine learning techniques. These approaches, while effective, have inherent limitations such as subjectivity in interpretation and difficulty handling complex image features. This paper proposes an integrated deep learning approach that combines three pre-trained models (VGG16, ResNet50, and InceptionV3) within a unified framework to improve diagnostic accuracy in medical imaging. The method focuses on lung cancer detection using images resized and converted to a uniform format to optimize performance and ensure consistency across datasets. The proposed model leverages the strengths of each pre-trained network, achieving strong feature extraction and robustness by freezing the early convolutional layers and fine-tuning the deeper layers. Additionally, techniques such as SMOTE and Gaussian blur are applied to address class imbalance, enhancing model training on underrepresented classes. The model's performance was validated on the IQ-OTH/NCCD lung cancer dataset, which was collected at the Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases over a period of three months in fall 2019. The proposed model achieved an accuracy of 98.18%, with high precision and recall across all classes. This improvement highlights the potential of integrated deep learning systems in medical diagnostics, providing a more accurate, reliable, and efficient means of disease detection.






  • Article type: Journal Article
    Stress is a psychological condition resulting from the body's response to challenging situations, and it can harm physical and mental health if experienced over prolonged periods. Early detection of stress is crucial to prevent chronic health problems. Wearable sensors offer an effective solution for continuous, real-time stress monitoring because they are non-intrusive and can monitor vital signs such as heart rate and activity. Most existing research has focused on data collected in controlled environments; this study instead proposes a machine learning-based approach for detecting stress in a free-living environment using wearable sensors. We used the SWEET dataset, which includes data from 240 subjects collected via electrocardiography (ECG), skin temperature (ST), and skin conductance (SC). We assessed five machine learning models, namely K-Nearest Neighbors (KNN), Support Vector Classification (SVC), Decision Tree (DT), Random Forest (RF), and XGBoost (XGB), in four settings: two binary classification scenarios (with and without SMOTE) and two three-level classification scenarios (with and without SMOTE). The Random Forest model performed best in binary classification without SMOTE, achieving an accuracy of 98.29% and an F1-score of 97.89%. For binary classification with SMOTE, the K-Nearest Neighbors model performed best, with an accuracy of 95.70% and an F1-score of 95.70%. In three-level classification without SMOTE, the Random Forest model again performed best, achieving an accuracy of 97.98% and an F1-score of 97.22%. For three-level classification with SMOTE, XGBoost showed the highest performance, with an accuracy and F1-score of 98.98%. These results show that different models are most effective under different conditions and underline the importance of model selection and preprocessing techniques for classification performance.
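
    For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that idea, not the implementation used in the study; the function name, the parameters `n_new`, `k`, and `seed`, and the toy data are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy feature vectors
new_points = smote(minority_class, n_new=5)
print(len(new_points))  # 5
```

    Oversampling of this kind is normally applied to the training portion only, so that synthetic points do not leak into the data used for evaluation.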






  • Article type: Journal Article
    This study proposes a small one-dimensional convolutional neural network (1D-CNN) framework for individual authentication, based on the hypothesis that a single heartbeat is sufficient input to create a robust system. A short segment between consecutive R peaks of the electrocardiogram (ECG) signal was chosen to generate single-heartbeat samples by enforcing a rigid length thresholding procedure combined with an interpolation technique. Additionally, we explored the benefits of the synthetic minority oversampling technique (SMOTE) to address the imbalance in sample distribution among individuals. The proposed framework was evaluated on four public databases, both individually and in combination: MIT-BIH Normal Sinus Rhythm (NSRDB), MIT-BIH Arrhythmia (MIT-ARR), ECG-ID, and MIMIC-III, all available in the PhysioNet repository. The framework achieved a perfect score (100%) across all metrics (accuracy, precision, sensitivity, and F1-score) on the individual NSRDB and MIT-ARR databases. Performance remained high, at more than 99.6%, on mixed datasets that contain larger populations and more diverse conditions. The strong performance in both small and large subject groups points to the model's scalability and potential for widespread implementation, particularly in security contexts where timely authentication is crucial. Future research should examine the incorporation of multimodal biometric systems and extend the framework to real-time environments and larger populations.
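
    The heartbeat extraction step (length thresholding followed by interpolation) can be sketched as follows. This is an illustration under stated assumptions: R-peak positions are taken as given, linear interpolation stands in for whichever interpolation the authors used, and the function names and the `min_len`, `max_len`, and `target_len` parameters are hypothetical.

```python
def resample_linear(segment, target_len):
    """Linearly interpolate `segment` onto `target_len` evenly spaced points."""
    if target_len < 2 or len(segment) < 2:
        raise ValueError("need at least two samples in and out")
    scale = (len(segment) - 1) / (target_len - 1)
    out = []
    for j in range(target_len):
        pos = j * scale
        lo = min(int(pos), len(segment) - 2)  # left neighbour index
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[lo + 1] * frac)
    return out


def heartbeats(ecg, r_peaks, min_len, max_len, target_len):
    """Cut R-to-R segments, drop those outside the length limits, resample the rest."""
    beats = []
    for start, end in zip(r_peaks, r_peaks[1:]):
        if min_len <= end - start <= max_len:
            beats.append(resample_linear(ecg[start:end + 1], target_len))
    return beats


print(resample_linear([0.0, 2.0, 4.0], 5))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

    After this step every retained beat has the same length, which is what a fixed-input 1D-CNN requires.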






  • Article type: Journal Article
    Hematoma expansion (HE) is a high-risk symptom with a high rate of occurrence in patients who have suffered spontaneous intracerebral hemorrhage (ICH) after a major accident or illness. Correctly predicting HE in advance is critical for helping doctors decide on the next step of medical treatment. Most existing studies focus only on HE occurring within 6 h after ICH, while in reality a considerable number of patients develop HE after the first 6 h but within 24 h. In this study, following the recommendation of medical doctors, we focus on predicting the occurrence of HE within 24 h, as well as in each 6 h interval within 24 h. Based on demographics and information extracted from computed tomography (CT) images, we used the XGBoost method to predict the occurrence of HE within 24 h. To address the highly imbalanced data set, a frequent issue in medical data analysis, we used the SMOTE algorithm for data augmentation. To evaluate our method, we used a data set of 582 patient records and compared the proposed method with several other machine learning methods. Our experiments show that XGBoost achieved the best prediction performance on the dataset balanced by the SMOTE algorithm, with an accuracy of 0.82 and an F1-score of 0.82. Moreover, the proposed method predicts the occurrence of HE within 6, 12, 18, and 24 h with accuracies of 0.89, 0.82, 0.87, and 0.94, respectively, indicating that HE occurrence within 24 h can be predicted accurately by the proposed method.






  • Article type: Journal Article
    Parkinson's Disease (PD) is a prevalent neurological condition characterized by motor and cognitive impairments, typically manifesting around the age of 50 and presenting symptoms such as gait difficulties and speech impairments. Although a cure remains elusive, symptom management through medication is possible. Timely detection is pivotal for effective disease management. In this study, we leverage Machine Learning (ML) and Deep Learning (DL) techniques, specifically K-Nearest Neighbor (KNN) and Feed-forward Neural Network (FNN) models, to differentiate between individuals with PD and healthy individuals based on voice signal characteristics. Our dataset, sourced from the University of California at Irvine (UCI), comprises 195 voice recordings collected from 31 patients. To optimize model performance, we employ various strategies including Synthetic Minority Over-sampling Technique (SMOTE) for addressing class imbalance, Feature Selection to identify the most relevant features, and hyperparameter tuning using RandomizedSearchCV. Our experimentation reveals that the FNN and KSVM models, trained on an 80-20 split of the dataset for training and testing respectively, yield the most promising results. The FNN model achieves an impressive overall accuracy of 99.11%, with 98.78% recall, 99.96% precision, and a 99.23% F1-score. Similarly, the KSVM model demonstrates strong performance with an overall accuracy of 95.89%, recall of 96.88%, precision of 98.71%, and an F1-score of 97.62%. Overall, our study showcases the efficacy of ML and DL techniques in accurately identifying PD from voice signals, underscoring the potential for these approaches to contribute significantly to early diagnosis and intervention strategies for Parkinson's Disease.
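
    As a reminder of how the K-Nearest Neighbor classifier mentioned above makes a decision, here is a minimal standard-library sketch: a query is assigned the majority label among its k closest training samples. The function name, the two-dimensional toy features, and the labels are hypothetical and unrelated to the UCI voice data.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters
features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["healthy", "healthy", "healthy", "PD", "PD", "PD"]
print(knn_predict(features, labels, [0.2, 0.1]))  # healthy
```

    An odd value of k avoids tied votes in a two-class problem; in practice k is one of the hyperparameters a search such as RandomizedSearchCV would tune.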






  • Article type: Journal Article
    In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have treated severity level as a multiclass variable. Moreover, data are rarely distributed equally among all classes in practice, so the inevitable class imbalance across multiple classes is a substantial challenge in this domain. This research therefore emphasizes the significance of addressing class imbalance in the multiclass setting. We introduce a new approach, Feature Group Partitioning (FGP), in the data preprocessing phase, which effectively reduces the dimensionality of the features to a minimum. For class balancing, the study used two synthetic oversampling techniques: Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN). The dataset was collected from university students by administering the Burns Depression Checklist (BDC). For modeling, we implemented heterogeneous ensemble learning (stacking), homogeneous ensemble learning (bagging), and five distinct supervised machine learning algorithms. Overfitting was checked by comparing accuracy on the training, validation, and testing datasets. Balanced accuracy, sensitivity, specificity, precision, and F1-score were used to assess the prediction models. A comprehensive analysis compares the Conventional Depression Screening (CDS) and FGP approaches. The stacking classifier for FGP with SMOTE yields the highest balanced accuracy, 92.81%. The empirical evidence shows that the FGP approach, when combined with SMOTE, produces better performance in predicting the severity of depression. Most importantly, the reduced training time of the FGP approach for all of the classifiers is a significant achievement of this research.
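
    Balanced accuracy, the headline metric above, is the mean of the per-class recalls, which is why it is preferred over plain accuracy when classes are imbalanced. A minimal sketch follows; the function name and the severity labels in the toy example are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy example: the majority class dominates plain accuracy
y_true = ["mild"] * 8 + ["moderate"] * 2 + ["severe"] * 2
y_pred = ["mild"] * 8 + ["moderate", "mild"] + ["mild", "mild"]
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # 0.75
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

    In the toy example the classifier never recognizes the rarest class, which plain accuracy largely hides and balanced accuracy exposes.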






  • Article type: Journal Article
    This study evaluates the efficacy of hyperspectral data for detecting yellow and brown rust in wheat, employing machine learning models and the SMOTE (Synthetic Minority Oversampling Technique) augmentation technique to tackle unbalanced datasets. Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), and Gaussian Naïve Bayes (GNB) models were assessed. Overall, SVM and RF models showed higher accuracies, particularly when utilizing SMOTE-enhanced datasets. The RF model achieved 70% accuracy in detecting yellow rust without data alteration. Conversely, for brown rust, the SVM model outperformed others, reaching 63% accuracy with SMOTE applied to the training set. This study highlights the potential of spectral data and machine learning (ML) techniques in plant disease detection. It emphasizes the need for further research in data processing methodologies, particularly in exploring the impact of techniques like SMOTE on model performance.






  • Article type: Journal Article
    Lung cancer remains a leading cause of cancer-related mortality globally, with prognosis significantly dependent on early-stage detection. Traditional diagnostic methods, though effective, often face challenges regarding accuracy, early detection, and scalability, being invasive, time-consuming, and prone to ambiguous interpretations. This study proposes an advanced machine learning model designed to enhance lung cancer stage classification using CT scan images, aiming to overcome these limitations by offering a faster, non-invasive, and reliable diagnostic tool. Using the IQ-OTH/NCCD lung cancer dataset, which comprises CT scans from various stages of lung cancer and from healthy individuals, we performed extensive preprocessing including resizing, normalization, and Gaussian blurring. A Convolutional Neural Network (CNN) was then trained on the preprocessed data, and class imbalance was addressed using Synthetic Minority Over-sampling Technique (SMOTE). The model's performance was evaluated with accuracy, precision, recall, F1-score, and ROC curve analysis. The results demonstrated a classification accuracy of 99.64%, with precision, recall, and F1-score values exceeding 98% across all categories. SMOTE significantly enhanced the model's ability to classify underrepresented classes, contributing to the robustness of the diagnostic tool. These findings underscore the potential of machine learning in transforming lung cancer diagnostics, providing high accuracy in stage classification, which could facilitate early detection and tailored treatment strategies, ultimately improving patient outcomes.
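
    Two of the preprocessing steps named above, normalization and Gaussian blurring, can be sketched with the standard library alone. This is an illustration, not the study's pipeline: the kernel radius, sigma, border handling (clamping), the division by 255, and all function names are assumptions.

```python
import math


def gaussian_kernel(radius, sigma):
    """1D Gaussian weights of length 2 * radius + 1, normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve each row with `kernel`, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(kernel[j + radius] * row[min(max(x + j, 0), n - 1)]
                for j in range(-radius, radius + 1))
            for x in range(n)
        ])
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, radius=1, sigma=1.0):
    """Separable blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def normalize(image, max_value=255.0):
    """Scale pixel intensities into the range [0, 1]."""
    return [[px / max_value for px in row] for row in image]


scan = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]  # toy 3x3 "image"
smoothed = gaussian_blur(normalize(scan))
```

    Blurring spreads the single bright pixel over its neighbours, which suppresses pixel-level noise before the image reaches the network.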






  • Article type: Journal Article
    This study used artificial intelligence techniques to identify clinical biomarkers of recurrence in gastric cancer survivors. Data on recurrence incidence and clinical risk features for 2476 gastric cancer survivors were taken from a hospital-based cancer registry database in Taiwan. We benchmarked Random Forest against MLP, C4.5, AdaBoost, and Bagging algorithms, and used the synthetic minority oversampling technique (SMOTE) for the imbalanced dataset, cost-sensitive learning for risk assessment, and SHapley Additive exPlanations (SHAP) for feature importance analysis. Our proposed Random Forest outperformed the other models with an accuracy of 87.9%, a recall of 90.5%, a precision of 86%, and an F1 of 88.2% on the recurrent category, using 10-fold cross-validation on a balanced dataset. We identified the top five clinical features of recurrent gastric cancer: stage, number of involved regional lymph nodes, Helicobacter pylori, BMI (body mass index), and gender. These features significantly affect the prediction model's output and deserve attention in subsequent causal effect analysis. Using an artificial intelligence model, the risk factors for recurrent gastric cancer can be identified and cost-effectively ranked according to their feature importance. These features should also give physicians the knowledge needed to screen high-risk patients among gastric cancer survivors.
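
    The reported figures can be sanity-checked with the definition of F1 as the harmonic mean of precision and recall. The sketch below assumes that the 86% figure is the precision on the recurrent category; that reading is an assumption, though it reproduces the reported F1. The function name is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Assumption: 86% is the precision, 90.5% the recall reported in the abstract
print(round(100 * f1_from(0.86, 0.905), 1))  # 88.2
```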





