SMOTE

  • Article type: Journal Article
    Sleep is a vital physiological process for human health, and accurately detecting the various sleep states is crucial for diagnosing sleep disorders. This study presents a novel algorithm for identifying sleep stages from EEG signals, which is more efficient and accurate than state-of-the-art methods. The key innovation lies in employing a piecewise linear data reduction technique, called the Halfwave method, in the time domain. This method simplifies EEG signals into a piecewise linear form with reduced complexity while preserving sleep stage characteristics. A feature vector with six statistical features is then built from the parameters of the reduced piecewise linear function. We tested the proposed method on the MIT-BIH Polysomnographic Database, which contains more than 80 h of recordings of different biomedical signals with six main sleep classes. We compared different classifiers and found that the K-Nearest Neighbor classifier performed best with our proposed method. According to the experimental findings, the average sensitivity, specificity, and accuracy of the proposed algorithm on eight records of the Polysomnographic Database are estimated at 94.82%, 96.65%, and 95.73%, respectively. Furthermore, the algorithm shows promise in its computational efficiency, making it suitable for real-time applications such as sleep monitoring devices. Its robust performance across sleep classes suggests potential for widespread clinical adoption and for advancing the understanding, detection, and management of sleep problems.
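
    The abstract does not spell out the Halfwave procedure or name the six features, so the following is only a minimal sketch under stated assumptions: the signal is reduced to its turning points (local extrema), each pair of consecutive turning points is treated as one half-wave, and six illustrative statistics (mean, standard deviation, and maximum of half-wave amplitude and duration) stand in for the paper's feature vector. The names `turning_points` and `halfwave_features` are hypothetical.

```python
import statistics


def turning_points(signal):
    """Indices of the first sample, every strict local extremum, and the last sample."""
    idx = [0]
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:  # slope changes sign: local maximum or minimum
            idx.append(i)     # flat plateaus (product == 0) are ignored in this sketch
    idx.append(len(signal) - 1)
    return idx


def halfwave_features(signal):
    """Six assumed statistics of the piecewise linear approximation."""
    idx = turning_points(signal)
    pairs = list(zip(idx, idx[1:]))
    amplitudes = [abs(signal[b] - signal[a]) for a, b in pairs]
    durations = [b - a for a, b in pairs]  # in samples
    return [
        statistics.mean(amplitudes), statistics.pstdev(amplitudes), max(amplitudes),
        statistics.mean(durations), statistics.pstdev(durations), max(durations),
    ]


# Toy "EEG epoch" (assumed values, for illustration only).
epoch = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5, 2.5, 2.0]
print(turning_points(epoch))
print(halfwave_features(epoch))
```

    A feature vector of this kind could then be passed to any classifier, such as the K-Nearest Neighbor model the abstract mentions.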

  • Article type: Journal Article
    Medical imaging stands as a critical component in diagnosing various diseases, where traditional methods often rely on manual interpretation and conventional machine learning techniques. These approaches, while effective, come with inherent limitations such as subjectivity in interpretation and constraints in handling complex image features. This research paper proposes an integrated deep learning approach utilizing pre-trained models (VGG16, ResNet50, and InceptionV3) combined within a unified framework to improve diagnostic accuracy in medical imaging. The method focuses on lung cancer detection using images resized and converted to a uniform format to optimize performance and ensure consistency across datasets. Our proposed model leverages the strengths of each pre-trained network, achieving a high degree of feature extraction and robustness by freezing the early convolutional layers and fine-tuning the deeper layers. Additionally, techniques like SMOTE and Gaussian Blur are applied to address class imbalance, enhancing model training on underrepresented classes. The model's performance was validated on the IQ-OTH/NCCD lung cancer dataset, which was collected from the Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases over a period of three months in fall 2019. The proposed model achieved an accuracy of 98.18%, with precision and recall rates notably high across all classes. This improvement highlights the potential of integrated deep learning systems in medical diagnostics, providing a more accurate, reliable, and efficient means of disease detection.
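
    For readers unfamiliar with SMOTE, the core idea can be sketched in a few lines: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The sketch below works on plain feature vectors; how the paper applies SMOTE to image data is not stated in the abstract, and the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    `minority` is a list of feature vectors (lists of floats) with at least two rows.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with four 2-D samples (assumed values).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points), new_points[0])
```

    Because every synthetic point lies between two real minority samples, the new points stay inside the region the minority class already occupies. In practice the oversampling is normally applied to the training split only, so that no synthetic sample derived from test data leaks into evaluation.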

  • Article type: Journal Article
    Stress is a psychological condition resulting from the body's response to challenging situations, which can negatively impact physical and mental health if experienced over prolonged periods. Early detection of stress is crucial to prevent chronic health problems. Wearable sensors offer an effective solution for continuous and real-time stress monitoring due to their non-intrusive nature and ability to monitor vital signs, e.g., heart rate and activity. Most existing research has focused on data collected in controlled environments. In contrast, our study proposes a machine learning-based approach for detecting stress in a free-living environment using wearable sensors. We utilized the SWEET dataset, which includes data from 240 subjects collected via electrocardiography (ECG), skin temperature (ST), and skin conductance (SC). We assessed five machine learning models, i.e., K-Nearest Neighbors (KNN), Support Vector Classification (SVC), Decision Tree (DT), Random Forest (RF), and XGBoost (XGB), in four different settings: two binary classification scenarios (with and without SMOTE) and two multi-class classification scenarios (with and without SMOTE). The Random Forest model demonstrated superior performance in the binary classification without SMOTE, achieving an accuracy of 98.29% and an F1-score of 97.89%. For binary classification with SMOTE, the K-Nearest Neighbors model performed best, with an accuracy of 95.70% and an F1-score of 95.70%. In the three-level classification without SMOTE, the Random Forest model again excelled, achieving an accuracy of 97.98% and an F1-score of 97.22%. For three-level classification with SMOTE, XGBoost showed the highest performance, with an accuracy and F1-score of 98.98%. These results highlight the effectiveness of different models under various conditions, emphasizing the importance of model selection and preprocessing techniques in enhancing classification performance.
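
    Accuracy and F1-score are reported side by side because they can diverge sharply on imbalanced data. The sketch below computes both from scratch; the abstract does not say which F1 averaging was used, so macro averaging is an assumption, and the function name and toy labels are illustrative only.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


# Imbalanced toy case: 8 "no stress" (0) and 2 "stress" (1) samples,
# with a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred)
print(acc, macro_f1)
```

    Here the accuracy is 0.8 while the macro F1 is below 0.5, since the minority class is never detected. That gap is the failure mode that oversampling methods such as SMOTE are meant to reduce.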

  • Article type: Journal Article
    This study proposes a small one-dimensional convolutional neural network (1D-CNN) framework for individual authentication, based on the hypothesis that a single heartbeat as input is sufficient to create a robust system. A short R-to-R segment of the electrocardiogram (ECG) signal was used to generate single-heartbeat samples by enforcing a rigid length thresholding procedure combined with an interpolation technique. Additionally, we explored the benefits of the synthetic minority oversampling technique (SMOTE) to tackle the imbalance in sample distribution among individuals. The proposed framework was evaluated on four public databases, individually and in combination: MIT-BIH Normal Sinus Rhythm (NSRDB), MIT-BIH Arrhythmia (MIT-ARR), ECG-ID, and MIMIC-III, all of which are available in the Physionet repository. The proposed framework demonstrated excellent performance, achieving a perfect score (100%) across all metrics (i.e., accuracy, precision, sensitivity, and F1-score) on the individual NSRDB and MIT-ARR databases. Performance remained high, exceeding 99.6%, on mixed datasets that contain larger populations and more diverse conditions. The performance demonstrated on both small and large subject groups emphasizes the model's scalability and potential for widespread implementation, particularly in security contexts where timely authentication is crucial. Future research should examine the incorporation of multimodal biometric systems and extend the applicability of the framework to real-time environments and larger populations.
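
    The heartbeat-sample preparation described above can be sketched as follows, assuming R-peak positions are already known. The abstract gives neither the length thresholds nor the interpolation method, so the bounds `min_len` and `max_len`, the choice of linear interpolation, and both function names are assumptions.

```python
def resample_linear(segment, target_len):
    """Linearly interpolate a segment to exactly target_len samples (target_len >= 2)."""
    n = len(segment)
    if n == 1:
        return [segment[0]] * target_len
    out = []
    for j in range(target_len):
        pos = j * (n - 1) / (target_len - 1)  # fractional index into the segment
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def single_beats(ecg, r_peaks, min_len, max_len, target_len):
    """Cut R-to-R segments, drop those outside the length bounds, resample the rest."""
    beats = []
    for start, end in zip(r_peaks, r_peaks[1:]):
        if min_len <= end - start <= max_len:  # rigid length threshold (assumed form)
            beats.append(resample_linear(ecg[start:end], target_len))
    return beats


# Toy signal and hand-picked R-peak indices (assumed values).
ecg = [float(i) for i in range(20)]
beats = single_beats(ecg, r_peaks=[0, 4, 14, 18], min_len=3, max_len=6, target_len=8)
print(len(beats), [len(b) for b in beats])
print(resample_linear([0.0, 2.0, 4.0], 5))
```

    Every retained beat ends up with the same length, which is what a fixed-input 1D-CNN requires. The overly long middle segment (10 samples) is discarded rather than stretched.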

  • Article type: Journal Article
    Hematoma expansion (HE) is a high-risk symptom with a high rate of occurrence in patients who have undergone spontaneous intracerebral hemorrhage (ICH) after a major accident or illness. Correctly predicting the occurrence of HE in advance is critical for helping doctors determine the next step of medical treatment. Most existing studies focus only on the occurrence of HE within 6 h after the onset of ICH, while in reality a considerable number of patients develop HE after the first 6 h but within 24 h. In this study, based on medical doctors' recommendations, we focus on predicting the occurrence of HE within 24 h, as well as its occurrence in each 6 h interval within 24 h. Using demographics and information extracted from computed tomography (CT) images, we applied the XGBoost method to predict the occurrence of HE within 24 h. To address the highly imbalanced data set, a frequent situation in medical data analysis, we used the SMOTE algorithm for data augmentation. To evaluate our method, we used a data set consisting of 582 patient records and compared the results of the proposed method with those of several other machine learning methods. Our experiments show that XGBoost achieved the best prediction performance on the balanced dataset processed by the SMOTE algorithm, with an accuracy of 0.82 and an F1-score of 0.82. Moreover, our proposed method predicts the occurrence of HE within 6, 12, 18, and 24 h with accuracies of 0.89, 0.82, 0.87, and 0.94, respectively, indicating that HE occurrence within 24 h can be predicted accurately by the proposed method.

  • Article type: Journal Article
    Parkinson's Disease (PD) is a prevalent neurological condition characterized by motor and cognitive impairments, typically manifesting around the age of 50 and presenting symptoms such as gait difficulties and speech impairments. Although a cure remains elusive, symptom management through medication is possible. Timely detection is pivotal for effective disease management. In this study, we leverage Machine Learning (ML) and Deep Learning (DL) techniques, specifically K-Nearest Neighbor (KNN) and Feed-forward Neural Network (FNN) models, to differentiate between individuals with PD and healthy individuals based on voice signal characteristics. Our dataset, sourced from the University of California at Irvine (UCI), comprises 195 voice recordings collected from 31 patients. To optimize model performance, we employ various strategies including Synthetic Minority Over-sampling Technique (SMOTE) for addressing class imbalance, Feature Selection to identify the most relevant features, and hyperparameter tuning using RandomizedSearchCV. Our experimentation reveals that the FNN and KSVM models, trained on an 80-20 split of the dataset for training and testing respectively, yield the most promising results. The FNN model achieves an impressive overall accuracy of 99.11%, with 98.78% recall, 99.96% precision, and a 99.23% F1-score. Similarly, the KSVM model demonstrates strong performance with an overall accuracy of 95.89%, recall of 96.88%, precision of 98.71%, and an F1-score of 97.62%. Overall, our study showcases the efficacy of ML and DL techniques in accurately identifying PD from voice signals, underscoring the potential for these approaches to contribute significantly to early diagnosis and intervention strategies for Parkinson's Disease.

  • Article type: Journal Article
    In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have taken the severity level into account as a multiclass variable. Moreover, data are rarely distributed equally among all classes in real communities, so the inevitable class imbalance across multiple classes is a substantial challenge in this domain. This research therefore emphasizes the significance of addressing class imbalance in the multiclass setting. We introduce a new approach, Feature Group Partitioning (FGP), in the data preprocessing phase, which effectively reduces the dimensionality of the features to a minimum. This study utilized synthetic oversampling techniques, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burns Depression Checklist (BDC). For modeling, we implemented heterogeneous ensemble learning (stacking), homogeneous ensemble learning (bagging), and five distinct supervised machine learning algorithms. Overfitting was checked by comparing accuracy on the training, validation, and testing datasets. To assess the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and F1-score were used. Overall, a comprehensive analysis demonstrates the difference between the Conventional Depression Screening (CDS) and FGP approaches. In summary, the results show that the stacking classifier for FGP with SMOTE yields the highest balanced accuracy, at 92.81%. The empirical evidence demonstrates that the FGP approach, when combined with SMOTE, is able to produce better performance in predicting the severity of depression. Most importantly, the reduction in training time achieved by the FGP approach for all of the classifiers is a significant achievement of this research.
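
    Balanced accuracy, the headline metric here, is the unweighted mean of per-class recall, so each severity level counts equally regardless of how many samples it has. A minimal sketch with hypothetical labels (the function name and toy data are assumptions, not taken from the paper):

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(t == label for t in y_true)
        hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Three assumed severity levels with imbalanced counts (4, 2, and 1 samples).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

    In this toy case the plain accuracy is 5/7, while the balanced accuracy is lower because the single sample of the rarest class is misclassified and its recall of 0 carries a full one-third weight.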

  • Article type: Journal Article
    This study evaluates the efficacy of hyperspectral data for detecting yellow and brown rust in wheat, employing machine learning models and the SMOTE (Synthetic Minority Oversampling Technique) augmentation technique to tackle unbalanced datasets. Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), and Gaussian Naïve Bayes (GNB) models were assessed. Overall, SVM and RF models showed higher accuracies, particularly when utilizing SMOTE-enhanced datasets. The RF model achieved 70% accuracy in detecting yellow rust without data alteration. Conversely, for brown rust, the SVM model outperformed others, reaching 63% accuracy with SMOTE applied to the training set. This study highlights the potential of spectral data and machine learning (ML) techniques in plant disease detection. It emphasizes the need for further research in data processing methodologies, particularly in exploring the impact of techniques like SMOTE on model performance.

  • Article type: Journal Article
    Lung cancer remains a leading cause of cancer-related mortality globally, with prognosis significantly dependent on early-stage detection. Traditional diagnostic methods, though effective, often face challenges regarding accuracy, early detection, and scalability, being invasive, time-consuming, and prone to ambiguous interpretations. This study proposes an advanced machine learning model designed to enhance lung cancer stage classification using CT scan images, aiming to overcome these limitations by offering a faster, non-invasive, and reliable diagnostic tool. Utilizing the IQ-OTHNCCD lung cancer dataset, comprising CT scans from various stages of lung cancer and healthy individuals, we performed extensive preprocessing including resizing, normalization, and Gaussian blurring. A Convolutional Neural Network (CNN) was then trained on this preprocessed data, and class imbalance was addressed using Synthetic Minority Over-sampling Technique (SMOTE). The model's performance was evaluated through metrics such as accuracy, precision, recall, F1-score, and ROC curve analysis. The results demonstrated a classification accuracy of 99.64%, with precision, recall, and F1-score values exceeding 98% across all categories. SMOTE significantly enhanced the model's ability to classify underrepresented classes, contributing to the robustness of the diagnostic tool. These findings underscore the potential of machine learning in transforming lung cancer diagnostics, providing high accuracy in stage classification, which could facilitate early detection and tailored treatment strategies, ultimately improving patient outcomes.

  • Article type: Journal Article
    This study used artificial intelligence techniques to identify clinical cancer biomarkers for recurrent gastric cancer survivors. From a hospital-based cancer registry database in Taiwan, datasets on the incidence of recurrence and clinical risk features were compiled for 2476 gastric cancer survivors. We benchmarked Random Forest against MLP, C4.5, AdaBoost, and Bagging algorithms on classification metrics, and leveraged the synthetic minority oversampling technique (SMOTE) for the imbalanced dataset, cost-sensitive learning for risk assessment, and SHapley Additive exPlanations (SHAP) for feature importance analysis. Our proposed Random Forest outperformed the other models with an accuracy of 87.9%, a recall of 90.5%, a precision of 86%, and an F1 of 88.2% on the recurrent category under 10-fold cross-validation on a balanced dataset. We identified the top five clinical features of recurrent gastric cancer: stage, number of involved regional lymph nodes, Helicobacter pylori, BMI (body mass index), and gender. These features significantly affect the prediction model's output and deserve attention in subsequent causal effect analysis. Using an artificial intelligence model, the risk factors for recurrent gastric cancer could be identified and cost-effectively ranked according to their feature importance. In addition, they should serve as crucial clinical features that give physicians the knowledge to screen for high-risk patients among gastric cancer survivors.
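
    The figures reported for the recurrent category can be checked against each other, because F1 is the harmonic mean of precision and recall. The short check below reads the 86% figure as precision (an interpretation, since the source wording is ambiguous) and takes the recall of 90.5% directly from the abstract.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract; reading 86% as precision is an assumption.
precision = 0.86
recall = 0.905
f1 = f1_from_precision_recall(precision, recall)
print(round(f1, 3))
```

    The result rounds to 0.882, matching the reported F1 of 88.2%, which supports reading the 86% value as precision rather than as a second accuracy figure.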