feature selection

  • Article type: Journal Article
    Protozoal pathogens pose a considerable threat, causing notable mortality and the ongoing challenge of drug resistance. This situation underscores the urgent need for alternative therapeutic approaches. Antimicrobial peptides stand out as promising candidates for drug development. However, little published research has focused on predicting antimicrobial peptides that specifically target protozoal pathogens. In this study, we introduce a machine learning-based framework for predicting potential antiprotozoal peptides effective against protozoal pathogens.
    The primary objective of this study is to classify and predict antiprotozoal peptides using diverse negative datasets.
    A comprehensive literature review was conducted to gather experimentally validated antiprotozoal peptides, which form the positive dataset for our study. To construct a robust machine learning classifier, multiple negative datasets were incorporated: (i) non-antimicrobial, (ii) antiviral, (iii) antibacterial, (iv) antifungal, and (v) antimicrobial peptides excluding those that target protozoal pathogens. Various compositional features of the peptides were extracted using the pfeature algorithm. Two feature selection methods, SVC-L1 and mRMR, were employed to identify the features most relevant for distinguishing the positive dataset from the negative datasets. Five popular classifiers, i.e., Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, and XGBoost, were then used to build decision models.
    XGBoost was the most effective classifier for separating antiprotozoal peptides from each negative dataset when trained on the features selected by mRMR. The proposed framework distinguishes antiprotozoal peptides from (i) non-antimicrobial, (ii) antiviral, (iii) antibacterial, (iv) antifungal, and (v) antimicrobial peptides with accuracies of 97.27%, 93.64%, 86.36%, 90.91%, and 89.09%, respectively, on the validation dataset.
    The models are incorporated into a user-friendly web server (www.soodlab.com/appred) that predicts the antiprotozoal activity of given peptides.
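    The mRMR step can be illustrated with a minimal sketch (not the authors' code): features are picked greedily by their mutual information with the class label minus their mean mutual information with the features already chosen (the difference form of mRMR). It assumes the compositional features have already been discretized; the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (bits) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, y, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    features: list of discrete feature columns; y: class labels.
    Returns the indices of the k selected features in selection order.
    """
    relevance = [mutual_information(f, y) for f in features]
    selected, remaining = [], list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(features[j], features[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties resolve to the earliest index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    y = [ai + bi for ai, bi in zip(a, b)]  # toy label that depends on both a and b
    # Column 1 is an exact copy of column 0, so mRMR should skip it.
    print(mrmr([a, list(a), b], y, k=2))
```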

  • Article type: Journal Article
    Protein solubility is a critical parameter that determines the stability, activity, and function of proteins, with broad implications for biotechnology and biochemistry. Accurate prediction and control of protein solubility are essential for successful protein expression and purification in research and industrial settings. This study collected information on soluble and insoluble proteins. To characterize the proteins, they were mapped to STRING and described by functional and structural features. All functional/structural features were combined into a 5768-dimensional binary vector encoding each protein. Seven feature-ranking algorithms were applied to these features, yielding seven ranked feature lists. Each list was then subjected, one by one, to incremental feature selection with four classification algorithms to build effective classification models and to identify the functional/structural features most important for classification. Several essential functional/structural features for differentiating soluble from insoluble proteins were identified, including GO:0009987 (intercellular communication) and GO:0022613 (ribonucleoprotein complex biogenesis). The best classification model, which used a support vector machine with 295 optimized functional/structural features, achieved an F1 score of 0.825 and may serve as a powerful tool for differentiating soluble from insoluble proteins.
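    Incremental feature selection (IFS) over a ranked list can be sketched as below, assuming a caller-supplied `evaluate` function (hypothetical) that trains and cross-validates a classifier on a feature subset and returns its F1 score. This illustrates the general IFS loop only, not the study's pipeline or any of its seven ranking algorithms.

```python
from typing import Callable, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score with 1 as the positive class: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def incremental_feature_selection(
    ranked: Sequence[str], evaluate: Callable[[list], float]
) -> tuple:
    """Evaluate the top-1, top-2, ..., top-n prefixes of a ranked feature list.

    `evaluate` (hypothetical, supplied by the caller) scores one feature subset.
    Returns the best-scoring prefix and its score.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    # Made-up scores indexed by subset size, standing in for cross-validated F1.
    toy_scores = {1: 0.61, 2: 0.83, 3: 0.79}
    print(incremental_feature_selection(["f1", "f2", "f3"], lambda s: toy_scores[len(s)]))
```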

  • Article type: Journal Article
    This study explores a novel approach to detecting arousal levels through the analysis of electroencephalography (EEG) signals. We use the Faller database, which contains 64-channel EEG recordings from 18 healthy participants.
    Our approach extracts ten frequency features from each channel, yielding a 640-dimensional feature vector for each signal instance. To improve classification accuracy, we use a genetic algorithm for feature selection, formulated as a multiobjective optimization task. The approach uses fast bit hopping for efficiency, overcoming the limitations of traditional bit-string representations. A hybrid operator speeds up convergence, and a solution selection strategy identifies the most suitable feature subset.
    Experimental results demonstrate the method's effectiveness in detecting arousal levels across diverse states, with improvements in accuracy, sensitivity, and specificity. In scenario one, the proposed method achieves an average accuracy, sensitivity, and specificity of 93.11%, 98.37%, and 99.14%, respectively; in scenario two, the corresponding averages are 81.35%, 88.65%, and 84.64%.
    The results indicate that the proposed method is highly capable of detecting arousal levels in different scenarios, and they demonstrate the advantage of the proposed feature reduction method.
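    The abstract does not detail the fast bit-hopping scheme or the hybrid operator, so the sketch below shows only generic building blocks of multiobjective GA feature selection: a bit-string chromosome (1 = feature kept), plain bit-flip mutation, and a Pareto filter over two minimized objectives, assumed here to be classification error and subset size. The error values in the demo are made up.

```python
import random


def n_selected(mask):
    """Number of features switched on in a bit-string chromosome."""
    return sum(mask)


def bit_flip_mutation(mask, rate, rng):
    """Flip each bit independently with probability `rate` (plain mutation, not bit hopping)."""
    return tuple(1 - bit if rng.random() < rate else bit for bit in mask)


def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(population):
    """Keep the non-dominated (mask, objectives) pairs.

    Objectives are assumed to be (classification error, number of selected features).
    """
    return [
        (mask, obj)
        for mask, obj in population
        if not any(dominates(other, obj) for _, other in population)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    parent = (1, 0, 1, 1, 0, 0, 1, 0)  # toy 8-feature chromosome (the study uses 640)
    child = bit_flip_mutation(parent, rate=1 / len(parent), rng=rng)
    # Hypothetical error rates; in practice these come from a trained classifier.
    population = [
        (parent, (0.10, n_selected(parent))),
        ((1, 0, 0, 0, 0, 0, 1, 0), (0.12, 2)),
        ((1, 1, 1, 1, 0, 0, 1, 0), (0.15, 5)),
    ]
    print(child)
    print([mask for mask, _ in pareto_front(population)])
```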

  • Article type: Journal Article
    Skin cancer (SC) is a significant medical condition that requires prompt identification to ensure timely treatment. Although visual evaluation by dermatologists is considered the most reliable method, it is subjective and laborious. Deep learning-based computer-aided diagnosis (CAD) platforms have become valuable tools for supporting dermatologists. Nevertheless, current CAD tools frequently rely on convolutional neural networks (CNNs) with large numbers of deep layers and hyperparameters, use a single CNN model, produce large feature spaces, and exploit only spatial image information, all of which restrict their effectiveness. This study presents SCaLiNG, an innovative CAD tool developed specifically to overcome these constraints. SCaLiNG combines three compact CNNs with Gabor wavelets (GW) to obtain a comprehensive feature vector of spatial, textural, and frequency attributes. It captures a wide range of image details by decomposing each image into multiple directional sub-bands using GW and then training several CNNs on those sub-bands and on the original image. SCaLiNG then fuses the features extracted by the CNNs trained on the original images and on the GW sub-bands; this fusion improves diagnostic accuracy because it yields a more thorough feature representation. Finally, SCaLiNG applies a feature selection approach that further improves performance by retaining the most discriminative features. Experimental results indicate that SCaLiNG achieves a classification accuracy of 0.9170 in categorising SC subcategories, surpassing conventional single-CNN models. This performance highlights its ability to help dermatologists recognise and classify SC quickly and precisely, thereby improving patient outcomes.
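    The decomposition of an image into directional Gabor sub-bands can be sketched with the real part of the standard 2-D Gabor kernel, g(x, y) = exp(-(x'² + γ²y'²) / (2σ²)) cos(2πx'/λ + ψ), where (x', y') are the coordinates rotated by the orientation θ, applied at evenly spaced orientations. The kernel size, σ, λ, γ, and number of orientations below are illustrative assumptions, not SCaLiNG's settings, and the CNN and fusion stages are not shown.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size must be odd).

    sigma: Gaussian envelope width; theta: orientation (radians);
    lambd: wavelength of the sinusoid; gamma: aspect ratio; psi: phase offset.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def filter_response(image, kernel):
    """'Valid' 2-D cross-correlation of a grayscale image (list of rows) with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def directional_subbands(image, size=7, sigma=2.0, lambd=4.0, n_orientations=4):
    """One filtered sub-band per orientation, evenly spaced over [0, pi)."""
    return [
        filter_response(image, gabor_kernel(size, sigma, k * math.pi / n_orientations, lambd))
        for k in range(n_orientations)
    ]
```

With ψ = 0 the kernel is point-symmetric, so cross-correlation and convolution give the same sub-bands here.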

  • Article type: Journal Article
    Thermography is a non-invasive, non-contact method for detecting cancer in its initial stages by examining the temperature variation between the two breasts. Preprocessing methods such as resizing, region of interest (ROI) segmentation, and augmentation are frequently used to improve the accuracy of breast thermogram analysis. In this study, we propose a modified U-Net architecture (DTCWAU-Net) that uses the dual-tree complex wavelet transform (DTCWT) and attention gates to segment frontal- and lateral-view breast thermograms, outlining the ROI for potential tumor detection. The proposed approach achieved an average Dice coefficient of 93.03% and a sensitivity of 94.82%, showing its potential for accurate breast thermogram segmentation. Breast thermograms were classified as healthy or cancerous by extracting texture- and histogram-based features and deep features from the segmented thermograms. Feature selection was performed using Neighborhood Component Analysis (NCA), followed by machine learning classifiers. Compared with other state-of-the-art approaches for detecting breast cancer from thermograms, the proposed methodology achieved a higher accuracy of 99.90% using VGG16 deep features with NCA and a Random Forest classifier. Simulation results indicate that the proposed method can be used in breast cancer screening, facilitating early detection and improving treatment outcomes.
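    The two segmentation metrics reported above follow from pixel-wise counts of true positives (TP), false positives (FP), and false negatives (FN): Dice = 2TP / (2TP + FP + FN) and sensitivity = TP / (TP + FN). A minimal sketch on binary masks; the toy masks and the convention of returning 1.0 for empty masks are assumptions of this example.

```python
def segmentation_metrics(pred, truth):
    """Dice coefficient and sensitivity for binary masks given as lists of rows.

    Empty-mask cases return 1.0 by convention (an assumption of this sketch).
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    dice_denom = 2 * tp + fp + fn
    dice = 2 * tp / dice_denom if dice_denom else 1.0
    sensitivity = tp / (tp + fn) if tp + fn else 1.0
    return dice, sensitivity


if __name__ == "__main__":
    predicted = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    ground_truth = [[0, 1, 1], [0, 1, 0], [0, 1, 0]]
    dice, sens = segmentation_metrics(predicted, ground_truth)
    print(f"Dice={dice:.3f}, sensitivity={sens:.3f}")
```

In the demo, three pixels overlap, one predicted pixel is extra, and one true pixel is missed, so both metrics come out to 0.75.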

  • Article type: Journal Article
    OBJECTIVE: Colorectal cancer (CRC) is the third most prevalent cancer globally and poses a significant challenge because of its high rate of metastasis. Approximately 20% of patients with CRC present with distant metastases at diagnosis, and over 50% develop metastases within five years. Accurate prediction of metastasis is therefore crucial for improving survival outcomes in patients with CRC.
    METHODS: This study introduces an innovative cost-sensitive fast correlation-based filter (CS-FCBF) algorithm for feature selection, integrated with machine learning techniques to predict metastatic CRC. The CS-FCBF algorithm reduced the number of genomic features from 184 to 9 critical genes: CXCL9, C2CD4B, RGCC, GFI1, BEX2, CXCL3, FOXQ1, PBK, and PLAG1. The findings were validated by combining in vitro and in vivo experiments with analysis of publicly available single-cell RNA-seq datasets.
    RESULTS: Applying the CS-FCBF algorithm significantly improved prediction model performance, with an average increase of 21.16% in the area under the precision-recall curve. The nine identified genes have potential as diagnostic biomarkers and therapeutic targets for metastatic CRC.
    CONCLUSIONS: This study highlights the critical role of advanced feature selection methods, combined with machine learning, in addressing class imbalance in medical diagnosis, particularly for CRC. Early detection of metastasis is vital, and the results underscore the importance of the identified genes in the metastatic process of CRC. The methodology applied here offers valuable insights and paves the way for future research on other cancers or diseases that face similar diagnostic challenges.
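    The cost-sensitive extension is not described in the abstract, so the sketch below shows only the standard FCBF procedure it builds on: rank features by symmetric uncertainty (SU) with the class, then drop any feature whose SU with an already kept feature is at least its SU with the class. It assumes discretized gene-expression values; the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, y, delta=0.0):
    """Standard (not cost-sensitive) FCBF; returns indices of kept features.

    1. Keep features whose SU with the class exceeds `delta`, sorted by that SU (descending).
    2. Take the top remaining feature p and discard every later feature q with
       SU(p, q) >= SU(q, class), i.e., q is treated as redundant given p.
    """
    su_class = [symmetric_uncertainty(f, y) for f in features]
    candidates = sorted(
        (i for i, s in enumerate(su_class) if s > delta), key=lambda i: -su_class[i]
    )
    selected = []
    while candidates:
        p = candidates.pop(0)
        selected.append(p)
        candidates = [
            q for q in candidates
            if symmetric_uncertainty(features[p], features[q]) < su_class[q]
        ]
    return selected


if __name__ == "__main__":
    label = [0, 0, 1, 1]
    gene_a = [0, 0, 1, 1]      # perfectly informative
    gene_a_copy = list(gene_a)  # redundant duplicate
    gene_noise = [0, 1, 0, 1]  # uninformative
    print(fcbf([gene_a, gene_a_copy, gene_noise], label))
```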

  • Article type: Journal Article
    Nitroaromatic compounds (NACs) are pervasive organic pollutants, creating an urgent need to investigate their hazardous effects. Computational chemistry methods play a crucial role in this investigation, offering a safer and more time-efficient approach, as mandated by various pieces of legislation. In this study, we focused on developing transparent, interpretable, reproducible, and publicly available methodologies for deriving quantitative structure-activity relationship (QSAR) models, and we tested them by modelling the mutagenicity of NACs in the Salmonella typhimurium TA100 strain. Descriptors were selected from Mordred and RDKit molecular descriptors, along with several quantum chemical descriptors. For this purpose, we used the genetic algorithm (GA), the most widely used method in the literature, and three alternative algorithms (Boruta, Featurewiz, and ForwardSelector) combined with forward stepwise selection. Models were built with multiple linear regression, and their fitting and predictive performance, reliability, and robustness were then assessed against various statistical validation criteria. The models were ranked using a Multi-Criteria Decision Making procedure. The findings show that the proposed descriptor selection methodology outperforms GA, with Featurewiz showing a slight advantage over Boruta and ForwardSelector. The resulting models can serve as valuable tools for quick and reliable prediction of NAC mutagenicity.
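    Forward stepwise selection for a multiple linear regression model can be sketched as follows: at each step, add the descriptor that most increases R², and stop when the gain falls below a threshold. The `min_gain` threshold and descriptor names are assumptions, the GA, Boruta, Featurewiz, and ForwardSelector stages are not reproduced, and ordinary least squares is solved from the normal equations only for brevity.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors)")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((yk - sum(bj * xj for bj, xj in zip(beta, row))) ** 2 for row, yk in zip(X, y))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    if ss_tot == 0:
        raise ValueError("response is constant")
    return 1.0 - ss_res / ss_tot


def forward_stepwise(descriptors, y, max_terms, min_gain=0.01):
    """Greedily add the descriptor that most increases R^2 until the gain is < min_gain."""
    selected, best_r2 = [], 0.0
    while len(selected) < max_terms:
        scores = {}
        for name, values in descriptors.items():
            if name in selected:
                continue
            try:
                scores[name] = r_squared([descriptors[s] for s in selected] + [values], y)
            except ValueError:
                continue  # skip descriptors that make the system singular
        if not scores:
            break
        name, r2 = max(scores.items(), key=lambda kv: kv[1])
        if r2 - best_r2 < min_gain:
            break
        selected.append(name)
        best_r2 = r2
    return selected, best_r2
```

For real descriptor matrices, a QR- or SVD-based least-squares solver is numerically safer than the normal equations used here.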

  • Article type: Journal Article
    With the advancement of computer vision and sensor technologies, many multi-camera systems are being developed for the control, planning, and other functions of unmanned systems and robots. The calibration of a multi-camera system determines the accuracy of its operation. However, calibration of multi-camera systems without overlapping fields of view is inaccurate. Furthermore, the potential of matched feature points and their spatial extent for calculating the extrinsic parameters of multi-camera systems has not yet been fully realized. To this end, we propose a multi-camera calibration algorithm for high-precision calibration of multi-camera systems without overlapping fields of view. Calibration is reduced to solving the transformation between the cameras' extrinsic parameters using maps constructed by the individual cameras. First, a map of the calibration environment is built by running a SLAM algorithm separately for each camera while the multi-camera system moves in a closed loop. Second, uniformly distributed matching points are selected from the similar feature points shared between the maps. Third, these matching points are used to solve for the transformation between the cameras' extrinsic parameters. Finally, the reprojection error is minimized to refine this extrinsic transformation. We conducted comprehensive experiments in multiple scenarios and report the extrinsic parameters obtained for multiple cameras. The results demonstrate that the proposed method accurately calibrates the extrinsic parameters of multiple cameras, even when the main and auxiliary cameras are rotated 180° relative to each other.
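    The final refinement step minimizes reprojection error. A minimal sketch of that quantity, assuming an undistorted pinhole camera with intrinsics (fx, fy, cx, cy) and extrinsics X_c = R X_w + t; the optimizer that adjusts R and t (for example, a nonlinear least-squares solver) is not shown, and the numbers in the usage below are illustrative.

```python
import math


def to_camera(point, rotation, translation):
    """Map a world point into camera coordinates: X_c = R X_w + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def project(point, rotation, translation, fx, fy, cx, cy):
    """Pinhole projection (no lens distortion) of a world point to pixel coordinates."""
    x, y, z = to_camera(point, rotation, translation)
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def mean_reprojection_error(points, observations, rotation, translation, intrinsics):
    """Mean Euclidean pixel distance between projected points and their observed matches."""
    fx, fy, cx, cy = intrinsics
    errors = [
        math.dist(project(p, rotation, translation, fx, fy, cx, cy), obs)
        for p, obs in zip(points, observations)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    intrinsics = (500.0, 500.0, 320.0, 240.0)
    err = mean_reprojection_error(
        [(0, 0, 1), (0, 0, 1)], [(320, 240), (323, 244)], identity, [0, 0, 0], intrinsics
    )
    print(err)
```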

  • Article type: Journal Article
    With the rapid advancement of the Internet of Things, network security has received increasing attention from researchers. Deep learning (DL) has significantly improved the performance of Network Intrusion Detection Systems (NIDSs). However, because of their complexity and "black box" nature, deploying DL-based NIDS models in practice poses several challenges, including model interpretability and the need for lightweight models. Feature selection (FS) for DL models plays a crucial role in reducing model parameters and computational overhead while improving NIDS performance. Selecting effective features therefore remains a pivotal concern for NIDSs. In light of this, this paper proposes an interpretable feature selection method for encrypted traffic intrusion detection based on SHAP and causality principles. The approach uses the results of model interpretation to select features, reducing the feature count while preserving model reliability. We evaluated and validated the proposed method on two public network traffic datasets, CICIDS2017 and NSL-KDD, using both a CNN and a random forest (RF). Experimental results show that the proposed method achieves superior performance.
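    Using model-interpretation results for feature selection can be illustrated by ranking features by mean absolute SHAP value and keeping the top k. This sketch assumes a samples × features matrix of SHAP values has already been computed (for example, with the shap package) and omits the causality component described in the paper; the flow-feature names and values are hypothetical.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| over samples (rows = samples, columns = features)."""
    n = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    # sorted() is stable, so equal importances keep their input order.
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


def select_top_k(shap_values, feature_names, k):
    """Keep the k features with the largest mean |SHAP|."""
    return [name for name, _ in rank_by_mean_abs_shap(shap_values, feature_names)[:k]]


if __name__ == "__main__":
    names = ["flow_duration", "pkt_len_mean", "dst_port"]  # hypothetical flow features
    shap_matrix = [[0.5, -0.1, 0.0], [-0.3, 0.2, 0.05]]   # made-up SHAP values
    print(select_top_k(shap_matrix, names, k=2))
```

After selection, the reduced feature set would be used to retrain the detector (CNN or RF) and confirm that detection performance is preserved.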

  • Article type: Journal Article
    Sepsis, a severe inflammatory response syndrome, poses complex challenges for predicting patient outcomes because its pathogenesis is unclear and the discharge status of affected individuals is unstable. In this study, we develop a machine learning-based method for predicting the discharge status of sepsis patients, with the aim of improving treatment decisions. To make the analysis robust to outliers, we incorporate robust statistical methods, specifically the minimum covariance determinant technique. We use random forest imputation to handle missing data. For feature selection, we employ Lasso-penalized logistic regression, which efficiently identifies significant predictors and reduces model complexity, setting the stage for more complex predictive methods. Our predictive analysis incorporates multiple machine learning methods, including random forest, support vector machine, and XGBoost, and we compare their prediction performance with that of Lasso-penalized logistic regression to identify the most effective approach. Each method's performance is rigorously evaluated through ten repetitions of 10-fold cross-validation to ensure robust and reliable results. Our comparative analysis shows that XGBoost outperforms the other models, demonstrating its ability to handle the complexities of sepsis data effectively.
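    Lasso-penalized logistic regression selects predictors by driving some coefficients exactly to zero. A minimal sketch using proximal gradient descent (a gradient step on the mean log-loss, then soft-thresholding by lr·λ) with an unpenalized intercept; the learning rate, iteration count, predictor names, and toy data are assumptions, and a practical analysis would use a tested library and choose λ by cross-validation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clamping at zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||w||_1; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # residual_i = predicted probability - observed label
        residual = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
            for row, yi in zip(X, y)
        ]
        grad_w = [sum(r * row[j] for r, row in zip(residual, X)) / n for j in range(p)]
        grad_b = sum(residual) / n
        # Gradient step on the smooth loss, then soft-threshold the weights.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def selected_features(w, names):
    """Predictors with non-zero coefficients survive Lasso selection."""
    return [name for wj, name in zip(w, names) if wj != 0.0]


if __name__ == "__main__":
    X = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]]  # second column carries no signal
    y = [1, 1, 0, 0]
    w, b = lasso_logistic(X, y, lam=0.01)
    print(selected_features(w, ["lactate", "age"]))  # hypothetical predictor names
```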