stacking model

  • Article type: Journal Article
    Although the power conversion efficiency (PCE) of solar cells (SCs) continues to improve, they remain far from practical application because of their complex synthesis, high cost, and poor operational stability. Carbon quantum dots, which combine high material stability with remarkable photoluminescence, have been used successfully in light-emitting diodes. According to the photon balance in the Shockley-Queisser formulation, in which all excitons are ultimately separated, a good light emitter should also be an efficient SC. However, the finite quantum-sized sp2 domain leads to tight exciton binding, and highly delocalized electron clouds in irregular molecular stacks cause disordered charge transfer, resulting in severe energy loss. Herein, an axially growing carbon quantum ribbon (AG-CQR) with a wide optical absorption range of 440-850 nm is reported. Structural and computational studies reveal that AG-CQRs (aspect ratio ≈2:1) with carbonyl groups at both ends regulate the energy level and efficiently separate excitons. The stacking-controlled two-dimensional AG-CQR film further transfers electrons and holes directionally, particularly in the AB stacking mode. Using this film alone as the active layer, the SCs yield a maximum PCE of 1.22%, an impressive long-term operational stability of 380 h, and repeatability. This study opens the door to new-generation carbon-nanomaterial-based SCs for practical applications.

  • Article type: Journal Article
    The rapid rise of antibiotic resistance and the slow discovery of new antibiotics threaten global health. Although novel phage lysins have emerged as potential antibacterial agents, experimental screening for them is challenging because of the enormous workload. Here, DeepLysin, the first unified software package of its kind, is developed to mine vast genome reservoirs ("dark matter") for novel antibacterial phage lysins using artificial intelligence. Putative lysins are computationally screened from uncharacterized Staphylococcus aureus phages, and 17 novel lysins are randomly selected for experimental validation. Seven candidates exhibit excellent in vitro antibacterial activity, with LLysSA9 exceeding the activity of the best-in-class alternative. The efficacy of LLysSA9 is further demonstrated in mouse bloodstream and wound infection models. This study therefore demonstrates the potential of integrating computational and experimental approaches to expedite the discovery of new antibacterial proteins for combating increasing antimicrobial resistance.

  • Article type: Journal Article
    Crash severity analysis is important for traffic crash prevention and emergency resource allocation. A range of innovations offers potential crash severity prediction models to improve road safety. However, the semantic information inherent in traffic crash data, which is crucial for a deeper understanding of underlying factors and impacts, has yet to be fully utilized. Moreover, traffic crash data are commonly characterized by small sample sizes, which lead to a sample imbalance problem and a decline in prediction performance. To tackle these problems, we propose a semantic understanding-based, data-enhanced double-layer stacking model, named EnLKtreeGBDT, for crash severity prediction. Specifically, to fully leverage the semantic information in traffic crash data and analyze the factors influencing crashes, we design a semantic enhancement module for multi-dimensional feature extraction. This module aims to enhance the understanding of crash semantics and improve prediction accuracy. We then introduce a data enhancement module that uses data denoising and migration techniques to address data imbalance, reducing the prediction model's dependence on large-sample crash data. Furthermore, we construct a two-layer stacking model that combines multiple linear and nonlinear classifiers. This model is designed to strengthen the learning of mixed linear and nonlinear relationships, thereby improving the accuracy of severity prediction for crashes on complex urban roads. Experiments on historical UK road safety crash datasets validate the effectiveness of the proposed model, which achieves superior prediction precision compared with state-of-the-art methods. Ablation experiments on the semantic and data enhancement modules further confirm that each module is indispensable to the proposed model.

  • Article type: Journal Article
    The current point-of-care ultrasound (POCUS) assessment of gastric fluid volume relies primarily on the traditional linear approach, which often achieves only moderate accuracy. This study aimed to develop an advanced machine learning (ML) model to estimate gastric fluid volume more accurately.
    We retrospectively analyzed the clinical data and POCUS data (D1: craniocaudal diameter, D2: anteroposterior diameter) of 1386 patients undergoing elective sedated gastrointestinal endoscopy (GIE) at Nanjing First Hospital to predict gastric fluid volume using ML techniques, including six different ML models and a stacking model. We evaluated the models using the adjusted coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE). The SHapley Additive exPlanations (SHAP) method was used to interpret the importance of the variables. Finally, a web calculator was constructed to facilitate clinical application.
    The stacking model (linear regression + multilayer perceptron) performed best, with the highest adjusted R2 of 0.718 (0.632 to 0.804). The mean prediction bias was 4 ml (MAE: 4.008 (3.68 to 4.336)), which is better than that of the linear model. D1 and D2 ranked high in the SHAP plot and performed better in the right lateral decubitus (RLD) position than in the supine position. The web calculator can be accessed at https://cheason.shinyapps.io/Stacking_regressor/.
    The stacking model and its web calculator can serve as practical tools for accurately estimating gastric fluid volume in patients undergoing elective sedated GIE. It is recommended that anesthesiologists measure D1 and D2 with the patient in the RLD position.
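The three evaluation metrics named in this abstract can be sketched in a few lines. This is only an illustration: the abstract does not give formulas, so the standard textbook definitions are assumed, and the function name `regression_metrics` and the parameter `n_features` (the number of predictors used for the adjustment) are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return adjusted R2, MAE, and RMSE for paired observations.

    Assumes the standard definitions; n_features is the number of
    predictors and must satisfy len(y_true) > n_features + 1.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalizes R2 for the number of predictors
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return {"adjusted_r2": adjusted_r2, "mae": mae, "rmse": rmse}
```

Because RMSE squares the residuals before averaging, it is never smaller than MAE on the same predictions, which is one reason the two are usually reported together.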

  • Article type: Journal Article
    Global concern about the adverse effects of heavy metal pollution in soil has grown significantly. Accurate prediction of heavy metal content in soil is crucial for environmental protection. This study proposes an inversion analysis method for heavy metals (As, Cd, Cr, Cu, Ni, Pb) in soil, based on hyperspectral data and machine learning algorithms, for 21 soil reference materials from multiple provinces in China. On this basis, an ensemble learning model called Stacked RF (the base models are XGBoost, LightGBM, and CatBoost, and the meta-model is RF) was established to perform soil heavy metal inversion. Specifically, three popular algorithms were first employed to preprocess the spectral data; Random Forest (RF) was then used to select the best feature bands to reduce the impact of noise; finally, Stacking and four basic machine learning algorithms were used to build inversion models for comparative analysis. Compared with traditional machine learning methods, the stacking model shows enhanced stability and superior accuracy. The results indicate that machine learning algorithms, especially ensemble learning models, achieve better inversion of heavy metals in soil. Overall, the MF-RF-Stacking model performed best in the inversion of the six heavy metals. The results provide a new perspective on ensemble learning methods for inverting soil heavy metal content from hyperspectral characteristic band data collected from soil reference materials.

  • Article type: Journal Article
    With the potential to cause millions of deaths, PM2.5 pollution has become a global concern. In Southeast Asia, the Mekong River Basin (MRB) is experiencing heavy PM2.5 pollution, and existing PM2.5 studies in the MRB are limited in accuracy and spatiotemporal coverage. To achieve high-accuracy, long-term PM2.5 monitoring of the MRB, fused aerosol optical depth (AOD) data and multi-source auxiliary data are fed into a stacking model to estimate PM2.5 concentrations. The proposed stacking model combines convolutional neural network (CNN) and Light Gradient Boosting Machine (LightGBM) models and represents the spatiotemporal heterogeneity of the PM2.5-AOD relationship well. In cross-validation (CV), comparison with the CNN and LightGBM models shows that the stacking model better suppresses overfitting, with a higher coefficient of determination (R2) of 0.92, a lower root mean square error (RMSE) of 5.58 μg/m3, and a lower mean absolute error (MAE) of 3.44 μg/m3. For the first time, the high-accuracy PM2.5 dataset reveals spatially and temporally continuous PM2.5 pollution and its variations in the MRB from 2015 to 2022. Moreover, the spatiotemporal variations of annual and monthly PM2.5 pollution are investigated at the regional and national scales. The dataset will contribute to the analysis of the causes of PM2.5 pollution and the development of mitigation policies in the MRB.

  • Article type: Journal Article
    China's tiered strategy to enhance county-level waste incineration for energy aligns with the sustainable development goals (SDGs), emphasizing the need for comprehensive assessments of waste-to-energy (WtE) plant suitability. Traditional assessment methodologies face challenges, particularly in suggesting innovative site alternatives, adapting to new data sets, and depending on strict assumptions. This study introduces enhancements in three pivotal dimensions. Methodologically, it leverages data-driven machine learning (ML) approaches to capture the complex relationships essential for site selection, reducing dependency on strict assumptions. In terms of predictive performance, integrating oversampling with stacked ensemble models enhances the diversity and generalizability of the ML models. The area under the curve (AUC) scores of four ML models trained on the oversampled dataset improved significantly compared with those on the original dataset. The stacking model excelled, achieving a score of 92%. It also led in overall Precision and Recall, reaching 85.2% and 85.08%, respectively. Nevertheless, a noticeable discrepancy existed in Precision and Recall for the positive class. The stacking model had the highest Precision at 83.1%, followed by eXtreme Gradient Boosting (XGBoost) at 82.61%. In terms of Recall, XGBoost recorded the lowest value at 85.07%, while the other three classifiers all reached 88.06%. From an industry applicability standpoint, the stacking model provides innovative location alternatives and demonstrates adaptability in Hunan province, offering a reusable tool for WtE siting. In conclusion, this study not only enhances the methodological aspects of WtE site selection but also provides practical and adaptable solutions, contributing positively to sustainable waste management practices.
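As a rough illustration of the two ideas this abstract combines, the sketch below balances classes by oversampling and computes Precision and Recall for the positive class. The abstract does not say which oversampling technique was used; plain random duplication of minority samples is assumed here, and the names `random_oversample` and `precision_recall` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Draw only from the original samples of this class
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def precision_recall(y_true, y_pred, positive=1):
    """Precision and Recall for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In practice, oversampling is usually applied to the training split only, so that duplicated samples never leak into the data used to compute Precision and Recall.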

  • Article type: Journal Article
    Accurately predicting compound-protein binding affinity is a crucial task in drug discovery. Compared with traditional drug development, computational models offer the advantages of short time, low cost, and safety. The pocket is the key binding region of a protein and provides invaluable information for drug repositioning and drug design. In this study, we propose an ensemble learning model, called StackCPA, to predict compound-protein binding affinity. The model integrates multi-scale features of the protein pocket and the compound through a transfer learning strategy. The protein pocket is described in a fine-grained way at the atomic, residue, and subdomain levels. StackCPA is evaluated on three binding affinity benchmark datasets. The experimental results show that StackCPA achieves the best performance on all three datasets in comparison with other state-of-the-art deep learning models. The ablation study shows that the protein pocket provides sufficient information for affinity prediction and that its multi-scale features enable the model to further improve prediction performance. In addition, a case study of epidermal growth factor receptor erbB1 (EGFR) indicates that StackCPA could serve as an effective tool for drug repurposing. The source code and data of StackCPA are available at https://github.com/CSUBioGroup/StackCPA.

  • Article type: Journal Article
    BACKGROUND: Among the problems caused by hypertension, early renal damage is often overlooked. It cannot be diagnosed until the condition is severe and irreversible damage has occurred. We therefore set out to screen and explore risk factors for early renal damage in hypertensive patients and to establish an early-warning model of renal damage based on data mining, in order to diagnose renal damage in hypertensive patients early.
    METHODS: Using an electronic information management system for hypertensive outpatients, we collected 513 cases of original, untreated hypertensive patients. We recorded their demographic data, ambulatory blood pressure parameters, routine blood indices, and blood biochemical indices to establish the clinical database. We then screened risk factors for early renal damage through feature engineering and built early-warning models with Random Forest, Extra-Trees, and XGBoost, respectively. Finally, we built a new model by model fusion based on the Stacking strategy. We used cross-validation to evaluate the stability and reliability of each model and to determine the best risk assessment model.
    RESULTS: In descending order of importance, the features selected by feature engineering are the nocturnal drop rate of systolic blood pressure, red blood cell distribution width, blood pressure circadian rhythm, average daytime diastolic blood pressure, body surface area, smoking, age, and HDL. The average precision of the two-dimensional fusion model based on the Stacking strategy is 0.89685 with the full feature set and 0.93824 with the selected features, a substantial improvement.
    CONCLUSIONS: Through feature engineering and risk factor analysis, we select the nocturnal drop rate of systolic blood pressure, red blood cell distribution width, blood pressure circadian rhythm, and average daytime diastolic blood pressure as early-warning factors for early renal damage in patients with hypertension. On this basis, the two-dimensional fusion model based on the Stacking strategy performs better than any single model and can be used for risk assessment of early renal damage in hypertensive patients.

  • Article type: Journal Article
    A skin lesion is a portion of skin that shows abnormal growth compared with other areas of the skin. The ISIC 2018 lesion dataset has seven classes. A miniature version of the dataset is also available with only two classes: malignant and benign. Malignant tumors are cancerous, and benign tumors are non-cancerous. Malignant tumors can multiply and spread throughout the body at a much faster rate. Early detection of a cancerous skin lesion is crucial for the survival of the patient. Deep learning and machine learning models play an essential role in the detection of skin lesions. Still, image occlusions and imbalanced datasets have so far compromised accuracy. In this paper, we introduce an interpretable method for the non-invasive diagnosis of melanoma skin cancer using deep learning and ensemble stacking of machine learning models. The dataset used to train the classifier models contains balanced images of benign and malignant skin moles. Hand-crafted features are used to train the machine learning base models (logistic regression, SVM, random forest, KNN, and gradient boosting machine). The predictions of these base models are used to train the level-one stacking model using cross-validation on the training set. Deep learning models (MobileNet, Xception, ResNet50, ResNet50V2, and DenseNet121), pre-trained on ImageNet data, are used for transfer learning. The classifier is evaluated for each model. The deep learning models are then ensembled in different combinations and assessed. Furthermore, Shapley additive explanations are used to construct an interpretability approach that generates heatmaps identifying the parts of an image that are most suggestive of the illness. This allows dermatologists to understand the results of our model in a way that makes sense to them. For evaluation, we calculated the accuracy, F1-score, Cohen's kappa, confusion matrix, and ROC curves and identified the best model for classifying skin lesions.
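The step in which base-model predictions train a level-one model "using cross-validation on the training set" is commonly implemented with out-of-fold predictions. The sketch below shows that mechanism only and is not the authors' code: the helper names (`k_fold_indices`, `out_of_fold_predictions`, `make_centroid_fit`) are hypothetical, and the toy nearest-centroid base models stand in for the classifiers listed in the abstract.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, near-equal folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size


def out_of_fold_predictions(fit_fns, X, y, n_folds=3):
    """Build level-one features: one column per base model, where each row
    is predicted by a model that never saw that row during fitting."""
    meta = [[None] * len(fit_fns) for _ in X]
    for train_idx, val_idx in k_fold_indices(len(X), n_folds):
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, fit in enumerate(fit_fns):
            predict = fit(X_tr, y_tr)
            for i in val_idx:
                meta[i][j] = predict(X[i])
    return meta


def make_centroid_fit(feature):
    """Toy stand-in base model: nearest class mean on a single feature.
    Each training fold must contain both classes (labels 0 and 1)."""
    def fit(X, y):
        pos = [x[feature] for x, lab in zip(X, y) if lab == 1]
        neg = [x[feature] for x, lab in zip(X, y) if lab == 0]
        mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)

        def predict(x):
            closer_to_pos = abs(x[feature] - mu_pos) < abs(x[feature] - mu_neg)
            return 1 if closer_to_pos else 0
        return predict
    return fit
```

The returned matrix, paired with the original labels, is what a level-one (meta) model would be trained on; using out-of-fold rather than in-sample predictions keeps the meta-model from simply learning which base model overfits the most.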