interpretability analysis

  • Article type: Journal Article
    Accurate understanding of the biological functions of enzymes is vital for various tasks in both pathology and industrial biotechnology. However, existing methods are usually not fast enough and do not explain their predictions, which severely limits their real-world applications. Following our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) that uses a novel self-guided attention mechanism and incorporates biological knowledge learned via large protein language models to accurately predict the Enzyme Commission (EC) numbers of enzymes and confirm their functions. The self-guided attention is designed to optimize the unique contributions of representations and automatically detects key protein motifs to provide meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, making ifDEEPre 50 times faster than DEEPre while requiring 12.89 times less storage space. Large language modules are incorporated to learn physical properties from hundreds of millions of proteins, extending the biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all current methods, achieving an F1-score more than 14.22% higher on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes from raw sequences alone, without label information. ifDEEPre also predicts the evolutionary relationships between different yeast sub-species, and these predictions are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which has important implications for designing novel enzymes. A web server running ifDEEPre is available at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/ to provide convenient services to the public. ifDEEPre is also freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.

  • Article type: Journal Article
    The widespread use of encrypted traffic poses challenges to network management and network security. Traditional machine learning-based methods for encrypted traffic classification no longer meet the demands of management and security. The application of deep learning technology in encrypted traffic classification significantly improves the accuracy of models. This study focuses primarily on encrypted traffic classification in the fields of network analysis and network security. To address the shortcomings of existing deep learning-based encrypted traffic classification methods in terms of computational memory consumption and interpretability, we introduce a Parameter-Efficient Fine-Tuning method for efficiently tuning the parameters of an encrypted traffic classification model. Experimentation is conducted on various classification scenarios, including Tor traffic service classification and malicious traffic classification, using multiple public datasets. Fair comparisons are made with state-of-the-art deep learning model architectures. The results indicate that the proposed method significantly reduces the scale of fine-tuning parameters and computational resource usage while achieving performance comparable to that of the existing best models. Furthermore, we interpret the learning mechanism of encrypted traffic representation in the pre-training model by analyzing the parameters and structure of the model. This comparison validates the hypothesis that the model exhibits hierarchical structure, clear organization, and distinct features.

  • Article type: Journal Article
    For the diagnosis and outcome prediction of gastric cancer (GC), machine learning methods based on whole slide pathological images (WSIs) have shown promising performance and reduced the cost of manual analysis. Nevertheless, accurate prediction of GC outcome may rely on multiple modalities with complementary information, particularly gene expression data. Thus, there is a need to develop multimodal learning methods to enhance prediction performance. In this paper, we collect a dataset from Ruijin Hospital and propose a multimodal learning method for GC diagnosis and outcome prediction, called GaCaMML, which features a cross-modal attention mechanism and a Per-Slide training scheme. Additionally, we perform feature attribution analysis via integrated gradients (IG) to identify important input features. The proposed method improves prediction accuracy over the single-modal learning method on three tasks, i.e., survival prediction (by 4.9% on C-index), pathological stage classification (by 11.6% on accuracy), and lymph node classification (by 12.0% on accuracy). In particular, the Per-Slide strategy addresses the issue of a high WSI-to-patient ratio and leads to much better results than the Per-Person training scheme. In the interpretability analysis, we find that although WSIs dominate the prediction for most samples, there is still a substantial portion of samples whose prediction relies heavily on gene expression information. This study demonstrates the great potential of multimodal learning in GC-related prediction tasks and investigates the respective contributions of WSIs and gene expression, which not only shows how the model makes a decision but also provides insights into the association between macroscopic pathological phenotypes and microscopic molecular features.
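    To illustrate the kind of feature attribution this abstract refers to, the following minimal sketch approximates integrated gradients for a toy model with a midpoint Riemann sum. The model, its weights, the all-zero baseline, and the step count are assumptions chosen for illustration; none of them come from GaCaMML.

```python
import math

# Hypothetical weights of a toy model; not taken from the paper.
WEIGHTS = [2.0, -1.0, 0.5]


def model(x):
    """Toy differentiable model: sigmoid of a weighted sum of the inputs."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x):
    """Analytic gradient of the toy model with respect to each input."""
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate IG_i = (x_i - b_i) * integral of dF/dx_i along the
    straight path from baseline b to input x, using the midpoint rule."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(model_grad(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
```

    A useful sanity check is the completeness property: the attributions should sum to approximately `model(x) - model(baseline)`.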

  • Article type: Journal Article
    Accurately delineating the connection between small nucleolar RNA (snoRNA) and disease is crucial for advancing disease detection and treatment. While traditional biological experimental methods are effective, they are labor-intensive, costly, and difficult to scale. With the ongoing progress in computer technology, an increasing number of deep learning techniques are being employed to predict snoRNA-disease associations. Nevertheless, the majority of these methods are black-box models that lack interpretability and cannot elucidate the snoRNA-disease association mechanism. In this study, we introduce IGCNSDA, an innovative and interpretable graph convolutional network (GCN) approach tailored for the efficient inference of snoRNA-disease associations. IGCNSDA leverages the GCN framework to extract node feature representations of snoRNAs and diseases from the bipartite snoRNA-disease graph. SnoRNAs with high similarity are more likely to be linked to analogous diseases, and vice versa. To exploit this property, we introduce a subgraph generation algorithm that groups similar snoRNAs and their associated diseases into cohesive subgraphs. Subsequently, we aggregate information from neighboring nodes within these subgraphs, iteratively updating the embeddings of snoRNAs and diseases. The experimental results demonstrate that IGCNSDA outperforms the most recent, highly relevant methods. Additionally, our interpretability analysis provides compelling evidence that IGCNSDA captures the underlying similarity between snoRNAs and diseases, thus giving researchers better insight into the snoRNA-disease association mechanism. Furthermore, we present illustrative case studies that demonstrate the utility of IGCNSDA as a tool for efficiently predicting potential snoRNA-disease associations. The dataset and source code for IGCNSDA are openly accessible at: https://github.com/altriavin/IGCNSDA.
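    The neighbor aggregation idea mentioned in this abstract can be pictured with a minimal sketch of one round of mean aggregation on a tiny bipartite graph. The node names, embeddings, edges, and the plain unweighted mean are all illustrative assumptions; the abstract does not specify IGCNSDA's subgraph generation or normalization, and this sketch does not reproduce them.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def aggregate_once(edges, sno_emb, dis_emb):
    """One aggregation round on a bipartite graph: every node's new
    embedding is the mean of its neighbors' current embeddings.
    Isolated nodes keep their current embedding."""
    sno_nbrs = {s: [] for s in sno_emb}
    dis_nbrs = {d: [] for d in dis_emb}
    for s, d in edges:
        sno_nbrs[s].append(d)
        dis_nbrs[d].append(s)
    new_sno = {
        s: mean_vec([dis_emb[d] for d in nbrs]) if nbrs else list(sno_emb[s])
        for s, nbrs in sno_nbrs.items()
    }
    new_dis = {
        d: mean_vec([sno_emb[s] for s in nbrs]) if nbrs else list(dis_emb[d])
        for d, nbrs in dis_nbrs.items()
    }
    return new_sno, new_dis


# Hypothetical toy data: two snoRNAs, two diseases, three known associations.
edges = [("s1", "d1"), ("s1", "d2"), ("s2", "d2")]
sno_emb = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
dis_emb = {"d1": [1.0, 1.0], "d2": [3.0, -1.0]}
new_sno, new_dis = aggregate_once(edges, sno_emb, dis_emb)
```

    Repeating `aggregate_once` on its own output corresponds to the iterative embedding updates described above, with each round drawing information from one hop further away.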

  • Article type: Journal Article
    Continuous effluent quality prediction in wastewater treatment processes is crucial to proactively reduce the risks to the environment and human health. However, wastewater treatment is an extremely complex process controlled by several uncertain, interdependent, and sometimes poorly characterized physico-chemical-biological process parameters. In addition, there are substantial spatiotemporal variations, uncertainties, and highly non-linear interactions among the water quality parameters and process variables involved in the treatment process. Such complexities hinder efficient monitoring, operation, and management of wastewater treatment plants under normal and abnormal conditions. Typical mathematical and statistical tools often fail to capture such complex interrelationships, and therefore data-driven techniques offer an attractive way to quantify the performance of wastewater treatment plants. Although several previous studies applied regression-based data-driven models (e.g., artificial neural networks) to predict some wastewater treatment effluent parameters, most of these studies employed a limited number of input variables to predict only one or two parameters characterizing the effluent quality (e.g., chemical oxygen demand (COD) and/or suspended solids (SS)). Harnessing the power of Artificial Intelligence (AI), the current study proposes multi-gene genetic programming (MGGP)-based models, built on a dataset obtained from an operational wastewater treatment plant that deploys a membrane aerated biofilm reactor, to predict the filtered COD, ammonia (NH4), and SS concentrations along with the carbon-to-nitrogen ratio (C/N) within the effluent. Input features included a set of process variables characterizing the influent quality (e.g., filtered COD, NH4, and SS concentrations), water physics and chemistry parameters (e.g., temperature and pH), and operation conditions (e.g., applied air pressure). The developed MGGP-based models accurately reproduced the observations of the four output variables, with correlation coefficient values that ranged between 0.98 and 0.99 during training and between 0.96 and 0.99 during testing, reflecting the power of the developed models in predicting the quality of the effluent from the treatment system. Interpretability analyses were subsequently deployed to confirm the intuitive understanding of input-output interrelations and to identify the governing parameters of the treatment process. The developed MGGP-based models can facilitate the AI-driven monitoring and management of wastewater treatment plants by devising optimal rapid operation and control schemes and by assisting the plants' operators in maintaining proper performance of the plants under various normal and disruptive operational conditions.

  • Article type: Journal Article
    The outbreak of COVID-19 has shocked the entire world with its rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and computed tomography (CT), combined with the potential of artificial intelligence (AI), plays an essential role in supporting medical personnel in the diagnosis process. Thus, in this article, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their ensemble, using majority voting, have been used to classify COVID-19, pneumonia, and healthy subjects using chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. The interpretability of each of the networks was thoroughly studied using local interpretability methods (occlusion, saliency, Input × Gradient, guided backpropagation, integrated gradients, and DeepLIFT) and a global technique (neuron activation profiles). The mean micro F1 score of the models for COVID-19 classification ranged from 0.66 to 0.875, and was 0.89 for the ensemble of the network models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before deciding which model performs best.
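    The two evaluation ingredients named in this abstract, majority voting and the micro-averaged F1 score for multilabel output, can be sketched as follows. The three hypothetical models, the label order, and all predictions are made-up toy data, and the strict-majority tie rule is an assumption; the study's actual voting details are not given in the abstract.

```python
def majority_vote(predictions):
    """Combine multilabel predictions from several models.

    predictions[m][s][l] is the 0/1 prediction of model m for label l of
    sample s. A label is switched on only when a strict majority of the
    models predicts it, so ties (possible with an even number of models)
    resolve to 0.
    """
    n_models = len(predictions)
    voted = []
    for per_sample in zip(*predictions):
        voted.append(
            [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*per_sample)]
        )
    return voted


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all samples and labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data; assumed label order: [COVID-19, pneumonia, healthy].
y_true = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
model_a = [[1, 1, 0], [0, 1, 0], [0, 1, 1]]
model_b = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
model_c = [[0, 1, 0], [1, 1, 0], [0, 0, 1]]

ensemble = majority_vote([model_a, model_b, model_c])
scores = {
    "model_a": micro_f1(y_true, model_a),
    "model_b": micro_f1(y_true, model_b),
    "model_c": micro_f1(y_true, model_c),
    "ensemble": micro_f1(y_true, ensemble),
}
```

    In this toy setup each model makes a different mistake, so the vote cancels the individual errors and the ensemble scores higher than any single model.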

  • Article type: Journal Article
    OBJECTIVE: The purpose of this study was to assess the predictive performance of multiparametric magnetic resonance imaging (MRI) for molecular subtypes and interpret features using SHapley Additive exPlanations (SHAP) analysis.
    METHODS: Patients with breast cancer who underwent pre-treatment MRI (including ultrafast dynamic contrast-enhanced MRI, magnetic resonance spectroscopy, diffusion kurtosis imaging and intravoxel incoherent motion) were recruited between February 2019 and January 2022. Thirteen semantic and thirteen multiparametric features were collected and the key features were selected to develop machine-learning models for predicting molecular subtypes of breast cancers (luminal A, luminal B, triple-negative and HER2-enriched) by using stepwise logistic regression. Semantic model and multiparametric model were built and compared based on five machine-learning classifiers. Model decision-making was interpreted using SHAP analysis.
    RESULTS: A total of 188 women (mean age, 53 ± 11 [standard deviation] years; age range: 25-75 years) were enrolled and further divided into training cohort (131 women) and validation cohort (57 women). XGBoost demonstrated good predictive performance among five machine-learning classifiers. Within the validation cohort, the areas under the receiver operating characteristic curves (AUCs) for the semantic models ranged from 0.693 (95% confidence interval [CI]: 0.478-0.839) for HER2-enriched subtype to 0.764 (95% CI: 0.681-0.908) for luminal A subtype, inferior to multiparametric models that yielded AUCs ranging from 0.771 (95% CI: 0.630-0.888) for HER2-enriched subtype to 0.857 (95% CI: 0.717-0.957) for triple-negative subtype. The AUCs between the semantic and the multiparametric models did not show significant differences (P range: 0.217-0.640). SHAP analysis revealed that lower iAUC, higher kurtosis, lower D*, and lower kurtosis were distinctive features for luminal A, luminal B, triple-negative breast cancer, and HER2-enriched subtypes, respectively.
    CONCLUSIONS: Multiparametric MRI is superior to semantic models for predicting the molecular subtypes of breast cancer.
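    SHAP analysis, as used in this study, builds on Shapley values. The following minimal sketch computes exact Shapley values for a toy three-feature model by enumerating every feature coalition, with absent features set to a baseline value. The model, inputs, and baseline are hypothetical and unrelated to the study's MRI features; the SHAP library itself relies on faster model-specific or sampling-based algorithms instead of full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    A coalition's value is the model output when features in the coalition
    take their values from x and all other features take baseline values.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model with two linear terms and one interaction term."""
    return 0.5 * z[0] - 2.0 * z[1] + z[0] * z[2]


x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

    The values sum to `toy_model(x) - toy_model(baseline)`, and the interaction term `z[0] * z[2]` is split equally between the two features involved.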

  • Article type: Journal Article
    With the rapid development of deep learning techniques, their applications have become increasingly widespread in various domains. However, traditional deep learning methods are often referred to as "black box" models whose results have low interpretability, which poses challenges for their application in certain critical domains. In this study, we propose a comprehensive method for the interpretability analysis of sentiment models. The proposed method encompasses two main aspects: attention-based analysis and external knowledge integration. First, we train the model on sentiment classification and generation tasks to capture attention scores from multiple perspectives. This multi-angle approach reduces bias and provides a more comprehensive understanding of the underlying sentiment. Second, we incorporate an external knowledge base to improve evidence extraction. By leveraging character scores, we retrieve complete sentiment evidence phrases, addressing the challenge of incomplete evidence extraction in Chinese texts. Experimental results on a sentiment interpretability evaluation dataset demonstrate the effectiveness of our method. We observe notable increases of 1.3% in accuracy, 13% in Macro-F1, and 23% in MAP. Overall, our approach offers a robust solution for enhancing the interpretability of sentiment models by combining attention-based analysis with the integration of external knowledge.

  • Article type: Journal Article
    Precisely predicting the concentration of nitrogen-based pollutants from wastewater treatment plants (WWTPs) remains a challenging yet crucial task for optimizing operational adjustments in WWTPs. In this study, an integrated approach using factor analysis (FA) and machine learning (ML) models was employed to accurately predict the effluent total nitrogen (Ntoteff) and nitrate nitrogen (NO3-Neff) concentrations of a WWTP. The input values for the ML models were refined through FA to optimize factors, thereby significantly enhancing the ML prediction accuracy. The prediction model achieved maximum coefficients of determination (R2) of 97.43% (Ntoteff) and 99.38% (NO3-Neff) and demonstrated satisfactory generalization ability for predictions up to three days ahead (R2 > 80%). Moreover, the interpretability analysis identified the denitrification factor, the pollutant load factor, and the meteorological factor as significant. The model framework proposed in this study provides a valuable reference for optimizing the operation and management of wastewater treatment.
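    For readers unfamiliar with the metric reported here, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up observed and predicted values, not data from the study; the abstract reports R2 as a percentage, so 97.43% corresponds to 0.9743 on the scale used in the code.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observed values are not all identical, since SS_tot
    would otherwise be zero.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical effluent concentrations in arbitrary units.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
r2 = r_squared(observed, predicted)
r2_percent = 100.0 * r2
```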

  • Article type: Journal Article
    BACKGROUND: This study aimed to develop novel machine learning models for predicting Alzheimer's disease (AD) and to identify key factors for targeted prevention.
    METHODS: From the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, we included 1,219, 863, and 482 participants aged 60 years or older who had, respectively, sociodemographic information only; sociodemographic and self-reported health information; and sociodemographic, self-reported health, and blood biomarker information. Machine learning models were constructed to predict the risk of AD in these three populations. Model performance was evaluated by discrimination, calibration, and clinical usefulness. SHapley Additive exPlanations (SHAP) was applied to identify the key predictors of the optimal models.
    RESULTS: The mean age was 73.49, 74.52, and 74.29 years for the three populations, respectively. Models with sociodemographic information and models with both sociodemographic and self-reported health information showed modest performance. For models with sociodemographic, self-reported health, and blood biomarker information, overall performance improved substantially; logistic regression performed best, with an AUC value of 0.818. The blood biomarkers ptau protein and plasma neurofilament light, age, blood tau protein, and education level were the top five significant predictors. In addition, taurine, inosine, xanthine, marital status, and L-glutamine were also important for AD prediction.
    CONCLUSIONS: Interpretable machine learning showed promise in screening individuals at high risk of AD and could further identify key predictors for targeted prevention.