federated learning

  • Article type: Journal Article
    Emerging Industry 5.0 designs promote artificial intelligence services and data-driven applications across multiple sites with different owners, which call for special data protection and privacy measures to keep private information from being disclosed to outsiders. For this reason, federated learning offers a way to improve machine-learning models without accessing the training data at a single manufacturing facility. In this research, we provide a self-adaptive framework for federated machine learning in intelligent healthcare systems. Our method takes into account the participating parties at various levels of abstraction of the healthcare ecosystem. Each hospital trains its local model internally in a self-adaptive manner and transmits it to the centralized server for universal model optimization and a reduced number of communication cycles. To represent a multi-task optimization problem, we split the dataset into as many subsets as there are devices. Each device selects the most advantageous subset for every local iteration of the model. On a training dataset, our initial study demonstrates the algorithm's ability to converge for various numbers of hospitals and devices. By merging a federated machine-learning approach with advanced deep machine-learning models, we can simply and accurately predict multidisciplinary cancer diseases in the human body. Furthermore, in the smart healthcare industry 5.0, the results of federated machine learning approaches are used to validate multidisciplinary cancer disease prediction. The proposed adaptive federated machine learning methodology achieved 90.0%, while the conventional federated learning approach achieved 87.30%, both of which were higher than the previous state-of-the-art methodologies for cancer disease prediction in the smart healthcare industry 5.0.

  • Article type: Journal Article
    Federated Learning (FL) is a decentralized machine learning method in which individual devices compute local models based on their own data. In FL, devices periodically share newly trained updates with a central server rather than submitting their raw data. The key characteristics of FL, including on-device training and aggregation, make it attractive for many communication domains. Moreover, new systems that facilitate FL in sixth-generation (6G) enabled Passive Optical Networks (PONs) present a promising opportunity for integration within this domain. This article focuses on the interaction between FL and PON, exploring approaches for effective bandwidth management, particularly in addressing the complexity introduced by FL traffic. The PON standard proposes advanced bandwidth management in which the Dynamic Bandwidth Allocation (DBA) algorithm allocates multiple upstream grants to an Optical Network Unit (ONU). However, little research has studied the use of multiple-grant allocation. In this paper, we address this limitation by introducing a novel DBA approach that efficiently allocates PON bandwidth for FL traffic and demonstrates how multiple grants can benefit from the enhanced capacity of PON when carrying FL flows. Simulations conducted in this study show that the proposed solution outperforms state-of-the-art solutions in several network performance metrics, particularly in reducing upstream delay. This improvement holds great promise for enabling the real-time, data-intensive services that will be key components of 6G environments. Furthermore, our discussion outlines the potential for the integration of FL and PON as an operational reality capable of supporting 6G networking.
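
    The update-sharing step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the common sample-weighted averaging rule (FedAvg), which the abstract itself does not name. The function name `fedavg_aggregate` and the toy parameter vectors are hypothetical and are not taken from the paper.

```python
def fedavg_aggregate(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    Assumption: every client reports a flat list of floats of equal length.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    # Each coordinate of the global model is a weighted mean over clients.
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical round: two devices send locally trained parameters,
# never their raw data, to the server.
updates = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]  # number of local training samples per device
global_params = fedavg_aggregate(updates, sizes)
```

    Only the parameter vectors and sample counts travel upstream in this sketch, which is the traffic pattern a bandwidth allocator would have to schedule.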

  • Article type: Journal Article
    Federated learning is an effective approach for preserving data privacy and security, enabling machine learning to take place in a distributed environment and promoting its development. However, an urgent open problem is how to encourage active client participation in federated learning. The Shapley value, a classical concept in cooperative game theory, has been used for data valuation in machine learning services. Nevertheless, existing numerical evaluation schemes based on the Shapley value are impractical, as they require additional model training, which increases communication overhead. Moreover, participants' data may exhibit non-IID characteristics, posing a significant challenge to evaluating participant contributions. Non-IID data greatly affect the accuracy of the global model, weaken the marginal effect of the participants, and cause the participants' contributions to be underestimated. Current work often overlooks the impact of heterogeneity on model aggregation. This paper presents a fair contribution measurement scheme for federated learning that avoids the need for additional model computations. By introducing a novel aggregation weight, it enhances the accuracy of the contribution measurement. Experiments on the MNIST and Fashion-MNIST datasets show that the proposed method can accurately compute the contributions of participants. Compared to existing baseline algorithms, the model accuracy is significantly improved, with a similar time cost.
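
    For reference, the classical Shapley value mentioned in this abstract can be computed exactly for a handful of participants. The sketch below is the textbook permutation formula, not the paper's scheme. The utility table (accuracy reached by each coalition) is invented for illustration, and in practice every utility lookup would mean training and evaluating a model for that coalition, which is the overhead the abstract criticizes.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values: average marginal gain over all join orders."""
    values = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            values[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in values.items()}


# Hypothetical accuracies per coalition; client "C" adds nothing.
ACCURACY = {
    frozenset(): 0.10,
    frozenset("A"): 0.60,
    frozenset("B"): 0.50,
    frozenset("C"): 0.10,
    frozenset("AB"): 0.80,
    frozenset("AC"): 0.60,
    frozenset("BC"): 0.50,
    frozenset("ABC"): 0.80,
}

contributions = shapley_values(["A", "B", "C"], ACCURACY.__getitem__)
```

    The loop visits n! orderings, so the exact form is only practical for very small client counts.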

  • Article type: Journal Article
    A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is therefore necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning with a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FedAvg algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an approximate alternative between these two baseline algorithms for federated learning: the former is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant. As an implication, our results show that, from a worst-case point of view, a dichotomous strategy that chooses between the two baseline algorithms is rate-optimal. Another implication is that the popular strategy of FedAvg followed by local fine-tuning is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.
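
    The trade-off between the two baselines can be seen in a toy mean-estimation problem. The sketch below is an illustration under strong assumptions, not the paper's analysis: the loss is squared error, the pooled sample mean stands in for what FedAvg converges to, and all samples and "true" client means are made up.

```python
from statistics import fmean


def local_estimates(client_samples):
    """Pure local training: each client uses only its own sample mean."""
    return [fmean(samples) for samples in client_samples]


def pooled_estimate(client_samples):
    """Stand-in for FedAvg on squared loss: one shared, sample-weighted mean."""
    total = sum(len(samples) for samples in client_samples)
    return sum(len(s) * fmean(s) for s in client_samples) / total


def worst_client_error(estimates, true_means):
    """Largest squared error of any client's estimate of its own mean."""
    return max((e - m) ** 2 for e, m in zip(estimates, true_means))


# Hypothetical data. Low heterogeneity: true means 1.0 and 1.1.
low = [[0.4, 1.2], [1.7, 0.9]]
low_true = [1.0, 1.1]
# High heterogeneity: true means 1.0 and 5.0.
high = [[0.4, 1.2], [5.6, 4.8]]
high_true = [1.0, 5.0]

low_local = worst_client_error(local_estimates(low), low_true)
low_shared = worst_client_error([pooled_estimate(low)] * 2, low_true)
high_local = worst_client_error(local_estimates(high), high_true)
high_shared = worst_client_error([pooled_estimate(high)] * 2, high_true)
```

    In this toy setup the shared estimate has the smaller worst-case error when the client means are close, and the local estimates win when they are far apart.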

  • Article type: Journal Article
    Graph neural networks (GNNs) are widely used in recommendation systems, but traditional centralized methods raise privacy concerns. To address this, we introduce a federated framework for privacy-preserving GNN-based recommendation. This framework allows distributed training of GNN models using local user data. Each client trains a GNN on its own user-item graph and uploads gradients to a central server for aggregation. To overcome limited data, we propose expanding local graphs using Software Guard Extensions (SGX) and Local Differential Privacy (LDP). SGX computes node intersections for subgraph exchange and expansion, while LDP ensures privacy. Additionally, we introduce a personalized approach with Prototype Networks (PN) and Model-Agnostic Meta-Learning (MAML) to handle data heterogeneity. This enhances the encoding abilities of the federated meta-learner, enabling precise fine-tuning and quick adaptation to diverse client graph data. We leverage SGX and LDP for secure parameter sharing and defense against malicious servers. Comprehensive experiments across six datasets demonstrate our method's superiority over centralized GNN-based recommendation, while preserving user privacy.
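
    To make the notion of local differential privacy concrete, the sketch below shows randomized response on a single bit, for example whether a user interacted with an item. This is a classic LDP mechanism chosen for illustration only. The abstract does not say which mechanism the framework uses, and all names here are hypothetical.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability p, otherwise flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def debiased_frequency(observed_frequency, epsilon):
    """Unbiased estimate of the true share of 1s from noisy reports.

    Requires epsilon > 0, otherwise the reports carry no signal.
    """
    p = keep_probability(epsilon)
    return (observed_frequency - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical usage: each client perturbs its own bit before upload.
rng = random.Random(0)
epsilon = math.log(3)  # keep probability 0.75
true_bits = [1, 0, 1, 1, 0, 1, 0, 0]
reports = [randomized_response(b, epsilon, rng) for b in true_bits]
estimate = debiased_frequency(sum(reports) / len(reports), epsilon)
```

    A smaller epsilon flips more bits, which strengthens privacy and makes the aggregate estimate noisier.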

  • Article type: Journal Article
    The incidence of mental health conditions, such as suicidal ideation and depression, is increasing, which highlights the urgent need for early detection methods. There is growing interest in using natural language processing (NLP) models to analyze textual data from patients, but accessing patients' data for research purposes can be challenging due to privacy concerns. Federated learning (FL) is a promising approach that can balance the need for centralized learning with data ownership sensitivity. In this study, we examine the effectiveness of FL models in detecting depression using a simulated multilingual dataset. We analyzed social media posts in five different languages with varying sample sizes. Our findings indicate that FL achieves strong performance in most cases while maintaining clients' privacy for both independent and non-independent client partitioning.

  • Article type: Journal Article
    Artificial intelligence (AI) shows potential to improve health care by leveraging data to build models that can inform clinical workflows. However, access to large quantities of diverse data is needed to develop robust, generalizable models. Data sharing across institutions is not always feasible due to legal, security, and privacy concerns. Federated learning (FL) allows for multi-institutional training of AI models, obviating data sharing, albeit with different security and privacy concerns. Specifically, insights exchanged during FL can leak information about institutional data. In addition, FL can introduce issues when there is limited trust among the entities performing the computation. With the growing adoption of FL in health care, it is imperative to elucidate the potential risks. We thus summarize the privacy-preserving FL literature in this work, with special regard to health care. We draw attention to threats and review mitigation approaches. We anticipate that this review will become a health-care researcher's guide to security and privacy in FL.

  • Article type: Journal Article
    The high prevalence, morbidity, mortality, and disease heterogeneity of chronic obstructive pulmonary disease (COPD) result in scattered data from patient visits across different medical units. The huge cost of integrating the scattered data for analysis and modeling, together with the legal demand for patient privacy protection, leads to the emergence of data islands.
    On the premise of protecting patient privacy, integrating the scattered data of patients from different medical units for high-quality modeling helps promote the development of digital health. Based on this, we develop a distributed COPD diagnosis system using FedAvg, termed COPD average federated learning (COPD_AVG_FL).
    First, to build COPD_AVG_FL, real-world clinical data of COPD patients are collected and pre-processed to clean incorrect data, outlier samples, and missing values. Then, a classical federated learning architecture is designed as COPD_AVG_FL. Finally, to evaluate the established COPD_AVG_FL system, we develop Centralized Machine Learning (CML).
    Our results suggest that, with the assistance of COPD_AVG_FL, the absolute improvements on the test data are 13.4% (accuracy), 13.3% (precision), 12.8% (recall), 13.1% (F1-score), and 12.9% (AUC). The decoupling of model training from the raw training data protects patients' privacy and helps to securely integrate more COPD data from different medical units to produce a more comprehensive COPD_AVG_FL model. This approach promotes the adoption of wise information technology of medicine for COPD in real clinical settings. Code for our model will be made available at https://github.com/Cczhh/COPD_AVG_FL/tree/master.

  • Article type: Journal Article
    Deep learning-based methods have achieved encouraging performance in Magnetic Resonance (MR) image reconstruction. Nevertheless, building powerful and robust deep learning models requires collecting large and diverse datasets from multiple centers. This raises concerns about ethics and data privacy. Recently, federated learning has emerged as a promising solution, enabling the use of multi-center data without transferring data between institutions. Despite its potential, existing federated learning methods face challenges due to the high heterogeneity of data from different centers. Aggregation methods based on simple averaging, which are commonly used to combine the clients' information, have shown limited reconstruction and generalization capabilities. In this paper, we propose a Model-based Federated learning framework (ModFed) to address these challenges. ModFed makes three major contributions: (1) unlike existing data-driven federated learning methods, ModFed designs attention-assisted model-based neural networks that can alleviate the need for large amounts of data on each client; (2) to address the data heterogeneity issue, ModFed proposes an adaptive dynamic aggregation scheme, which can improve the generalization capability and robustness of the trained neural network models; (3) ModFed incorporates a spatial Laplacian attention mechanism and a personalized client-side loss regularization to capture detailed information for accurate image reconstruction. The effectiveness of the proposed ModFed is evaluated on three in-vivo datasets. Experimental results show that, compared to six existing state-of-the-art federated learning approaches, ModFed achieves better MR image reconstruction performance with increased generalization capability. Code will be made available at https://github.com/ternencewu123/ModFed.

  • Article type: Journal Article
    The Internet of Medical Things (IoMT) has significantly advanced healthcare, but it has also brought about critical security challenges. Traditional security solutions struggle to keep pace with the dynamic and interconnected nature of IoMT systems. Machine learning (ML)-based Intrusion Detection Systems (IDS) have been increasingly adopted to counter cyberattacks, but centralized ML approaches pose privacy risks due to single points of failure (SPoFs). Federated Learning (FL) emerges as a promising solution, enabling model updates directly on end devices without sharing private data with a central server. This study introduces BFLIDS, a Blockchain-empowered Federated Learning-based IDS designed to enhance security and intrusion detection in IoMT networks. Our approach leverages blockchain to secure transaction records, FL to maintain data privacy by training models locally, IPFS for decentralized storage, and MongoDB for efficient data management. Ethereum smart contracts (SCs) oversee and secure all interactions and transactions within the system. We modified the FedAvg algorithm with Kullback-Leibler divergence estimation and adaptive weight calculation to boost model accuracy and robustness against adversarial attacks. For classification, we implemented an Adaptive Max Pooling-based Convolutional Neural Network (CNN) and a modified Bidirectional Long Short-Term Memory (BiLSTM) network with attention and residual connections on the Edge-IIoTSet and TON-IoT datasets. We achieved accuracies of 97.43% (CNN on Edge-IIoTSet), 96.02% (BiLSTM on Edge-IIoTSet), 98.21% (CNN on TON-IoT), and 97.42% (BiLSTM on TON-IoT) in FL scenarios, which are competitive with centralized methods. The proposed BFLIDS effectively detects intrusions, enhancing the security and privacy of IoMT networks.
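
    One way a Kullback-Leibler term could feed into aggregation weights is sketched below. The abstract gives no formula, so everything here is an assumption made for illustration: each client is scored by exp(-KL) between its label distribution and a reference distribution, and the scores are normalized into weights. The function names and the distributions are hypothetical and do not describe the BFLIDS implementation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def adaptive_weights(client_dists, reference_dist):
    """Assumed rule: weight is proportional to exp(-KL(client || reference))."""
    scores = [math.exp(-kl_divergence(p, reference_dist)) for p in client_dists]
    total = sum(scores)
    return [s / total for s in scores]


def weighted_aggregate(client_params, weights):
    """Combine client parameter vectors using the given weights."""
    dim = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(dim)
    ]


# Hypothetical label distributions (benign, attack) for two clients.
reference = [0.5, 0.5]
dists = [[0.5, 0.5], [0.9, 0.1]]
weights = adaptive_weights(dists, reference)
merged = weighted_aggregate([[1.0, 0.0], [0.0, 1.0]], weights)
```

    Under this assumed rule, a client whose data distribution drifts far from the reference, whether through heterogeneity or tampering, has less influence on the merged model.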