Data aggregation

  • Article type: Journal Article
    Understanding the health outcomes of military exposures is of critical importance for Veterans, their health care team, and national leaders. Approximately 43% of Veterans report military exposure concerns to their VA providers. Understanding the causal influences of environmental exposures on health is a complex exposure science task and often requires interpreting multiple data sources; particularly when exposure pathways and multi-exposure interactions are ill-defined, as is the case for complex and emerging military service exposures. Thus, there is a need to standardize clinically meaningful exposure metrics from different data sources to guide clinicians and researchers with a consistent model for investigating and communicating exposure risk profiles. The Linked Exposures Across Databases (LEAD) framework provides a unifying model for characterizing exposures from different exposure databases with a focus on providing clinically relevant exposure metrics. Application of LEAD is demonstrated through comparison of different military exposure data sources: Veteran Military Occupational and Environmental Exposure Assessment Tool (VMOAT), Individual Longitudinal Exposure Record (ILER) database, and a military incident report database, the Explosive Ordnance Disposal Information Management System (EODIMS). This cohesive method for evaluating military exposures leverages established information with new sources of data and has the potential to influence how military exposure data is integrated into exposure health care and investigational models.

  • Article type: Journal Article
    Due to the uniqueness of the underwater environment, traditional data aggregation schemes face many challenges. Most existing data aggregation solutions do not fully consider node trustworthiness, which may result in the inclusion of falsified data sent by malicious nodes during the aggregation process, thereby affecting the accuracy of the aggregated results. Additionally, because of the dynamically changing nature of the underwater environment, current solutions often lack sufficient flexibility to handle situations such as node movement and network topology changes, significantly impacting the stability and reliability of data transmission. To address the aforementioned issues, this paper proposes a secure data aggregation algorithm based on a trust mechanism. By dynamically adjusting the number and size of node slices based on node trust values and transmission distances, the proposed algorithm effectively reduces network communication overhead and improves the accuracy of data aggregation. Due to the variability in the number of node slices, even if attackers intercept some slices, it is difficult for them to reconstruct the complete data, thereby ensuring data security.
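    The slicing idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the rule in `slice_count` (more slices for trusted, nearby neighbours) and all parameter names are assumptions made for illustration. The sketch only shows that a reading split into additive slices can be summed back exactly, while any strict subset of slices reveals little about the value.

```python
import random

def slice_count(trust, distance, max_slices=5, max_range=100.0):
    """Hypothetical rule (an assumption, not from the paper):
    more slices for trusted, nearby neighbours; never fewer than one."""
    score = trust * (1.0 - min(distance, max_range) / max_range)
    return max(1, round(score * max_slices))

def split_into_slices(value, k, rng):
    """Split an integer reading into k integer slices that sum back to it."""
    parts = [rng.randint(-1000, 1000) for _ in range(k - 1)]
    parts.append(value - sum(parts))  # last slice makes the sum exact
    return parts

rng = random.Random(0)          # fixed seed so the sketch is repeatable
reading = 42                    # hypothetical sensor reading
k = slice_count(trust=0.9, distance=20.0)
slices = split_into_slices(reading, k, rng)
recovered = sum(slices)         # the aggregator only needs the sum
```

    In this sketch the number of slices varies with trust and distance, so an eavesdropper does not know in advance how many slices are needed to reconstruct a reading.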

  • Article type: Journal Article
    SETTING: The potential for exposure to indoor radon varies dramatically across British Columbia (BC) due to varied geology. Individuals may struggle to understand their exposure risk and agencies may struggle to understand the value of population-level programs and policies to mitigate risk.
    INTERVENTION: The BC Centre for Disease Control (BCCDC) established the BC Radon Data Repository (BCRDR) to facilitate radon research, public awareness, and action in the province. The BCRDR aggregates indoor radon measurements collected by government agencies, industry professionals and organizations, and research and advocacy groups. Participation is formalized with a data sharing agreement, which outlines how the BCCDC anonymizes and manages the shared data integrated into the BCRDR.
    RESULTS: The BCRDR currently holds 38,733 measurements from 18 data contributors. The repository continues to grow with new measurements from existing contributors and the addition of new contributors. A prominent use of the BCRDR was to create the online, interactive BC Radon Map, which includes regional concentration summaries, risk interpretation messaging, and health promotion information. Anonymized BCRDR data are also available for external release upon request.
    CONCLUSIONS: The BCCDC leverages existing radon measurement programs to create a large and integrated database with wide geographic coverage. The development and application of the BCRDR informs public health research and action beyond the BCCDC, and the repository can serve as a model for other regional or national initiatives.

  • Article type: Journal Article
    Data aggregation plays a critical role in sensor networks for efficient data collection. However, existing algorithms assume uniform initial energy levels among sensors, which is unrealistic in practical deployments. Differences in initial energy levels significantly affect data aggregation in sensor networks. To address this issue, we propose Data Aggregation with Different Initial Energy (DADIE), a novel algorithm that aims to improve energy savings and privacy-preserving efficiency and to reduce node death rates in sensor networks whose nodes have varying initial energy. DADIE considers the transmission distance between nodes and their initial energy levels when forming the network topology, while also limiting the number of child nodes. Furthermore, DADIE reconstructs the aggregation tree before each round of data transmission. This allows nodes that are closer to the receiving end and have higher initial energy to undertake more data aggregation and transmission tasks while limiting energy consumption. As a result, DADIE effectively reduces the node death rate and improves the efficiency of data transmission throughout the network. To enhance network security, DADIE establishes secure transmission channels between transmission nodes prior to data transmission and employs slice-and-mix technology within the network. Our experimental simulations demonstrate that the proposed DADIE algorithm effectively addresses the data aggregation challenges in sensor networks with varying initial energy nodes. It achieves 5-20% lower communication overhead and energy consumption, 10-20% higher security, and 10-30% lower node mortality than existing algorithms.
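    The topology step described above (parent choice driven by distance and energy, with a cap on child nodes) can be sketched as follows. This is a minimal illustration, not DADIE itself: the greedy attachment order, the cost `distance / energy`, the mains-powered sink, and every name in the code are assumptions.

```python
import math

def build_tree(sink, nodes, max_children=2):
    """Greedy aggregation-tree sketch.

    nodes: {name: ((x, y), energy)}; returns {child: parent}.
    Assumed cost of choosing a parent: distance to it divided by its energy,
    so close, energy-rich nodes are preferred. The sink is assumed to be
    mains-powered (infinite energy).
    """
    position = {"sink": sink}
    energy = {"sink": float("inf")}
    for name, (pos, e) in nodes.items():
        position[name] = pos
        energy[name] = e

    children = {"sink": 0}   # attached nodes and their current child counts
    parent = {}
    # Attach nodes nearest to the sink first.
    for name in sorted(nodes, key=lambda n: math.dist(position[n], sink)):
        candidates = [c for c in children if children[c] < max_children]
        best = min(
            candidates,
            key=lambda c: math.dist(position[name], position[c]) / energy[c],
        )
        parent[name] = best
        children[best] += 1
        children[name] = 0
    return parent

nodes = {
    "a": ((1.0, 0.0), 2.0),
    "b": ((0.0, 1.0), 1.0),
    "c": ((2.0, 0.0), 1.0),
    "d": ((0.0, 2.0), 1.0),
}
tree = build_tree((0.0, 0.0), nodes, max_children=2)
```

    Calling `build_tree` again before each round with residual rather than initial energy would mimic the per-round reconstruction mentioned in the abstract, under the same assumptions.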

  • Article type: Journal Article
    The Internet of Things (IoT) is a network of intelligent devices, used especially in healthcare-based systems. The Internet of Medical Things (IoMT) uses wearable sensors to collect data and transmit it to central repositories. Ensuring the security and privacy of healthcare data is a challenging task. The aim of the study is to provide a secure data sharing mechanism. Existing studies provide secure data sharing schemes but still have limitations in hiding the patient identity in the messages exchanged to upload data to central repositories. This paper presents a Secure Aggregated Data Collection and Transmission (SADCT) scheme that provides anonymity for the identities of the patient's mobile device and the intermediate fog nodes. Our system involves an authenticated server that registers and authenticates nodes by saving security credentials. The proposed scheme presents a novel data aggregation algorithm at the mobile device and a data extraction algorithm at the fog node. The work is validated through extensive simulations in NS along with a security analysis. The results demonstrate the superiority of SADCT in terms of energy consumption, storage, communication, and computational costs.

  • Article type: Journal Article
    Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e. those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier to downstream application. Here, we propose a pipeline to enhance data collection, processing, and management in plant science studies, comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generate comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes the steps of an Extract-Transform-Load (ETL) data integration process in which data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL can be applied flexibly across plant science experiments without programming knowledge on the part of the domain scientist, and their functions are executed in Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-government organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL for streamlining critical steps from data collection to analysis in the plant sciences.
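    The Extract-Transform-Load steps named above can be sketched with the Python standard library. This is a generic illustration and does not use or reproduce the AgETL functions; the file contents, column names, unit conversion, and table name are all hypothetical.

```python
import csv
import io
import sqlite3

# Two hypothetical input files with different delimiters, headers, and units.
FILE_A = "plot,height_cm\nP1,120\nP2,95\n"
FILE_B = "Plot ID;Height (m)\nP3;1.10\n"

def extract(text, delimiter):
    """Extract: read delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def transform(rows, plot_key, height_key, to_cm):
    """Transform: map source columns to the standard (plot, height_cm) shape."""
    return [(row[plot_key], float(row[height_key]) * to_cm) for row in rows]

def load(conn, records):
    """Load: append standardized records to a single database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS heights (plot TEXT, height_cm REAL)"
    )
    conn.executemany("INSERT INTO heights VALUES (?, ?)", records)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(FILE_A, ","), "plot", "height_cm", 1.0))
load(conn, transform(extract(FILE_B, ";"), "Plot ID", "Height (m)", 100.0))
rows = conn.execute(
    "SELECT plot, height_cm FROM heights ORDER BY plot"
).fetchall()
```

    In a configuration-driven pipeline such as the one the abstract describes, the per-file arguments passed to `transform` here would plausibly come from a YAML file rather than from code; that mapping is left out of this sketch.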

  • Article type: Preprint
    Gene co-expression networks (GCNs) describe relationships among expressed genes that are key to maintaining cellular identity and homeostasis. However, the sample size of typical RNA-seq experiments, which is several orders of magnitude smaller than the number of genes, is too low to infer GCNs reliably. recount3, a publicly available dataset comprising 316,443 uniformly processed human RNA-seq samples, provides an opportunity to improve power for accurate network reconstruction and to obtain biological insight from the resulting networks.
    We compared alternative aggregation strategies to identify an optimal workflow for GCN inference by data aggregation and inferred three consensus networks (a universal network, a non-cancer network, and a cancer network) in addition to 27 tissue context-specific networks. Central network genes from our consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas central context-specific network genes included tissue-specific transcription factors, and factorization based on the hubs led to clustering of related tissue contexts. We discovered that annotations corresponding to context-specific networks inferred from aggregated data were enriched for trait heritability beyond known functional genomic annotations and were significantly more enriched when we aggregated over a larger number of samples.
    This study outlines best practices for GCN inference and evaluation by data aggregation. We recommend estimating and regressing out confounders in each data set before aggregation and prioritizing studies with large sample sizes for GCN reconstruction. Increased statistical power in inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability independent of functional genomic annotations that are context-agnostic. While we observed strictly increasing held-out log-likelihood with data aggregation, we noted diminishing marginal improvements. Future work on alternative methods for estimating confounders and on integrating orthogonal information from modalities such as Hi-C and ChIP-seq could further improve GCN inference.

  • Article type: Journal Article
    Threatened species monitoring can produce enormous quantities of acoustic and visual recordings, which must be searched for animal detections. Data coding is extremely time-consuming for humans, and even though machine algorithms are emerging as useful tools to tackle this task, they too require large amounts of known detections for training. Citizen scientists are often recruited via crowdsourcing to assist. However, the results of their coding can be difficult to interpret because citizen scientists lack comprehensive training and each typically codes only a small fraction of the full dataset. Competence may vary between citizen scientists, but without knowing the ground truth of the dataset, it is difficult to identify which citizen scientists are most competent. We used a quantitative cognitive model, cultural consensus theory, to analyze both empirical and simulated data from a crowdsourced analysis of audio recordings of Australian frogs. Several hundred citizen scientists were asked whether the calls of nine frog species were present on 1260 brief audio recordings, though most coded only a fraction of these recordings. Through modeling, characteristics of both the citizen scientist cohort and the recordings were estimated. We then compared the model's output to expert coding of the recordings and found agreement between the cohort's consensus and the expert evaluation. This finding adds to the evidence that crowdsourced analyses can be used to understand large-scale datasets, even when the ground truth of the dataset is unknown. The model-based analysis provides a promising tool to screen large datasets prior to investing expert time and resources.
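    The underlying idea, estimating a consensus answer and coder competence together when no ground truth is available, can be illustrated with a much simpler stand-in. The sketch below is an iterative weighted vote, not the cultural consensus theory model used in the study; the coder names, recordings, and votes are invented for illustration.

```python
def weighted_consensus(votes, rounds=5):
    """Iterative weighted vote (a simplified stand-in, not cultural
    consensus theory).

    votes: {coder: {item: 0 or 1}}; coders may skip items.
    Returns (consensus per item, competence weight per coder).
    """
    weights = {coder: 1.0 for coder in votes}
    items = sorted({item for answers in votes.values() for item in answers})
    consensus = {}
    for _ in range(rounds):
        # Step 1: weighted majority vote per item.
        for item in items:
            yes = sum(weights[c] for c, a in votes.items() if a.get(item) == 1)
            no = sum(weights[c] for c, a in votes.items() if a.get(item) == 0)
            consensus[item] = 1 if yes > no else 0
        # Step 2: competence = share of a coder's answers matching consensus.
        for coder, answers in votes.items():
            agree = sum(1 for i, a in answers.items() if a == consensus[i])
            weights[coder] = agree / len(answers) if answers else 0.0
    return consensus, weights

# Hypothetical data: 1 = "call present", 0 = "call absent".
votes = {
    "ann": {"rec1": 1, "rec2": 0, "rec3": 1},
    "bob": {"rec1": 1, "rec2": 0},
    "cat": {"rec1": 0, "rec2": 1, "rec3": 1},
    "dan": {"rec2": 0, "rec3": 1},
}
consensus, weights = weighted_consensus(votes)
```

    Because each coder answers only some items, competence is computed over the items that coder actually coded, which mirrors the sparse coding pattern described in the abstract.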

  • Article type: Journal Article
    The wearable body area network is a key component of the modern-day e-healthcare system (e.g., telemedicine), particularly as the number and types of wearable medical monitoring systems increase. The importance of such systems is reinforced in the current COVID-19 pandemic. In addition to the need for secure collection of medical data, there is also a need to process data in real time. In this article, we design an improved symmetric homomorphic cryptosystem and a fog-based communication architecture to support delay- or time-sensitive monitoring and other related applications. Specifically, medical data can be analyzed at the fog servers in a secure manner. This will facilitate decision making, for example, by allowing relevant stakeholders to detect and respond to emergency situations based on real-time data analysis. We present two attack games to demonstrate that our approach is secure (i.e., chosen-plaintext attack resilience under the computational Diffie-Hellman assumption), and evaluate the complexity of its computations. A comparative summary of its performance and that of three other related approaches suggests that our approach enables privacy-assured medical data aggregation, and simulation experiments using Microsoft Azure further demonstrate the utility of our scheme.
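    To make the notion of homomorphic aggregation concrete, here is a toy additively homomorphic masking scheme. It is a swapped-in textbook-style construction, not the cryptosystem proposed in the article, and it offers no security guarantee as written; the modulus, key handling, and readings are assumptions for illustration only.

```python
import random

M = 2 ** 32  # modulus; assumed to exceed any possible sum of readings

def encrypt(reading, key):
    """Mask a reading with a one-time key (addition modulo M)."""
    return (reading + key) % M

def aggregate(ciphertexts):
    """An intermediate (fog) node adds ciphertexts without seeing readings."""
    return sum(ciphertexts) % M

def decrypt_sum(ciphertext_sum, keys):
    """The key holder removes the combined mask to recover the plain sum."""
    return (ciphertext_sum - sum(keys)) % M

rng = random.Random(1)                 # fixed seed; not a secure key source
readings = [72, 80, 65]                # hypothetical heart-rate readings
keys = [rng.randrange(M) for _ in readings]
ciphertexts = [encrypt(r, k) for r, k in zip(readings, keys)]
total = decrypt_sum(aggregate(ciphertexts), keys)
```

    The point of the sketch is only the homomorphic property: the sum of the ciphertexts decrypts to the sum of the readings, so the aggregating node never handles individual plaintext values. Each key must be used once; reusing a key would leak the difference between two readings.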

  • Article type: Journal Article
    Goodness-of-fit tests for two probabilistic multigraph models are presented. The first model is random stub matching given fixed degrees (RSM), so that edge assignments to vertex pair sites are dependent, and the second is independent edge assignments (IEA) according to a common probability distribution. Tests are performed using goodness-of-fit measures between the edge multiplicity sequence of an observed multigraph and the expected one according to a simple or composite hypothesis. Test statistics of Pearson type and of likelihood ratio type are used, and the expected values of the Pearson statistic under the different models are derived. Test performance based on simulations indicates that even for a small number of edges, the null distributions of both statistics are well approximated by their asymptotic χ²-distribution. The non-null distributions of the test statistics can be well approximated by proposed adjusted χ²-distributions used for power approximations. The influence of RSM on both test statistics is substantial for a small number of edges and implies a shift of their distributions towards smaller values compared to the null distributions under IEA. Two applications to social networks are included to illustrate how the tests can guide the analysis of social structure.
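    The two kinds of statistic mentioned above can be written down in their standard textbook forms: the Pearson statistic sums (observed − expected)² / expected over sites, and the likelihood ratio statistic is 2 · Σ observed · ln(observed / expected). The sketch below applies them to an edge multiplicity sequence; the observed counts and the uniform edge probabilities are hypothetical, and the paper's exact definitions and adjustments may differ.

```python
import math

def pearson_statistic(observed, expected):
    """Pearson-type statistic: sum of (O - E)^2 / E over all sites."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def likelihood_ratio_statistic(observed, expected):
    """Likelihood-ratio-type statistic: 2 * sum of O * ln(O / E).
    Sites with O = 0 contribute nothing to the sum."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# Hypothetical edge multiplicities at four vertex-pair sites.
observed = [3, 1, 2, 4]
m = sum(observed)                      # total number of edges
probs = [0.25, 0.25, 0.25, 0.25]       # assumed simple hypothesis under IEA
expected = [m * p for p in probs]      # expected multiplicity per site

S = pearson_statistic(observed, expected)
D = likelihood_ratio_statistic(observed, expected)
```

    Both values would then be compared against a χ² reference distribution, which the abstract reports to be a good approximation even for few edges.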