Data aggregation

  • Article type: Journal Article
    Understanding the health outcomes of military exposures is of critical importance for Veterans, their health care team, and national leaders. Approximately 43% of Veterans report military exposure concerns to their VA providers. Understanding the causal influences of environmental exposures on health is a complex exposure science task and often requires interpreting multiple data sources, particularly when exposure pathways and multi-exposure interactions are ill-defined, as is the case for complex and emerging military service exposures. Thus, there is a need to standardize clinically meaningful exposure metrics from different data sources to guide clinicians and researchers with a consistent model for investigating and communicating exposure risk profiles. The Linked Exposures Across Databases (LEAD) framework provides a unifying model for characterizing exposures from different exposure databases with a focus on providing clinically relevant exposure metrics. Application of LEAD is demonstrated through comparison of different military exposure data sources: Veteran Military Occupational and Environmental Exposure Assessment Tool (VMOAT), Individual Longitudinal Exposure Record (ILER) database, and a military incident report database, the Explosive Ordnance Disposal Information Management System (EODIMS). This cohesive method for evaluating military exposures leverages established information with new sources of data and has the potential to influence how military exposure data is integrated into exposure health care and investigational models.






  • Article type: Journal Article
    Due to the uniqueness of the underwater environment, traditional data aggregation schemes face many challenges. Most existing data aggregation solutions do not fully consider node trustworthiness, which may result in the inclusion of falsified data sent by malicious nodes during the aggregation process, thereby affecting the accuracy of the aggregated results. Additionally, because of the dynamically changing nature of the underwater environment, current solutions often lack sufficient flexibility to handle situations such as node movement and network topology changes, significantly impacting the stability and reliability of data transmission. To address the aforementioned issues, this paper proposes a secure data aggregation algorithm based on a trust mechanism. By dynamically adjusting the number and size of node slices based on node trust values and transmission distances, the proposed algorithm effectively reduces network communication overhead and improves the accuracy of data aggregation. Due to the variability in the number of node slices, even if attackers intercept some slices, it is difficult for them to reconstruct the complete data, thereby ensuring data security.






  • Article type: Journal Article
    SETTING: The potential for exposure to indoor radon varies dramatically across British Columbia (BC) due to varied geology. Individuals may struggle to understand their exposure risk, and agencies may struggle to understand the value of population-level programs and policies to mitigate risk.
    METHODS: The BC Centre for Disease Control (BCCDC) established the BC Radon Data Repository (BCRDR) to facilitate radon research, public awareness, and action in the province. The BCRDR aggregates indoor radon measurements collected by government agencies, industry professionals and organizations, and research and advocacy groups. Participation was formalized with a data sharing agreement, which outlines how the BCCDC anonymizes and manages the shared data integrated into the BCRDR.
    RESULTS: The BCRDR currently holds 38,733 measurements from 18 data contributors. The repository continues to grow with new measurements from existing contributors and the addition of new contributors. A prominent use of the BCRDR was to create the online, interactive BC Radon Map, which includes regional concentration summaries, risk interpretation messaging, and health promotion information. Anonymized BCRDR data are also available for external release upon request.
    CONCLUSIONS: The BCCDC leverages existing radon measurement programs to create a large and integrated database with wide geographic coverage. The development and application of the BCRDR informs public health research and action beyond the BCCDC, and the repository can serve as a model for other regional or national initiatives.






  • Article type: Journal Article
    Data aggregation plays a critical role in sensor networks for efficient data collection. However, the assumption of uniform initial energy levels among sensors in existing algorithms is unrealistic in practical production applications. This discrepancy in initial energy levels significantly impacts data aggregation in sensor networks. To address this issue, we propose Data Aggregation with Different Initial Energy (DADIE), a novel algorithm that aims to improve energy savings and privacy-preserving efficiency and to reduce node death rates in sensor networks whose nodes have varying initial energy. DADIE considers the transmission distance between nodes and their initial energy levels when forming the network topology, while also limiting the number of child nodes. Furthermore, DADIE reconstructs the aggregation tree before each round of data transmission. This allows nodes closer to the receiving end with higher initial energy to undertake more data aggregation and transmission tasks while limiting energy consumption. As a result, DADIE effectively reduces the node death rate and improves the efficiency of data transmission throughout the network. To enhance network security, DADIE establishes secure transmission channels between transmission nodes prior to data transmission, and it employs slice-and-mix technology within the network. Our experimental simulations demonstrate that the proposed DADIE algorithm effectively resolves the data aggregation challenges in sensor networks whose nodes have varying initial energy. It achieves 5-20% lower communication overhead and energy consumption, 10-20% higher security, and 10-30% lower node mortality than existing algorithms.
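
    The slice-and-mix idea mentioned above can be illustrated with a minimal sketch. It is not the DADIE algorithm itself: the abstract does not specify how slices are sized or routed, so the function names, the random slice range, and the uniform choice of recipient below are all assumptions made for illustration. The sketch only shows the core property that each node splits its reading into random pieces, scatters most of them to other nodes, and the network-wide sum is still preserved.

```python
import random


def slice_value(value, n_slices, rng):
    """Split an integer reading into n_slices integers that sum to value."""
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    # Draw n_slices - 1 random pieces (range is an arbitrary assumption);
    # the last piece is chosen so the pieces add up exactly to the reading.
    pieces = [rng.randint(-1000, 1000) for _ in range(n_slices - 1)]
    pieces.append(value - sum(pieces))
    return pieces


def slice_and_mix(readings, n_slices, seed=0):
    """Return the per-node mixed values after slicing and scattering."""
    if n_slices > 1 and len(readings) < 2:
        raise ValueError("need at least two nodes to exchange slices")
    rng = random.Random(seed)
    node_ids = list(readings)
    mixed = {node: 0 for node in node_ids}
    for node, value in readings.items():
        pieces = slice_value(value, n_slices, rng)
        # The node keeps one piece and sends the others to random peers.
        mixed[node] += pieces[0]
        peers = [n for n in node_ids if n != node]
        for piece in pieces[1:]:
            mixed[rng.choice(peers)] += piece
    return mixed


# Hypothetical sensor readings keyed by node id.
readings = {"a": 21, "b": 19, "c": 25, "d": 23}
mixed = slice_and_mix(readings, n_slices=3)

# Individual mixed values no longer reveal any single reading,
# but the aggregate is unchanged.
print(sum(mixed.values()) == sum(readings.values()))
```

    In a real deployment the slices would travel over the encrypted channels the abstract describes; the sketch omits encryption and the aggregation tree entirely.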






  • Article type: Journal Article
    The Internet of Things (IoT) is a network of intelligent devices, used notably in healthcare systems. The Internet of Medical Things (IoMT) uses wearable sensors to collect data and transmit it to central repositories. Ensuring the security and privacy of healthcare data is challenging. The aim of this study is to provide a secure data sharing mechanism. Existing studies provide secure data sharing schemes but still have limitations in hiding the patient's identity in the messages exchanged to upload data to central repositories. This paper presents a Secure Aggregated Data Collection and Transmission (SADCT) scheme that provides anonymity for the identities of the patient's mobile device and the intermediate fog nodes. Our system involves an authenticated server for node registration and authentication by saving security credentials. The proposed scheme presents a novel data aggregation algorithm at the mobile device and a data extraction algorithm at the fog node. The work is validated through extensive simulations in NS along with a security analysis. Results show the superiority of SADCT in terms of energy consumption, storage, communication, and computational costs.






  • Article type: Journal Article
    Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e. those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier for downstream application. Here, we propose a pipeline to enhance data collection, processing, and management from plant science studies comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generates comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes steps for an Extract-Transform-Load (ETL) data integration process where data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL are flexible for application across plant science experiments without programming knowledge on the part of the domain scientist, and their functions are executed on Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-government organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL to streamline critical steps from data collection to analysis in the plant sciences.
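
    The Extract-Transform-Load flow described above can be sketched in a few lines of standard-library Python. This is not the AgETL API, which the abstract does not show: the `CONFIG` dictionary stands in for a YAML configuration file, and the table name, column names, and sample file contents are hypothetical. The sketch extracts rows from two differently formatted CSV sources, maps their headers onto one standard schema, and loads them into an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Stand-in for a YAML configuration file (all names are assumptions).
CONFIG = {
    "table": "observations",
    "columns": {  # source header -> standard column name
        "Plot": "plot_id",
        "plot": "plot_id",
        "Genotype": "genotype",
        "geno": "genotype",
        "Height_cm": "height_cm",
        "height (cm)": "height_cm",
    },
}


def extract(text):
    """Read CSV text into a list of dictionaries keyed by source header."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, column_map):
    """Rename headers to the standard schema and convert height to float."""
    standardized = []
    for row in rows:
        renamed = {column_map[k]: v.strip() for k, v in row.items() if k in column_map}
        renamed["height_cm"] = float(renamed["height_cm"])
        standardized.append(renamed)
    return standardized


def load(conn, table, rows):
    """Create the target table if needed and insert the standardized rows."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(plot_id TEXT, genotype TEXT, height_cm REAL)"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES (:plot_id, :genotype, :height_cm)", rows
    )
    conn.commit()


# Two hypothetical source files with different header conventions.
FILE_A = "Plot,Genotype,Height_cm\n101,G1,54.2\n102,G2,61.0\n"
FILE_B = "plot,geno,height (cm)\n201,G1,49.5\n"

conn = sqlite3.connect(":memory:")
for text in (FILE_A, FILE_B):
    load(conn, CONFIG["table"], transform(extract(text), CONFIG["columns"]))

row_count = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
mean_g1 = conn.execute(
    "SELECT AVG(height_cm) FROM observations WHERE genotype = 'G1'"
).fetchone()[0]
print(row_count, mean_g1)
```

    Keeping the header mapping in configuration rather than in code mirrors the abstract's point that domain scientists can adapt the pipeline without programming.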






  • Article type: Preprint
    Gene co-expression networks (GCNs) describe relationships among expressed genes key to maintaining cellular identity and homeostasis. However, the sample size of typical RNA-seq experiments, which is several orders of magnitude smaller than the number of genes, is too low to infer GCNs reliably. recount3, a publicly available dataset comprising 316,443 uniformly processed human RNA-seq samples, provides an opportunity to improve power for accurate network reconstruction and obtain biological insight from the resulting networks.
    We compared alternate aggregation strategies to identify an optimal workflow for GCN inference by data aggregation and inferred three consensus networks (a universal network, a non-cancer network, and a cancer network) in addition to 27 tissue context-specific networks. Central network genes from our consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas central context-specific network genes included tissue-specific transcription factors, and factorization based on the hubs led to clustering of related tissue contexts. We discovered that annotations corresponding to context-specific networks inferred from aggregated data were enriched for trait heritability beyond known functional genomic annotations and were significantly more enriched when we aggregated over a larger number of samples.
    This study outlines best practices for GCN inference and evaluation by data aggregation. We recommend estimating and regressing confounders in each data set before aggregation and prioritizing large sample size studies for GCN reconstruction. Increased statistical power in inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability independent of functional genomic annotations that are context-agnostic. While we observed strictly increasing held-out log-likelihood with data aggregation, we noted diminishing marginal improvements. Future directions aimed at alternate methods for estimating confounders and integrating orthogonal information from modalities such as Hi-C and ChIP-seq can further improve GCN inference.






  • Article type: Journal Article
    Threatened species monitoring can produce enormous quantities of acoustic and visual recordings which must be searched for animal detections. Data coding is extremely time-consuming for humans and even though machine algorithms are emerging as useful tools to tackle this task, they too require large amounts of known detections for training. Citizen scientists are often recruited via crowd-sourcing to assist. However, the results of their coding can be difficult to interpret because citizen scientists lack comprehensive training and typically each codes only a small fraction of the full dataset. Competence may vary between citizen scientists, but without knowing the ground truth of the dataset, it is difficult to identify which citizen scientists are most competent. We used a quantitative cognitive model, cultural consensus theory, to analyze both empirical and simulated data from a crowdsourced analysis of audio recordings of Australian frogs. Several hundred citizen scientists were asked whether the calls of nine frog species were present on 1260 brief audio recordings, though most only coded a fraction of these recordings. Through modeling, characteristics of both the citizen scientist cohort and the recordings were estimated. We then compared the model's output to expert coding of the recordings and found agreement between the cohort's consensus and the expert evaluation. This finding adds to the evidence that crowdsourced analyses can be utilized to understand large-scale datasets, even when the ground truth of the dataset is unknown. The model-based analysis provides a promising tool to screen large datasets prior to investing expert time and resources.
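
    The idea of estimating coder competence and a consensus answer together, without ground truth, can be illustrated with a deliberately simplified sketch. It is not cultural consensus theory as used in the study: it swaps in an iterative competence-weighted majority vote, and the coder names, recording ids, and votes are invented for illustration. Each coder may have coded only a subset of recordings, as in the abstract.

```python
def consensus(votes, n_iter=10):
    """Estimate consensus answers and coder weights from sparse binary votes.

    votes maps coder -> {recording: 0 or 1}; coders need not code every
    recording. Returns (answer, weight) dictionaries.
    """
    coders = list(votes)
    recordings = sorted({r for coded in votes.values() for r in coded})
    weight = {c: 1.0 for c in coders}  # start by trusting everyone equally
    answer = {}
    for _ in range(n_iter):
        # Step 1: weighted vote per recording (ties resolve to "absent").
        for r in recordings:
            yes = sum(weight[c] for c in coders if votes[c].get(r) == 1)
            no = sum(weight[c] for c in coders if votes[c].get(r) == 0)
            answer[r] = 1 if yes > no else 0
        # Step 2: a coder's weight is their agreement rate with the
        # current consensus on the recordings they actually coded.
        for c in coders:
            coded = votes[c]
            agree = sum(1 for r, v in coded.items() if v == answer[r])
            weight[c] = agree / len(coded) if coded else 0.0
    return answer, weight


# Hypothetical votes: 1 = species call present, 0 = absent.
votes = {
    "ann": {"r1": 1, "r2": 0, "r3": 1},
    "bob": {"r1": 1, "r2": 0, "r4": 0},
    "cat": {"r2": 0, "r3": 1, "r4": 0},
    "dan": {"r1": 0, "r2": 1, "r3": 0, "r4": 1},
}
answer, weight = consensus(votes)
print(answer)
print(weight)
```

    A model-based approach such as the one in the study additionally accounts for guessing and item difficulty, which this stand-in ignores.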






  • Article type: Journal Article
    The wearable body area network is a key component of the modern-day e-healthcare system (e.g., telemedicine), particularly as the number and types of wearable medical monitoring systems increase. The importance of such systems is reinforced in the current COVID-19 pandemic. In addition to the need for a secure collection of medical data, there is also a need to process data in real-time. In this article, we design an improved symmetric homomorphic cryptosystem and a fog-based communication architecture to support delay- or time-sensitive monitoring and other related applications. Specifically, medical data can be analyzed at the fog servers in a secure manner. This will facilitate decision making, for example, allowing relevant stakeholders to detect and respond to emergency situations, based on real-time data analysis. We present two attack games to demonstrate that our approach is secure (i.e., chosen-plaintext attack resilience under the computational Diffie-Hellman assumption), and evaluate the complexity of its computations. A comparative summary of its performance and three other related approaches suggests that our approach enables privacy-assured medical data aggregation, and the simulation experiments using Microsoft Azure further demonstrate the utility of our scheme.






  • Article type: Journal Article
    Goodness of fit tests for two probabilistic multigraph models are presented. The first model is random stub matching given fixed degrees (RSM) so that edge assignments to vertex pair sites are dependent, and the second is independent edge assignments (IEA) according to a common probability distribution. Tests are performed using goodness of fit measures between the edge multiplicity sequence of an observed multigraph, and the expected one according to a simple or composite hypothesis. Test statistics of Pearson type and of likelihood ratio type are used, and the expected values of the Pearson statistic under the different models are derived. Test performances based on simulations indicate that even for a small number of edges, the null distributions of both statistics are well approximated by their asymptotic χ²-distribution. The non-null distributions of the test statistics can be well approximated by proposed adjusted χ²-distributions used for power approximations. The influence of RSM on both test statistics is substantial for a small number of edges and implies a shift of their distributions towards smaller values compared to what holds true for the null distributions under IEA. Two applications on social networks are included to illustrate how the tests can guide the analysis of social structure.
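
    The two kinds of test statistic named above can be computed from an edge multiplicity sequence as in the following minimal sketch. The example multigraph, the uniform simple hypothesis, and the function names are assumptions for illustration; the abstract does not give the paper's adjusted distributions, so none are reproduced here. The sketch treats the observed edge multiplicities over vertex pair sites as counts and compares them with the expected counts under an IEA model with equal site probabilities.

```python
import math


def pearson_statistic(observed, expected):
    """Pearson statistic: sum over sites of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_statistic(observed, expected):
    """Likelihood ratio statistic: 2 * sum of O * ln(O / E), skipping O = 0."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Hypothetical multigraph on 3 vertices with loops allowed, which gives
# 3 * 4 / 2 = 6 vertex pair sites; values are edge multiplicities per site.
observed = [4, 1, 0, 3, 2, 2]
n_edges = sum(observed)

# Simple hypothesis (an assumption): edges assigned independently and
# uniformly over the sites, so every site expects the same count.
expected = [n_edges / len(observed)] * len(observed)

pearson = pearson_statistic(observed, expected)
lr = likelihood_ratio_statistic(observed, expected)
print(pearson, lr)
```

    Under such a simple hypothesis with the number of edges fixed, the usual multinomial reference distribution is χ² with one fewer degree of freedom than the number of sites; whether that approximation is adequate for a given multigraph is exactly what the simulations in the paper examine.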





