data processing

  • Article type: Journal Article
    With growing concern about the protection of ecosystem functions and services, governments have developed public policies and organizations have produced an enormous volume of digital data, freely available through their websites. At the same time, advances in data acquisition from remotely sensed sources and in processing through geographic information systems (GIS) and statistical tools have allowed an unprecedented capacity to manage ecosystems efficiently. However, the real-world scenario in that regard remains paradoxically challenging. The reasons can be many and diverse, but a strong candidate relates to the limited engagement among the interested parties, which hampers bringing all these assets into action. The aim of the study is to demonstrate that management of ecosystem services can be significantly improved by integrating existing environmental policies with environmental big data and low-cost GIS and data processing tools. Using the Upper Rio das Velhas hydrographic basin in the state of Minas Gerais (Brazil) as an example, the study demonstrated how Principal Component Analysis based on a diversity of environmental variables assembled sub-basins into urban, agriculture, mining, and heterogeneous profiles, directing management of ecosystem services to the most appropriate officially established conservation plans. The use of GIS tools, in turn, allowed narrowing the implementation of each plan to specific sub-basins. This optimized allocation of preferential management plans to priority areas was discussed for a number of conservation plans. A paradigmatic example was the so-called Conservation Use Potential (CUP), devoted to the protection of aquifer recharge (provisioning service) and the control of water erosion (regulation service), as well as to the allocation of uses as a function of soil capability (support service). In all cases, the efficiency gains in readiness for the plans' implementation and the economy of resources were projected to be noteworthy.
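
    The clustering idea above is straightforward to prototype. Below is a minimal sketch (not the authors' code) of how sub-basins described by environmental variables could be reduced with PCA and grouped into profiles; the file and column names are hypothetical placeholders.

    ```python
    # Minimal sketch: PCA + clustering of sub-basins into land-use profiles.
    # "subbasin_variables.csv" and its columns are hypothetical placeholders.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    basins = pd.read_csv("subbasin_variables.csv", index_col="subbasin_id")

    # Standardize environmental variables (e.g., % urban cover, % cropland,
    # mining density, slope) so PCA is not dominated by scale differences.
    scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(basins))

    # Four groups mirroring the paper's urban / agriculture / mining /
    # heterogeneous profiles.
    profiles = KMeans(n_clusters=4, n_init="auto", random_state=0).fit_predict(scores)
    print(pd.Series(profiles, index=basins.index, name="profile"))
    ```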

  • Article type: Journal Article
    State-of-the-art mass spectrometers combined with modern bioinformatics algorithms for peptide-to-spectrum matching (PSM) with robust statistical scoring allow more variable features (i.e., post-translational modifications) to be reliably identified from (tandem-) mass spectrometry data, often without the need for biochemical enrichment. Semi-specific proteome searches, which constrain the theoretical enzymatic digestion to only the N- or C-terminal end, allow the identification of native protein termini or those arising from endogenous proteolytic activity (also referred to as "neo-N-termini" analysis or "N-terminomics"). Nevertheless, deriving biological meaning from these search outputs can be challenging in terms of data mining and analysis. Thus, we introduce TermineR, a data analysis approach for the (1) annotation of peptides according to their enzymatic cleavage specificity and known protein processing features, (2) differential abundance and enrichment analysis of N-terminal sequence patterns, and (3) visualization of neo-N-termini locations. We illustrate the use of TermineR by applying it to tandem mass tag (TMT)-based proteomics data from a mouse model of polycystic kidney disease and assessing the semi-specific searches for biological interpretation of cleavage events and the variable contribution of proteolytic products to general protein abundance. The TermineR approach and example data are available as an R package at https://github.com/MiguelCos/TermineR.
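
    To make the peptide annotation step concrete, here is an illustrative Python sketch (TermineR itself is an R package, so this only mirrors the logic) that classifies a peptide's cleavage specificity against its parent protein under standard trypsin rules; the sequences are toy examples.

    ```python
    # Illustrative sketch, not TermineR's implementation: classify a peptide's
    # cleavage specificity assuming trypsin (cleaves C-terminal to K/R, not
    # before proline).
    def cleavage_specificity(protein: str, peptide: str) -> str:
        start = protein.find(peptide)
        if start == -1:
            return "not found"
        end = start + len(peptide)

        def tryptic(pos: int) -> bool:
            # Protein termini always count as valid cleavage sites.
            if pos == 0 or pos == len(protein):
                return True
            return protein[pos - 1] in "KR" and protein[pos] != "P"

        n_ok, c_ok = tryptic(start), tryptic(end)
        if n_ok and c_ok:
            return "specific"
        if c_ok and not n_ok:
            return "semi-specific (candidate neo-N-terminus)"
        if n_ok or c_ok:
            return "semi-specific"
        return "non-specific"

    print(cleavage_specificity("MKTAYIAKQRQISFVK", "TAYIAK"))  # -> specific
    print(cleavage_specificity("MKTAYIAKQRQISFVK", "AYIAK"))   # -> semi-specific (candidate neo-N-terminus)
    ```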

  • Article type: Journal Article
    Exposomics aims to measure human exposures throughout the lifespan and the changes they produce in the human body. Exposome-scale studies have significant potential for understanding the interplay of environmental factors with complex multifactorial diseases that are widespread in our society and whose origins remain unclear. In this framework, the study of the chemical exposome aims to cover all chemical exposures and their effects on human health; today, however, this goal still seems unfeasible, or at least very challenging, which makes the exposome for now only a concept. Furthermore, the study of the chemical exposome faces several methodological challenges, such as moving from specific targeted methodologies towards high-throughput multitargeted and non-targeted approaches, guaranteeing the availability and quality of biological samples to obtain high-quality analytical data, standardizing the applied analytical methodologies, handling the statistical assignment of increasingly complex datasets, and identifying (un)known analytes. This review discusses the various steps involved in applying the exposome concept from an analytical perspective. It provides an overview of the wide variety of existing analytical methods and instruments, highlighting their complementarity for developing combined analytical strategies that advance towards characterization of the chemical exposome. In addition, this review focuses on endocrine disrupting chemicals (EDCs) to show that studying even a minor part of the chemical exposome represents a great challenge. Analytical strategies applied in an exposomics context have shown great potential to elucidate the role of EDCs in health outcomes. However, translating innovative methods into etiological research and chemical risk assessment will require a multidisciplinary effort. Unlike other review articles focused on exposomics, this review offers a holistic view from the perspective of analytical chemistry and discusses the entire analytical workflow needed to finally obtain valuable results.
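
    As a concrete illustration of one step such workflows rely on, the hedged sketch below shows suspect screening: matching measured accurate masses against a small suspect list within a ppm tolerance. The masses and tolerance are illustrative only, not values from the review.

    ```python
    # Hedged sketch of suspect screening by accurate mass. Neutral monoisotopic
    # masses below are illustrative entries, not an authoritative suspect list.
    SUSPECTS = {"bisphenol A": 228.1150, "methylparaben": 152.0473}

    def match_suspects(measured_mass: float, tol_ppm: float = 5.0):
        hits = []
        for name, mass in SUSPECTS.items():
            ppm = abs(measured_mass - mass) / mass * 1e6  # relative mass error
            if ppm <= tol_ppm:
                hits.append((name, round(ppm, 2)))
        return hits

    print(match_suspects(228.1154))  # -> [('bisphenol A', 1.75)]
    ```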

  • Article type: Journal Article
    Common challenges in cryogenic electron microscopy, such as orientation bias, conformational diversity, and 3D misclassification, complicate single particle analysis and lead to significant resource expenditure. We previously introduced an in silico method using the maximum Feret diameter distribution, the Feret signature, to characterize sample heterogeneity of disc-shaped samples. Here, we expanded the Feret signature methodology to identify preferred orientations of samples containing arbitrary shapes, requiring only about 1000 particles. This method enables real-time adjustment of data acquisition parameters to optimize data collection strategies or to aid decisions to discontinue ineffective imaging sessions. Beyond detecting preferred orientations, the Feret signature approach can serve as an early-warning system for inconsistencies in classification during initial image processing steps, a capability that allows for strategic adjustments in data processing. These features establish the Feret signature as a valuable auxiliary tool in the context of single particle analysis, significantly accelerating the structure determination process.
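
    The quantity underlying the Feret signature is the maximum Feret diameter of a particle projection. A minimal sketch (not the authors' optimized implementation) computes it as the largest pairwise distance between convex hull vertices:

    ```python
    # Minimal sketch: maximum Feret diameter of a 2D particle outline, i.e.,
    # the largest pairwise distance, which is attained between hull vertices.
    import numpy as np
    from scipy.spatial import ConvexHull
    from scipy.spatial.distance import pdist

    def max_feret_diameter(points: np.ndarray) -> float:
        hull = points[ConvexHull(points).vertices]  # restrict to hull vertices
        return pdist(hull).max()                    # brute force over hull pairs

    rng = np.random.default_rng(0)
    particle = rng.normal(size=(500, 2))  # stand-in for one particle's projection
    print(max_feret_diameter(particle))
    ```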

  • Article type: Journal Article
    The intelligent predictive and optimized wastewater treatment plant method represents a ground-breaking shift in how we manage wastewater. By capitalizing on data-driven predictive modeling, automation, and optimization strategies, it introduces a comprehensive framework designed to enhance the efficiency and sustainability of wastewater treatment operations. This methodology encompasses several essential phases, including data gathering and training, the integration of innovative computational models such as Chimp-based GoogLeNet (CbG), data processing, and performance prediction, all while fine-tuning operational parameters. The designed model is a hybrid of the Chimp optimization algorithm and GoogLeNet: GoogLeNet is a deep convolutional architecture, and Chimp optimization is a bio-inspired optimization model based on chimpanzee behavior. The hybrid optimizes the operational parameters of the wastewater treatment plant, such as pH, dosage rate, effluent quality, and energy consumption, by fixing the optimal settings in GoogLeNet. The designed model includes processes such as pre-processing and feature analysis for effective prediction of the operational parameters and their optimization. Notably, this innovative approach provides several key advantages, including reduced operating costs, improved environmental outcomes, and more effective resource management. Through continuous adaptation and refinement, this methodology not only optimizes wastewater treatment plant performance but also effectively tackles evolving environmental challenges while conserving resources. It represents a significant step forward in the quest for efficient and sustainable wastewater treatment practices. The RMSE, MAE, MAPE, and R2 scores for the suggested technique are 1.103, 0.233, 0.012, and 0.002, respectively. The model also showed that power usage decreased to about 1.4%, while greenhouse gas emissions decreased to 0.12%, relative to existing techniques.
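
    The paper's CbG hybrid is not public, but the parameter-tuning loop it describes can be caricatured as a population-based search. The sketch below is a deliberately simplified stand-in, not the actual Chimp optimization algorithm or GoogLeNet; the cost function, parameters, and bounds are placeholders.

    ```python
    # Simplified stand-in for a bio-inspired tuning loop over plant operating
    # parameters. NOT the actual ChOA/GoogLeNet hybrid: the cost is a toy
    # surrogate for predicted effluent quality plus energy use.
    import random

    def cost(ph, dose):  # hypothetical surrogate objective
        return (ph - 7.2) ** 2 + 0.5 * (dose - 3.0) ** 2

    bounds = {"ph": (6.0, 9.0), "dose": (0.5, 6.0)}
    pop = [{k: random.uniform(*b) for k, b in bounds.items()} for _ in range(20)]

    for _ in range(50):  # keep the best candidate, pull the rest toward it
        pop.sort(key=lambda p: cost(p["ph"], p["dose"]))
        best = pop[0]
        for p in pop[1:]:
            for k, (lo, hi) in bounds.items():
                step = random.uniform(-0.3, 0.3) * (best[k] - p[k] + 0.1)
                p[k] = min(hi, max(lo, p[k] + step))  # clamp to bounds

    print({k: round(v, 3) for k, v in pop[0].items()}, round(cost(**pop[0]), 5))
    ```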

  • Article type: Journal Article
    Mass spectrometry is broadly employed to study complex molecular mechanisms in various biological and environmental fields, enabling 'omics' research such as proteomics, metabolomics, and lipidomics. As study cohorts grow larger and more complex, with dozens to hundreds of samples, robust quality control (QC) measures implemented through automated software tools become paramount to ensure the integrity, high quality, and validity of scientific conclusions from downstream analyses and to minimize the waste of resources. Since existing QC tools are mostly dedicated to proteomics, automated solutions supporting metabolomics are needed. To address this need, we developed the software PeakQC, a tool for automated QC of MS data that is independent of omics molecular types (i.e., omics-agnostic). It allows automated extraction and inspection of peak metrics of precursor ions (e.g., errors in mass, retention time, and arrival time) and supports various instrument and acquisition types, from infusion experiments to liquid chromatography and/or ion mobility spectrometry front-end separations, with or without fragmentation spectra from data-dependent or data-independent acquisition analyses. Diagnostic plots for fragmentation spectra are also generated. Here, we describe and illustrate PeakQC's functionalities using different representative data sets, demonstrating its utility as a valuable tool for enhancing the quality and reliability of omics mass spectrometry analyses.
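
    One of the simplest metrics such a QC tool inspects is the precursor mass error in ppm, whose distribution can flag calibration drift. A hedged sketch with made-up values and hypothetical column names:

    ```python
    # Hedged illustration of one QC metric: precursor mass error in ppm,
    # summarized across PSMs. Values and column names are made up.
    import pandas as pd

    psms = pd.DataFrame({
        "theoretical_mz": [445.1200, 622.0290, 922.0098],
        "observed_mz":    [445.1215, 622.0281, 922.0120],
    })
    psms["mass_error_ppm"] = ((psms.observed_mz - psms.theoretical_mz)
                              / psms.theoretical_mz * 1e6)
    print(psms["mass_error_ppm"].describe())  # distribution flags drift
    ```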

  • Article type: Journal Article
    This study aimed to measure the differences in commonly used summary acceleration metrics during elite Australian football games under three different data processing protocols (raw, custom-processed, and manufacturer-processed). Estimates of distance, speed, and acceleration were collected with a 10-Hz GNSS tracking technology device across fourteen matches of 38 elite Australian football players from one team. Raw and manufacturer-processed data were exported from the respective proprietary software, and two common summary acceleration metrics (number of efforts and distance within the medium/high-intensity zone) were calculated for the three processing methods. Linear mixed models were used to estimate the effect of the three data processing methods on the summary metrics. The main finding was that there were substantial differences between the three processing methods: the manufacturer-processed acceleration data had the lowest reported distance (up to 184 times lower) and efforts (up to 89 times lower), followed by the custom-processed distance (up to 3.3 times lower) and efforts (up to 4.3 times lower), while raw data had the highest reported distance and efforts. The results indicate that different processing methods change the metric output and, in turn, alter the quantification of the demands of a sport (the volume, intensity, and frequency of the metrics). Coaches, practitioners, and researchers need to understand that various processing methods alter the summary metrics of acceleration data. By being informed about how these metrics are affected by processing methods, they can better interpret the available data and effectively tailor their training programs to match the demands of competition.
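
    A custom-processed pipeline of the kind compared above boils down to differentiating the speed trace and counting zone entries. The sketch below assumes a 10 Hz speed signal and an illustrative 2 m/s² zone threshold, not the study's exact settings.

    ```python
    # Sketch of a "custom-processed" step: derive acceleration from a 10 Hz
    # GNSS speed trace and count medium/high-intensity efforts. The threshold
    # is illustrative, not the study's value.
    import numpy as np

    speed = np.abs(np.cumsum(np.random.default_rng(1).normal(0, 0.2, 600)))  # m/s, 60 s
    accel = np.diff(speed) * 10.0  # 10 Hz sampling -> dt = 0.1 s

    in_zone = accel >= 2.0  # m/s^2, illustrative zone threshold
    # An "effort" = one contiguous run of samples inside the zone.
    efforts = np.count_nonzero(np.diff(in_zone.astype(int)) == 1) + int(in_zone[0])
    distance_in_zone = float(np.sum(speed[1:][in_zone] * 0.1))  # metres

    print(efforts, round(distance_in_zone, 1))
    ```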

  • Article type: Journal Article
    The paper "Using Absorption Models for Insulin and Carbohydrates and Deep Leaning to Improve Glucose Level Predictions" (Sensors 2021, 21, 5273) proposes a novel approach to predicting blood glucose levels for people with type 1 diabetes mellitus (T1DM). By building exponential models from raw carbohydrate and insulin data to simulate absorption in the body, the authors reported a reduction in their model's root-mean-square error (RMSE) from 15.5 mg/dL (raw) to 9.2 mg/dL (exponential) when predicting blood glucose levels one hour into the future. In this comment, we demonstrate that the experimental techniques used in that paper are flawed, which invalidates its results and conclusions. Specifically, after reviewing the authors' code, we found that the model validation scheme was malformed: training and test data from the same time intervals were mixed. This means that the RMSE figures reported in the referenced paper did not accurately measure the predictive capabilities of the approaches presented. We repaired the measurement technique by appropriately isolating the training and test data, and we discovered that their models actually performed dramatically worse than reported. In fact, the models presented in that paper do not appear to perform any better than a naive model that predicts future glucose levels to be the same as the current ones.
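
    The methodological point generalizes: with sliding-window time-series data, a random train/test split leaks near-duplicate windows across the two sets, whereas a chronological split does not. A minimal sketch with synthetic glucose-like data (placeholders throughout) also shows the naive "no change" baseline invoked above.

    ```python
    # Sketch of the leakage issue: random vs chronological splits of
    # sliding-window time-series data. Data and sizes are synthetic placeholders.
    import numpy as np
    from sklearn.model_selection import train_test_split

    t = np.arange(10_000)
    glucose = 120 + 30 * np.sin(t / 60) + np.random.default_rng(2).normal(0, 5, t.size)

    win, hz = 12, 12  # 1 h history, 1 h horizon at 5-min samples
    X = np.stack([glucose[i:i + win] for i in range(t.size - win - hz)])
    y = glucose[win + hz - 1:][:len(X)]  # value 1 h after each window's last sample

    # Flawed scheme: windows from the same intervals land in both sets.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Repaired scheme: training strictly precedes testing in time.
    cut = int(0.8 * len(X))
    X_tr, X_te, y_tr, y_te = X[:cut], X[cut:], y[:cut], y[cut:]

    # Naive baseline: predict "no change" from the window's last sample.
    rmse_naive = np.sqrt(np.mean((X_te[:, -1] - y_te) ** 2))
    print(round(rmse_naive, 2))
    ```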

  • Article type: Journal Article
    The current work presents the generation of a comprehensive spatial dataset of a lightweight beam element composed of four twisted plywood strips, achieved through the application of Structure-from-Motion (SfM) - Multi-view Stereo (MVS) photogrammetry techniques in controlled laboratory conditions. The data collection process was meticulously conducted to ensure accuracy and precision, employing scale bars of varying lengths. The captured images were then processed using photogrammetric software, leading to the creation of point clouds, meshes, and texture files. These data files represent the 3D model of the beam at different mesh sizes (raw, high-poly, medium-poly, and low-poly), adding a high level of detail to the 3D visualization. The dataset holds significant reuse potential and offers essential resources for further studies in numerical modeling, simulations of complex structures, and training machine learning algorithms. This data can also serve as validation sets for emerging photogrammetry methods and form-finding techniques, especially ones involving large deformations and geometric nonlinearities, particularly within the structural engineering field.
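
    As a hedged reuse example, a downstream study might load one of the dataset's point clouds and downsample it before numerical modeling; the file name and voxel size below are assumptions, not part of the published dataset description.

    ```python
    # Hedged reuse sketch: load a point cloud from the dataset and downsample
    # it for modeling. "beam_raw.ply" and the 5 mm voxel size are assumptions.
    import open3d as o3d

    pcd = o3d.io.read_point_cloud("beam_raw.ply")   # raw point cloud
    down = pcd.voxel_down_sample(voxel_size=0.005)  # 5 mm grid
    down.estimate_normals()                         # normals for later meshing
    o3d.io.write_point_cloud("beam_lowres.ply", down)
    print(len(down.points), "points retained")
    ```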

  • Article type: Journal Article
    Mass-spectrometry (MS)-based single-cell proteomics (SCP) explores cellular heterogeneity by focusing on the functional effectors of the cells: proteins. However, extracting meaningful biological information from MS data is far from trivial, especially with single cells. Currently, data analysis workflows differ substantially from one research team to another. Moreover, it is difficult to evaluate pipelines because ground truths are missing. Our team has developed the R/Bioconductor package called scp to provide a standardized framework for SCP data analysis. It relies on the widely used QFeatures and SingleCellExperiment data structures. In addition, we used a design containing cell lines mixed in known proportions to generate controlled variability for data analysis benchmarking. In this chapter, we provide a flexible data analysis protocol for SCP data using the scp package, together with comprehensive explanations at each step of the processing. Our main steps are quality control at the feature and cell level, aggregation of the raw data into peptides and proteins, normalization, and batch correction. We validate our workflow using our ground truth data set. We illustrate how to use this modular, standardized framework and highlight some crucial steps.
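
    scp is an R/Bioconductor package, so the following pandas sketch only mirrors two of the protocol's steps in spirit: median aggregation of peptide-level values to proteins, followed by per-sample median centering. The intensities are made up.

    ```python
    # Language-agnostic illustration (scp itself is R): aggregate peptides to
    # proteins by median, then median-center each cell. Values are made up.
    import pandas as pd

    peptides = pd.DataFrame(
        {"protein": ["P1", "P1", "P2", "P2"],
         "cellA": [10.2, 10.8, 7.1, 7.5],
         "cellB": [11.0, 11.4, 7.9, 8.3]},  # log2 intensities
    )
    proteins = peptides.groupby("protein").median(numeric_only=True)

    # Normalize: subtract each cell's median so samples become comparable.
    normalized = proteins - proteins.median(axis=0)
    print(normalized)
    ```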
