data processing

  • Article type: Journal Article
    Poly- and perfluoroalkyl substances (PFAS) pose a real threat because of their environmental persistence, wide physicochemical variability, and potential toxicity. A large portion of these chemicals remains structurally unknown. Their comprehensive detection and monitoring therefore requires complex non-targeted analysis workflows based on liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS). This approach, although comprehensive, does not always provide the analytical resolution needed for complex PFAS mixtures such as fire-fighting aqueous film-forming foams (AFFFs). This study consolidates the advantages of the LC×LC technique hyphenated with high-resolution tandem mass spectrometry (HRMS/MS) for the identification of PFAS in AFFF mixtures. A total of 57 PFAS homolog series (HS) were identified in 3M and Orchidee AFFF mixtures thanks to (i) the high chromatographic peak capacity (n'2D,c ~ 300) and (ii) the increased mass domain resolution provided by the "remainder of Kendrick Mass" (RKM) analysis of the HRMS data. We then attempted to annotate the PFAS of each HS using the available reference standards and the FluoroMatch workflow in combination with the RKM defect for different fluorinated repeating units, such as CF2, CF2O, and C2F4O. This approach resulted in 12 identified PFAS HS, including compounds belonging to the HS of perfluoroalkyl carboxylic acids (PFACAs), perfluoroalkyl sulfonic acids (PFASAs), (N-pentafluoro(5)sulfide)-perfluoroalkane sulfonates (SF5-PFASAs), N-sulfopropyldimethylammoniopropyl perfluoroalkane sulfonamides (N-SPAmP-FASA), and N-carboxymethyldimethylammoniopropyl perfluoroalkane sulfonamide (N-CMAmP-FASA). The annotated categories of perfluoroalkyl aldehydes and chlorinated PFASAs represent the first record of these PFAS HS in the investigated AFFF samples.
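
The idea of grouping homologs by a repeating unit can be illustrated with the classical Kendrick mass defect (KMD), which is closely related to, but not the same as, the RKM analysis named in the abstract. The sketch below is an assumption-laden illustration, not the authors' workflow: the function names are hypothetical, neutral monoisotopic masses are used instead of measured ion m/z values, and the sign convention (nominal Kendrick mass minus Kendrick mass) is only one of the conventions in use. Compounds that differ only by CF2 units should share the same KMD.

```python
# Approximate monoisotopic masses (u) of the elements used in this example.
MONO = {"C": 12.0, "H": 1.007825032, "F": 18.998403163,
        "O": 15.994914620, "S": 31.972071174}


def mono_mass(formula):
    """Monoisotopic mass of a neutral formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def kendrick_mass_defect(mass, unit_mass):
    """KMD of `mass` when `unit_mass` is rescaled to its nominal (integer) mass."""
    kendrick_mass = mass * round(unit_mass) / unit_mass
    return round(kendrick_mass) - kendrick_mass


CF2 = mono_mass({"C": 1, "F": 2})

# Two perfluoroalkyl carboxylic acids (one CF2 apart) and one sulfonic acid.
pfoa = mono_mass({"C": 8, "H": 1, "F": 15, "O": 2})
pfna = mono_mass({"C": 9, "H": 1, "F": 17, "O": 2})
pfos = mono_mass({"C": 8, "H": 1, "F": 17, "O": 3, "S": 1})

for name, mass in [("PFOA", pfoa), ("PFNA", pfna), ("PFOS", pfos)]:
    print(name, round(mass, 4), round(kendrick_mass_defect(mass, CF2), 5))
```

PFOA and PFNA land on the same KMD value and so fall on one horizontal line of a KMD plot, while PFOS, which belongs to a different series, does not.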






  • Article type: Journal Article
    The primary objective of the research presented in this article is to introduce an artificial neural network that demands less computational power than a conventional deep neural network. The development of this ANN was achieved through the application of Ordered Fuzzy Numbers (OFNs). In the context of Industry 4.0, there are numerous applications where this solution could be utilized for data processing. It allows the deployment of Artificial Intelligence at the network edge on small devices, eliminating the need to transfer large amounts of data to a cloud server for analysis. Such networks will be easier to implement in small-scale solutions, like those for the Internet of Things, in the future. This paper presents test results where a real system was monitored, and anomalies were detected and predicted.






  • Article type: Journal Article
    As digital phenotyping, the capture of active and passive data from consumer devices such as smartphones, becomes more common, the need to properly process the data and derive replicable features from it has become paramount. Cortex is an open-source data processing pipeline for digital phenotyping data, optimized for use with the mindLAMP apps, which are used by nearly 100 research teams across the world. Cortex is designed to help teams (1) assess digital phenotyping data quality in real time, (2) derive replicable clinical features from the data, and (3) enable easy-to-share data visualizations. Cortex offers many options for working with digital phenotyping data, although some common approaches are likely of value to all teams using it. This paper highlights the reasoning, code, and example steps necessary to work with digital phenotyping data in a streamlined manner. Covering how to work with the data, assess its quality, derive features, and visualize findings, this paper is designed to give the reader the knowledge and skills to analyze any digital phenotyping data set. More specifically, the paper teaches the reader the ins and outs of the Cortex Python package. This includes background information on its interaction with the mindLAMP platform, some basic commands to learn what data can be pulled and how, and more advanced use of the package mixed with basic Python, with the goal of creating a correlation matrix. After the tutorial, different use cases of Cortex are discussed, along with limitations. To highlight clinical applications, this paper also provides 3 easy-to-implement examples of Cortex use in real-world settings. By explaining how to work with digital phenotyping data and providing ready-to-deploy code with Cortex, the paper aims to show how the new field of digital phenotyping can be both accessible to all and rigorous in methodology.
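
The final step named in the abstract, building a correlation matrix from derived features, can be sketched in plain Python. This example deliberately does not call the Cortex package, whose API is not described here; the feature names and daily values are invented for illustration, and `pearson` and `correlation_matrix` are hypothetical helpers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(features):
    """Nested dict: matrix[a][b] is the correlation of feature a with feature b."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names}
            for a in names}


# Hypothetical daily features for one participant (values are made up).
features = {
    "screen_time_h": [3.0, 4.5, 5.0, 6.5, 7.0],
    "steps": [9000, 7500, 7000, 5000, 4000],
    "sleep_h": [8.0, 7.5, 7.0, 6.0, 6.5],
}

matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In practice the feature columns would come from the pipeline's own output rather than hard-coded lists, and a data-frame library would typically replace the hand-written helpers.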






  • Article type: Journal Article
    Long-term losses in distribution networks are caused by outdated network management practices, and traditional methods for analyzing and calculating distribution network losses cannot adapt to the current development environment. To improve the accuracy of filling missing values in power load data, a particle swarm optimization algorithm is proposed to optimize the cluster centers of the clustering algorithm. Furthermore, the isolation forest anomaly detection algorithm is used to detect outliers in the load data, with the coefficient of variation of the load data used to improve its recognition accuracy. Finally, this paper introduces a breadth-first-based method for calculating line loss in the context of big data. An example is provided using the distribution network system of Yuxi City in Yunnan Province, and a simulation experiment is carried out. The findings revealed that, with partially missing data, the error of the enhanced fuzzy C-means clustering algorithm averaged -6.35 with a standard deviation of 4.015. With ambiguous abnormal samples, the area under the receiver operating characteristic curve of the improved isolation forest algorithm, based on the coefficient of variation, was 0.8586, the smallest decrease. The refined analysis found a feeder line loss rate of 7.62%. These results confirm that the proposed technique can analyze distribution network line loss quickly and accurately and can serve as a guide for managing it.
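
A breadth-first line-loss calculation on a radial feeder can be sketched as follows. This is a toy model under stated assumptions, not the method of the paper: currents are treated as scalar magnitudes (phase angles ignored), each branch is identified by the node it feeds, and the function name and the three-node example network are hypothetical.

```python
from collections import deque


def feeder_line_loss(children, resistance_ohm, load_current_a, root):
    """Total I^2 * R loss (W) of a radial feeder.

    children:        node -> list of downstream nodes
    resistance_ohm:  node -> resistance of the branch feeding that node
    load_current_a:  node -> current drawn by the load at that node
    """
    # Breadth-first traversal from the source yields a top-down node order.
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))

    # Walking that order backwards visits children before their parents, so
    # each branch current is the node's own load plus everything downstream.
    branch_current = {}
    for node in reversed(order):
        downstream = sum(branch_current[c] for c in children.get(node, []))
        branch_current[node] = load_current_a.get(node, 0.0) + downstream

    return sum(branch_current[n] ** 2 * resistance_ohm[n]
               for n in order if n != root)


# Hypothetical feeder: source S feeds A, which feeds B and C.
children = {"S": ["A"], "A": ["B", "C"]}
resistance_ohm = {"A": 0.1, "B": 0.2, "C": 0.2}
load_current_a = {"A": 10.0, "B": 5.0, "C": 5.0}

total_loss_w = feeder_line_loss(children, resistance_ohm, load_current_a, "S")
print(total_loss_w)
```

Branch A carries 20 A and branches B and C carry 5 A each, so the losses are 40 W + 5 W + 5 W. Dividing such a loss by the energy supplied at the source would give a line loss rate.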






  • Article type: Journal Article
    With growing concerns about the protection of ecosystem functions and services, governments have developed public policies and organizations have produced a vast volume of digital data freely available through their websites. In addition, advances in data acquisition from remotely sensed sources and in processing with geographic information systems (GIS) and statistical tools have provided an unprecedented capacity to manage ecosystems efficiently. However, the real-world scenario in that regard remains paradoxically challenging. The reasons can be many and diverse, but a strong candidate is the limited engagement among the interested parties, which hampers bringing all these assets into action. The aim of the study is to demonstrate that management of ecosystem services can be significantly improved by integrating existing environmental policies with environmental big data and low-cost GIS and data processing tools. Using the Upper Rio das Velhas hydrographic basin, located in the state of Minas Gerais (Brazil), as an example, the study demonstrated how Principal Components Analysis based on a diversity of environmental variables grouped sub-basins into urban, agriculture, mining, and heterogeneous profiles, directing management of ecosystem services to the most appropriate officially established conservation plans. The use of GIS tools, in turn, allowed the implementation of each plan to be narrowed to specific sub-basins. This optimized allocation of preferential management plans to priority areas was discussed for a number of conservation plans. A paradigmatic example was the so-called Conservation Use Potential (CUP), devoted to the protection of aquifer recharge (provision service) and control of water erosion (regulation service), as well as to the allocation of uses as a function of soil capability (support service). In all cases, the efficiency gains in readiness for the plans' implementation and in economy of resources were projected to be noteworthy.






  • Article type: Journal Article
    State-of-the-art mass spectrometers, combined with modern bioinformatics algorithms for peptide-to-spectrum matching (PSM) with robust statistical scoring, allow more variable features (i.e., post-translational modifications) to be reliably identified from (tandem) mass spectrometry data, often without the need for biochemical enrichment. Semi-specific proteome searches, which enforce a theoretical enzymatic digestion at only the N- or C-terminal end, allow the identification of native protein termini or those arising from endogenous proteolytic activity (also referred to as "neo-N-termini" analysis or "N-terminomics"). Nevertheless, deriving biological meaning from these search outputs can be challenging in terms of data mining and analysis. Thus, we introduce TermineR, a data analysis approach for the (1) annotation of peptides according to their enzymatic cleavage specificity and known protein processing features, (2) differential abundance and enrichment analysis of N-terminal sequence patterns, and (3) visualization of neo-N-termini location. We illustrate the use of TermineR by applying it to tandem mass tag (TMT)-based proteomics data of a mouse model of polycystic kidney disease, and assess the semi-specific searches for biological interpretation of cleavage events and the variable contribution of proteolytic products to general protein abundance. The TermineR approach and example data are available as an R package at
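
Annotating a peptide by cleavage specificity, step (1) above, can be sketched in a few lines. This is a simplified illustration in Python and not TermineR's implementation (TermineR is an R package whose internals are not described here): it assumes a trypsin-like rule of cleavage after K or R, ignores the proline exception and initiator-methionine removal, and uses a hypothetical function name and a made-up protein sequence.

```python
def annotate_cleavage(protein, start, end, sites="KR"):
    """Classify the peptide protein[start:end] (0-based, end exclusive)."""
    # The N-terminal side is enzyme-specific if the peptide starts the
    # protein or directly follows a cleavage-site residue.
    n_specific = start == 0 or protein[start - 1] in sites
    # The C-terminal side is specific if the peptide ends the protein or
    # its last residue is a cleavage-site residue.
    c_specific = end == len(protein) or protein[end - 1] in sites

    if n_specific and c_specific:
        return "specific"
    if c_specific:
        return "semi-specific (non-tryptic N-terminus)"
    if n_specific:
        return "semi-specific (non-tryptic C-terminus)"
    return "non-specific"


protein = "MKTAYIAKQRQISFVK"  # made-up sequence

for start, end in [(2, 8), (4, 8), (2, 6), (3, 6)]:
    print(protein[start:end], "->", annotate_cleavage(protein, start, end))
```

Peptides in the "non-tryptic N-terminus" class are the ones that would be examined as candidate neo-N-termini.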






  • Article type: Journal Article
    Exposomics aims to measure human exposures throughout the lifespan and the changes they produce in the human body. Exposome-scale studies have significant potential to clarify the interplay of environmental factors with the complex multifactorial diseases that are widespread in our society and whose origins remain unclear. In this framework, the study of the chemical exposome aims to cover all chemical exposures and their effects on human health, but today this goal still seems unfeasible or at least very challenging, which makes the exposome, for now, only a concept. Furthermore, the study of the chemical exposome faces several methodological challenges, such as moving from specific targeted methodologies towards high-throughput multitargeted and non-targeted approaches, guaranteeing the availability and quality of biological samples to obtain quality analytical data, standardizing the analytical methodologies applied, statistically handling increasingly complex datasets, and identifying (un)known analytes. This review discusses the various steps involved in applying the exposome concept from an analytical perspective. It provides an overview of the wide variety of existing analytical methods and instruments, highlighting their complementarity for developing combined analytical strategies that advance the characterization of the chemical exposome. In addition, this review focuses on endocrine disrupting chemicals (EDCs) to show how studying even a minor part of the chemical exposome represents a great challenge. Analytical strategies applied in an exposomics context have shown great potential to elucidate the role of EDCs in health outcomes. However, translating innovative methods into etiological research and chemical risk assessment will require a multidisciplinary effort. Unlike other review articles focused on exposomics, this review offers a holistic view from the perspective of analytical chemistry and discusses the entire analytical workflow needed to obtain valuable results.






  • Article type: Journal Article
    Common challenges in cryogenic electron microscopy, such as orientation bias, conformational diversity, and 3D misclassification, complicate single particle analysis and lead to significant resource expenditure. We previously introduced an in silico method using the maximum Feret diameter distribution, the Feret signature, to characterize sample heterogeneity of disc-shaped samples. Here, we expanded the Feret signature methodology to identify preferred orientations of samples containing arbitrary shapes with only about 1000 particles required. This method enables real-time adjustments of data acquisition parameters for optimizing data collection strategies or aiding in decisions to discontinue ineffective imaging sessions. Beyond detecting preferred orientations, the Feret signature approach can serve as an early-warning system for inconsistencies in classification during initial image processing steps, a capability that allows for strategic adjustments in data processing. These features establish the Feret signature as a valuable auxiliary tool in the context of single particle analysis, significantly accelerating the structure determination process.
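
The quantity underlying the Feret signature, the maximum Feret diameter, is the largest distance between any two points of a particle's outline. The sketch below assumes each particle is already available as a list of 2D outline points and bins the diameters into a simple histogram; the function names, the bin width, and the two example shapes are hypothetical, and the actual method works on particle images rather than hand-written coordinates.

```python
from itertools import combinations
from math import dist


def max_feret_diameter(points):
    """Largest pairwise distance between outline points of one particle."""
    return max(dist(p, q) for p, q in combinations(points, 2))


def feret_signature(particles, bin_width):
    """Histogram of maximum Feret diameters: bin index -> particle count."""
    counts = {}
    for points in particles:
        bin_index = int(max_feret_diameter(points) // bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    return counts


# Hypothetical outlines: a 3 x 4 rectangle and a unit square (corner points).
rectangle = [(0, 0), (3, 0), (3, 4), (0, 4)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

print(max_feret_diameter(rectangle))
print(feret_signature([rectangle, square, square], bin_width=2.0))
```

A distribution with several distinct peaks suggests several views or populations, whereas a single narrow peak for a non-spherical particle hints at a preferred orientation.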






  • Article type: Journal Article
    The intelligent predictive and optimized wastewater treatment plant method represents a ground-breaking shift in how we manage wastewater. By capitalizing on data-driven predictive modeling, automation, and optimization strategies, it introduces a comprehensive framework designed to enhance the efficiency and sustainability of wastewater treatment operations. This methodology encompasses several essential phases, including data gathering and training, the integration of innovative computational models such as Chimp-based GoogLeNet (CbG), data processing, and performance prediction, all while fine-tuning operational parameters. The designed model is a hybrid of the Chimp optimization algorithm and GoogLeNet. GoogLeNet is a deep convolutional architecture, and Chimp optimization is a bio-inspired optimization model based on chimpanzee behavior. The model optimizes the operational parameters of the wastewater treatment plant, such as pH, dosage rate, effluent quality, and energy consumption, by fixing the optimal settings in GoogLeNet. It includes processes such as pre-processing and feature analysis for the effective prediction and optimization of the operational parameters. Notably, this approach provides several key advantages, including reduced operating costs, improved environmental outcomes, and more effective resource management. Through continuous adaptation and refinement, this methodology not only optimizes wastewater treatment plant performance but also tackles evolving environmental challenges while conserving resources. It represents a significant step forward in the quest for efficient and sustainable wastewater treatment practices. The RMSE, MAE, MAPE, and R2 scores for the suggested technique are 1.103, 0.233, 0.012, and 0.002, respectively. The model also showed that power usage decreased to about 1.4%, while greenhouse gas emissions decreased significantly, to 0.12%, compared with existing techniques.
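
For readers unfamiliar with the four scores quoted above, the sketch below computes RMSE, MAE, MAPE, and R2 from their standard definitions. It is a generic illustration with made-up numbers, not the evaluation code of the study; the function name is hypothetical, and MAPE is returned as a fraction rather than a percentage.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Standard error metrics for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE as a fraction; requires every true value to be non-zero.
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up example values.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

metrics = regression_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Lower RMSE, MAE, and MAPE indicate smaller errors, while R2 values closer to 1 indicate that the predictions explain more of the variance in the observations.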






  • Article type: Journal Article
    Mass spectrometry is broadly employed to study complex molecular mechanisms in various biological and environmental fields, enabling 'omics' research such as proteomics, metabolomics, and lipidomics. As study cohorts grow larger and more complex, with dozens to hundreds of samples, robust quality control (QC) through automated software tools becomes paramount to ensure the integrity, quality, and validity of scientific conclusions from downstream analyses and to minimize the waste of resources. Since existing QC tools are mostly dedicated to proteomics, automated solutions supporting metabolomics are needed. To address this need, we developed PeakQC, a software tool for automated QC of MS data that is independent of omics molecular types (i.e., omics-agnostic). It allows automated extraction and inspection of peak metrics of precursor ions (e.g., errors in mass, retention time, arrival time) and supports various instruments and acquisition types, from infusion experiments to liquid chromatography and/or ion mobility spectrometry front-end separations, with or without fragmentation spectra from data-dependent or data-independent acquisition analyses. Diagnostic plots for fragmentation spectra are also generated. Here, we describe and illustrate PeakQC's functionalities using different representative data sets, demonstrating its utility as a valuable tool for enhancing the quality and reliability of omics mass spectrometry analyses.
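
One of the peak metrics mentioned above, the mass error of a precursor ion, is conventionally reported in parts per million (ppm). The sketch below shows that calculation and a simple tolerance check; it is not PeakQC code, and the function names, peak labels, m/z values, and the 10 ppm tolerance are all assumptions chosen for illustration.

```python
def mass_error_ppm(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def flag_mass_outliers(peaks, tolerance_ppm):
    """Names of peaks whose absolute mass error exceeds the tolerance.

    peaks: list of (name, observed_mz, theoretical_mz) tuples.
    """
    return [name for name, observed, theoretical in peaks
            if abs(mass_error_ppm(observed, theoretical)) > tolerance_ppm]


# Made-up QC peaks: (name, observed m/z, theoretical m/z).
peaks = [
    ("standard_a", 500.0025, 500.0),   # about 5 ppm
    ("standard_b", 300.0090, 300.0),   # about 30 ppm
]

for name, observed, theoretical in peaks:
    print(name, round(mass_error_ppm(observed, theoretical), 2), "ppm")
print(flag_mass_outliers(peaks, tolerance_ppm=10.0))
```

The same pattern (compute a signed deviation, then compare it with a tolerance) applies to retention time and arrival time errors.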





