Principal component analysis (PCA)

  • Article type: Journal Article
    Metabolomics is the study of low molecular weight biochemical molecules (typically <1500 Da) in a defined biological organism or system. In the case of food systems, the term "food metabolomics" is often used. Food metabolomics has been widely explored and applied in various fields including food analysis, food intake, food traceability, and food safety. Food safety applications focusing on the identification of pathogen-specific biomarkers have been promising. This chapter describes a nontargeted metabolite profiling workflow using gas chromatography coupled with mass spectrometry (GC-MS) for characterizing three globally important foodborne pathogens, Escherichia coli O157:H7, Listeria monocytogenes, and Salmonella enterica, from selective enrichment liquid culture media. The workflow involves a detailed description of food spiking experiments followed by procedures for the extraction of polar metabolites from media, the analysis of the extracts using GC-MS, and finally chemometric data analysis using univariate and multivariate statistical tools to identify potential pathogen-specific biomarkers.






  • Article type: Journal Article
    New applications such as augmented reality/virtual reality (AR/VR), Internet-of-Things (IOT), and autonomous mobile robot (AMR) services require highly reliable and accurate real-time positioning and tracking of persons and devices in indoor areas. Among the different visible-light-positioning (VLP) schemes, such as proximity, time-of-arrival (TOA), time-difference-of-arrival (TDOA), angle-of-arrival (AOA), and received-signal-strength (RSS), the RSS scheme is simple, efficient, and relatively easy to implement. As the received optical power has an inverse relationship with the distance between the LED transmitter (Tx) and the photodiode (PD) receiver (Rx), position information can be estimated by studying the received optical power from different Txs. In this work, we propose and experimentally demonstrate a real-time VLP system utilizing a long short-term memory neural network (LSTM-NN) with principal component analysis (PCA) to mitigate high positioning error, particularly at the positioning unit cell boundaries. Experimental results show that in a positioning unit cell of 100 × 100 × 250 cm³, the average positioning error is 5.912 cm when using LSTM-NN only. With PCA, the average positioning error falls to 1.806 cm, with the improvement most pronounced at the unit cell boundaries and cell corners, a positioning error reduction of 69.45%. In the cumulative distribution function (CDF) measurements, when using only the LSTM-NN model, the positioning error of 95% of the experimental data is >15 cm; while using the LSTM-NN with PCA model, the error is reduced to <5 cm. In addition, we also experimentally demonstrate that the proposed real-time VLP system can be used to predict the direction and the trajectory of the moving Rx.
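The quoted error reduction follows directly from the two average errors given in the abstract. A minimal sketch of that arithmetic; the function name `relative_reduction` is illustrative and does not come from the paper:

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage reduction when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Average positioning errors quoted in the abstract (cm).
lstm_only_cm = 5.912
lstm_pca_cm = 1.806

reduction_pct = relative_reduction(lstm_only_cm, lstm_pca_cm)
print(f"{reduction_pct:.2f}%")  # prints 69.45%
```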






  • Article type: Journal Article
    In this study, a neural network was developed for the detection of acetone, ethanol, chloroform, and air pollutant NO2 gases using an Interdigitated Electrode (IDE) sensor-based e-nose system. A bioimpedance spectroscopy (BIS)-based interface circuit was used to measure sensor responses in the e-nose system. The sensor was fed with a sinusoidal voltage at 10 MHz frequency and 0.707 V amplitude. Sensor responses were sampled at 100 Hz and converted to digital data with 16-bit resolution. The highest change in impedance magnitude obtained in the e-nose system against chloroform gas was recorded as 24.86 Ω over a concentration range of 0-11,720 ppm. The highest gas detection sensitivity of the e-nose system was calculated as 0.7825 Ω/ppm against 6.7 ppm NO2 gas. Before the neural network was trained, noise was removed from the data using Kalman filtering. Principal Component Analysis (PCA) was applied to the filtered signal data for dimensionality reduction, separating them from noise and outliers with low variance and non-informative characteristics. The neural network model is multi-layered and employs the backpropagation algorithm. The Xavier initialization method was used for determining the initial weights of neurons. The neural network successfully classified NO2 (6.7 ppm), acetone (1820 ppm), ethanol (1820 ppm), and chloroform (1465 ppm) gases with a test accuracy of 87.16%. The neural network achieved this test accuracy in a training time of 239.54 milliseconds. As sensor sensitivity increases, the detection capability of the neural network also improves.
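The PCA dimensionality-reduction step described above can be sketched with a singular value decomposition. This is an illustration under stated assumptions, not the authors' implementation: the function `pca` and the synthetic feature matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project rows of X onto the top principal components.

    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]
    scores = X_centered @ components.T
    return scores, components, ratio[:n_components]


# Synthetic stand-in for sensor features (hypothetical, not data from the study):
# 200 samples, 6 features, with most variance along two latent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)  # (200, 2)
print(ratio)         # fraction of variance carried by each retained component
```

Keeping only the leading components discards the low-variance directions, which is the sense in which PCA separates the signal from low-variance noise here.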






  • Article type: Journal Article
    In developing countries, smart grids are nonexistent, and electricity theft significantly hampers power supply. This research introduces a lightweight deep-learning model using monthly customer readings as input data. The proposed solution combines careful direct and indirect feature engineering techniques, including Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP), with resampling methods such as Random-Under-Sampler (RUS), Synthetic Minority Over-sampling Technique (SMOTE), and Random-Over-Sampler (ROS). Previous studies indicate that models achieve high precision, recall, and F1 score for the non-theft (0) class, but perform poorly, even achieving 0%, for the theft (1) class. Through parameter tuning and employing Random-Over-Sampler (ROS), significant improvements in accuracy, precision (89%), recall (94%), and F1 score (91%) for the theft (1) class are achieved. The results demonstrate that the proposed model outperforms existing methods, showcasing its efficacy in detecting electricity theft in non-smart grid environments.
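Random oversampling, the resampling method credited above with the improvement on the theft class, can be sketched as follows. The function `random_oversample` and the toy data are hypothetical illustrations (NumPy assumed), not the study's code or dataset:

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start with all original rows
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            # Sample with replacement from the existing rows of this class.
            keep.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(keep)
    return X[order], y[order]


# Hypothetical toy data: 95 non-theft (0) and 5 theft (1) customers,
# each described by 12 monthly readings.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 12))
y = np.array([0] * 95 + [1] * 5)

X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # [95 95]
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.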






  • Article type: Journal Article
    As a fruit and vegetable crop, the ornamental pepper is not only highly ornamental but also rich in nutritional value. The quality of ornamental pepper fruits is determined by their contents of capsaicin, vitamin C (VC), flavonoids and total phenols. The study concentrated on the accumulation of capsaicin and dihydrocapsaicin in different tissues of 18 peppers during fruit growth and development. The results showed that the pericarp and placenta contained significantly higher levels of capsaicin than dihydrocapsaicin. Additionally, the placenta contained significantly higher levels of both capsaicin and dihydrocapsaicin compared to the pericarp. The capsaicin content ranged from 0 to 6.7915 mg·g⁻¹ and the dihydrocapsaicin content from 0 to 5.329 mg·g⁻¹. Interestingly, we found that the pericarp is rich in VC (5.4506 mg·g⁻¹) and the placenta is high in flavonoids (4.8203 mg·g⁻¹) and total phenols (119.63 mg·g⁻¹). Correlation analysis and principal component analysis identified capsaicin as the most important component. The qPCR results substantiated that the expression of genes in the placenta was significantly higher than that in the pericarp and that the expression of genes in the green ripening stage was higher than that in the red ripening stage. This study could be used to select the best ripening stages and tissues for harvesting peppers according to the intended use of the pepper and the needs of producers. It not only provides a reference for quality improvement and processing for consumers and the market but also provides a theoretical basis for high-quality pepper breeding.






  • Article type: Journal Article
    Small molecules as ligands target multifunctional ribonucleic acids (RNA) for therapeutic engagement. This study explores how the anticancer DNA intercalator harmine interacts with various motifs of RNAs, including the single-stranded A-form poly (rA), the clover leaf tRNAphe, and the double-stranded A-form poly (rC)·poly (rG). Harmine showed affinity to the polynucleotides in the order poly (rA) > tRNAphe > poly (rC)·poly (rG). While no induced circular dichroism change was detected with poly (rC)·poly (rG), significant structural alterations of poly (rA), followed by tRNAphe, were reported, together with concurrent induction of optical activity in the bound achiral alkaloid molecule. At 25 °C, the binding was exothermic and entropy-driven. The interaction also highlighted the heat capacity (ΔC°p) and the Gibbs energy contribution from the hydrophobic transfer (ΔG hyd) of binding with harmine. Molecular docking calculations indicated that harmine exhibits higher affinity for poly (rA) compared to tRNAphe and poly (rC)·poly (rG). Subsequent molecular dynamics simulations were conducted to investigate the binding mode and stability of harmine with poly (rA), tRNAphe, and poly (rC)·poly (rG). The results revealed that harmine adopts a partial intercalative binding with poly (rA) and tRNAphe, characterized by pronounced stacking forces and stronger binding free energy observed with poly (rA), while a comparatively weaker binding free energy was observed with tRNAphe. In contrast, the stacking forces with poly (rC)·poly (rG) were comparatively less pronounced, and harmine adopts a groove binding mode. This was also supported by ferrocyanide quenching analysis. All these findings univocally provide detailed insight into the binding specificity of harmine to single-stranded poly (rA) over other RNA motifs, probably suggesting a self-structure formation in poly (rA) with harmine and its potential as a lead compound for RNA-based drug targeting.






  • Article type: Journal Article
    Increasingly, information technology facilitates the storage and management of data useful for risk analysis and event prediction. Studies on data extraction related to occupational health and safety are increasingly available; however, due to its variability, the construction sector warrants special attention. This review is conducted under the research programs of the National Institute for Occupational Accident Insurance (Inail).
    OBJECTIVE: The research question focuses on identifying which data mining (DM) methods, among supervised, unsupervised, and others, are most appropriate for certain investigation objectives, types, and sources of data, as defined by the authors.
    METHODS: Scopus and ProQuest were the main sources from which we extracted studies in the field of construction, published between 2014 and 2023. The eligibility criteria applied in the selection of studies were based on the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA). For exploratory purposes, we applied hierarchical clustering, while for in-depth analysis, we used principal component analysis (PCA) and meta-analysis.
    RESULTS: The search strategy based on the PRISMA eligibility criteria provided us with 63 out of 2234 potential articles, 206 observations, 89 methodologies, 4 survey purposes, 3 data sources, 7 data types, and 3 resource types. Cluster analysis and PCA organized the information included in the paper dataset into two dimensions and labels: "supervised methods, institutional dataset, and predictive and classificatory purposes" (correlation 0.97 to 8.18 × 10⁻¹; p-value 7.67 × 10⁻⁵⁵ to 1.28 × 10⁻²²) and the second, Dim2, "not-supervised methods; project, simulation, literature, text data; monitoring, decision-making processes; machinery and environment" (corr. 0.84 to 0.47; p-value 5.79 × 10⁻²⁵ to 3.59 × 10⁻⁶). We answered the research question regarding which method, among supervised, unsupervised, or other, is most suitable for application to data in the construction industry.
    CONCLUSIONS: The meta-analysis provided an overall estimate of the better effectiveness of supervised methods (Odds Ratio = 0.71, Confidence Interval 0.53-0.96) compared to not-supervised methods.
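For readers unfamiliar with how an odds ratio and its confidence interval are obtained, the sketch below computes both for a single 2×2 table using the standard Wald interval on the log scale. It is a simplified illustration: the abstract reports a pooled meta-analytic estimate, whose computation is not described, and the counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Table layout:
                 event   no event
    group 1        a        b
    group 2        c        d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to illustrate the calculation.
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=30, d=70)
print(f"OR = {odds_ratio:.2f}, CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, such as the 0.53-0.96 quoted above, is conventionally read as a statistically significant difference between the two groups at the corresponding level.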






  • Article type: Journal Article
    Blood is commonly discovered at crime scenes in various forms, including stains, dried residue, pools, and fingerprints on assorted surfaces. Estimating the age of bloodstains is a crucial aspect of reconstructing crime scenes. This research aimed to investigate how the nature of different surfaces affects the estimation of bloodstain age, utilizing a reliable and non-destructive approach. The study employed ATR-FTIR spectroscopy in conjunction with chemometric techniques such as PCA (Principal Component Analysis) and OPLSR (Orthogonal Signal Correction Partial Least Square Regression Analysis) to analyze spectral data and develop regression models for estimating bloodstain age on cement, metal, and wooden surfaces for up to eleven days. The chemometric models for bloodstains on all three substrates demonstrated strong performance, with predictive Root Mean Square Error (RMSE) values ranging from 1.1 to 1.43 and R² values from 0.84 to 0.89. Notably, the model developed for metal surfaces was found to be the most accurate, with minimal prediction error. The findings of the study showed that the porosity of the substrates upon which bloodstains were found had a discernible influence on the age-related transformations observed in bloodstains, the majority of which occurred within the spectral range of 2800 cm⁻¹ to 3500 cm⁻¹.
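The two performance metrics quoted above, RMSE and R², can be computed as in the following sketch. The age values are hypothetical and only illustrate the formulas; they are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical bloodstain ages in days (true vs. predicted), for illustration only.
age_true = [1, 3, 5, 7, 9, 11]
age_pred = [2, 3, 4, 8, 9, 10]

print(round(rmse(age_true, age_pred), 3))       # 0.816
print(round(r_squared(age_true, age_pred), 3))  # 0.943
```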






  • Article type: Journal Article
    Rising global temperatures can lead to heat waves, which in turn can pose health risks to the community. However, a notable gap remains in highlighting the primary contributing factors that amplify heat-health risk among vulnerable populations. This study aims to evaluate the precedence of heat stress contributing factors in urban and rural vulnerable populations living in hot and humid tropical regions. A comparative cross-sectional study was conducted, involving 108 respondents from urban and rural areas in Klang Valley, Malaysia, using a face-to-face interview and a validated questionnaire. Data were analyzed using principal component analysis, categorizing factors into exposure, sensitivity, and adaptive capacity indicators. In urban areas, five principal components (PCs) explained 64.3% of variability, with primary factors being sensitivity (health morbidity, medicine intake, increased age), adaptive capacity (outdoor occupation type, lack of ceiling, longer residency duration), and exposure (lower ceiling height, increased building age). In rural areas, five PCs explained 71.5% of variability, with primary factors being exposure (lack of ceiling, high thermal conductivity roof material, increased building age, shorter residency duration), sensitivity (health morbidity, medicine intake, increased age), and adaptive capacity (female, non-smoking, higher BMI). The order of heat-health vulnerability indicators was sensitivity > adaptive capacity > exposure for urban areas, and exposure > sensitivity > adaptive capacity for rural areas. This study demonstrated a different pattern of leading contributors to heat stress between urban and rural vulnerable populations.
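A statement such as "five PCs explained 64.3% of variability" refers to the cumulative share of variance carried by the leading components. The sketch below shows one common way to obtain that figure, from the eigenvalues of the correlation matrix; whether the study standardized its indicators this way is an assumption, and the 15-indicator questionnaire matrix is randomly generated for illustration (NumPy assumed).

```python
import numpy as np


def cumulative_explained_variance(X: np.ndarray) -> np.ndarray:
    """Cumulative fraction of variance explained by successive PCs of
    standardized data (PCA on the correlation matrix)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negatives
    return np.cumsum(eigvals) / eigvals.sum()


# Hypothetical questionnaire matrix: 108 respondents x 15 indicators.
rng = np.random.default_rng(42)
X = rng.normal(size=(108, 15))
X[:, 1] += X[:, 0]  # introduce some correlation between indicators
X[:, 2] += X[:, 0]

cum = cumulative_explained_variance(X)
print(f"first five PCs explain {cum[4]:.1%} of the variance")
```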






  • Article type: Journal Article
    The potential for rotor component shedding in rotating machinery poses significant risks, necessitating the development of an early and precise fault diagnosis technique to prevent catastrophic failures and reduce maintenance costs. This study introduces a data-driven approach to detect rotor component shedding at its inception, thereby enhancing operational safety and minimizing downtime. Utilizing frequency analysis, this research identifies harmonic amplitudes within rotor vibration data as key indicators of impending faults. The methodology employs principal component analysis (PCA) to orthogonalize and reduce the dimensionality of vibration data from rotor sensors, followed by k-fold cross-validation to select a subset of significant features, ensuring the detection algorithm's robustness and generalizability. These features are then integrated into a linear discriminant analysis (LDA) model, which serves as the diagnostic engine to predict the probability of rotor component shedding. The efficacy of the approach is demonstrated through its application to 16 industrial compressors and turbines, proving its value in providing timely fault warnings and enhancing operational reliability.
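The PCA, k-fold cross-validation, and LDA chain described above can be sketched as follows. This is a simplified illustration, not the authors' method: it uses cross-validated accuracy to choose how many leading principal components to keep (one possible reading of "select a subset of significant features"), a two-class LDA with shared covariance, and synthetic data in place of real harmonic amplitudes. All function names are hypothetical and NumPy is assumed.

```python
import numpy as np


def fit_pca(X, k):
    """Return the training mean and the top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def fit_lda(Z, y):
    """Two-class LDA with shared covariance.

    Returns (w, b) such that w @ z + b is the log-odds of class 1.
    """
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    pooled = ((Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)) / (len(Z) - 2)
    pooled = pooled + 1e-8 * np.eye(pooled.shape[0])  # small ridge for stability
    w = np.linalg.solve(pooled, m1 - m0)
    b = -0.5 * w @ (m0 + m1) + np.log(len(Z1) / len(Z0))
    return w, b


def cv_accuracy(X, y, k_components, n_folds=5, seed=0):
    """k-fold cross-validated accuracy of PCA (k_components) followed by LDA."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Fit PCA and LDA on the training folds only.
        mean, comps = fit_pca(X[train_idx], k_components)
        w, b = fit_lda((X[train_idx] - mean) @ comps.T, y[train_idx])
        log_odds = ((X[test_idx] - mean) @ comps.T) @ w + b
        correct += np.sum((log_odds > 0).astype(int) == y[test_idx])
    return correct / len(y)


# Synthetic stand-in for harmonic amplitudes (hypothetical data):
# class 1 ("shedding") has larger amplitudes in the first three features.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 10))
X[y == 1, :3] += 2.0

results = {k: cv_accuracy(X, y, k) for k in (1, 2, 3, 5)}
best_k = max(results, key=results.get)
print(results, best_k)
```

Because `w @ z + b` is the LDA log-odds, applying a logistic function to it gives an estimated probability of the fault class, which matches the abstract's description of LDA predicting a probability.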





