Exploratory data analysis

  • Article type: Journal Article
    Utilization of multivariate data analysis in catalysis research has extraordinary importance. The aim of the MIRA21 (MIskolc RAnking 21) model is to characterize heterogeneous catalysts with bias-free quantifiable data from 15 different variables to standardize catalyst characterization and provide an easy tool to compare, rank, and classify catalysts. The present work introduces and mathematically validates the MIRA21 model by identifying fundamentals affecting catalyst comparison and provides support for catalyst design. Literature data of 2,4-dinitrotoluene hydrogenation catalysts for toluene diamine synthesis were analyzed by using the descriptor system of MIRA21. In this study, exploratory data analysis (EDA) has been used to understand the relationships between individual variables such as catalyst performance, reaction conditions, catalyst compositions, and sustainable parameters. The results will be applicable in catalyst design, and using machine learning tools will also be possible.

  • Article type: Journal Article
    BACKGROUND: Electronic paper (E-paper) screens use electrophoretic ink to provide paper-like low-power displays with advanced networking capabilities that may potentially serve as an alternative to traditional whiteboards and television display screens in hospital settings. E-paper may be leveraged in the emergency department (ED) to facilitate communication. Providing ED patient status updates on E-paper screens could improve patient satisfaction and overall experience and provide more equitable access to their health information.
    OBJECTIVE: We aimed to pilot a patient-facing digital whiteboard using E-paper to display relevant orienting and clinical information in real time to ED patients. We also sought to assess patients' satisfaction after our intervention and understand our patients' overall perception of the impact of the digital whiteboards on their stay.
    METHODS: We deployed a 41-inch E-paper digital whiteboard in 4 rooms in an urban, tertiary care, and academic ED and enrolled 110 patients to understand and evaluate their experience. Participants completed a modified Hospital Consumer Assessment of Health Care Provider and Systems satisfaction questionnaire about their ED stay. We compared responses to a matched control group of patients triaged to ED rooms without digital whiteboards. We designed the digital whiteboard based on iterative feedback from various departmental stakeholders. After establishing IT infrastructure to support the project, we enrolled patients on a convenience basis into a control and an intervention (digital whiteboard) group. Enrollees were given a baseline survey to evaluate their comfort with technology and an exit survey to evaluate their opinions of the digital whiteboard and overall ED satisfaction. Statistical analysis was performed to compare baseline characteristics as well as satisfaction.
    RESULTS: After the successful prototyping and implementation of 4 digital whiteboards, we screened 471 patients for inclusion. We enrolled 110 patients, and 50 patients in each group (control and intervention) completed the study protocol. Age, gender, and racial and ethnic composition were similar between groups. We saw significant increases in satisfaction on postvisit surveys when patients were asked about communication regarding delays (P=.03) and what to do after discharge (P=.02). We found that patients in the intervention group were more likely to recommend the facility to family and friends (P=.04). Additionally, 96% (48/50) stated that they preferred a room with a digital whiteboard, and 70% (35/50) found the intervention "quite a bit" or "extremely" helpful in understanding their ED stay.
    CONCLUSIONS: Digital whiteboards are a feasible and acceptable method of displaying patient-facing data in the ED. Our pilot suggested that E-paper screens coupled with relevant, real-time clinical data and packaged together as a digital whiteboard may positively impact patient satisfaction and the perception of the facility during ED visits. Further study is needed to fully understand the impact on patient satisfaction and experience.
    TRIAL REGISTRATION: ClinicalTrials.gov NCT04497922; https://clinicaltrials.gov/ct2/show/NCT04497922.

  • Article type: Journal Article
    As universities are society's agents of change, institutions in all nations set goals to transform the world by exploring the societal challenges that humans face. Higher education systems across the world are together developing strategies based on the United Nations' Sustainable Development Goals (SDGs). The current study aimed to give policymakers, academics, and researchers insight into the influence of 16 SDGs on each other, paving the way for universities to set clear goals for attaining the SDGs by 2030. To analyze the interactions among the SDGs, 201,844 research publications from India on 16 SDGs, covering five years, were retrieved from the Scopus database. Spearman rank correlation was applied to measure the correlation of each SDG with every other. The interactions among the SDGs yielded converging results. Significant positive and moderately positive correlations between pairs of SDGs were identified. A significant number of negative correlations were also found, which call for careful thought among researchers about how to develop healthy relationships. The most frequent interactions between SDGs are a positive sign for any university in strategizing its goals toward the SDGs. The association of all university stakeholders, together with some constitutional and cultural changes, is necessary to put SDGs at the core of university management. Researchers who embrace this task will improve the overall performance of universities. The analysis presented in this study is useful for academics, governments, funding agencies, researchers, and policymakers.
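
    The Spearman rank correlation named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how publication counts were arranged, so the yearly counts below are hypothetical, and `average_ranks` and `spearman` are illustrative helper names rather than the authors' code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical yearly publication counts for two SDGs (not data from the study).
sdg_a = [120, 150, 180, 210, 260]
sdg_b = [80, 95, 90, 130, 170]
print(round(spearman(sdg_a, sdg_b), 3))
```

    Because the coefficient depends only on ranks, it is insensitive to the large differences in publication volume between goals, which is one plausible reason to prefer it over Pearson correlation for counts like these.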

  • Article type: Journal Article
    The survival of mankind cannot be imagined without air. Continued development in almost all realms of modern human society has adversely affected air quality. Daily industrial, transport, and domestic activities release hazardous pollutants into our environment. Monitoring and predicting air quality have become essential in this era, especially in developing countries like India. In contrast to traditional methods, prediction technologies based on machine learning have proved to be the most efficient tools for studying such modern hazards. The present work investigates six years of air pollution data from 23 Indian cities for air quality analysis and prediction. The dataset is preprocessed, and key features are selected through correlation analysis. An exploratory data analysis is carried out to develop insights into hidden patterns in the dataset, and the pollutants directly affecting the air quality index are identified. A significant fall in almost all pollutants is observed in the pandemic year, 2020. The data imbalance problem is solved with a resampling technique, and five machine learning models are employed to predict air quality. The results of these models are compared using standard metrics. The Gaussian Naive Bayes model achieves the highest accuracy, while the Support Vector Machine model exhibits the lowest accuracy. The performance of these models is evaluated and compared through established performance parameters. The XGBoost model performs best among the models and attains the highest linearity between the predicted and actual data.
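
    The abstract does not name the resampling technique used against class imbalance. As one possibility, random oversampling of minority classes can be sketched as follows; the function name, the feature rows, and the air quality labels are all hypothetical and stand in for whatever the authors actually used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical (PM2.5, NO2) readings with an imbalanced air quality label.
rows = [(35, 20), (40, 22), (38, 19), (42, 25), (36, 21), (180, 70), (210, 85)]
labels = ["Good"] * 5 + ["Poor"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

    Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.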

  • Article type: Journal Article
    BACKGROUND: Schizophrenia (SCZ) presents complex challenges related to diagnosis and clinical monitoring. The study of conditions associated with SCZ can be facilitated by using potential markers and patterns that provide information to support the diagnosis and oral health.
    METHODS: The salivary composition of patients diagnosed with SCZ (n = 50) was evaluated and compared to the control (n = 50). Saliva samples from male patients were collected and clinical parameters were evaluated. The concentrations of total proteins and amylase were determined, and salivary macro- and microelements were quantified by ICP OES and ICP-MS. Exploratory data analysis based on artificial intelligence tools was used in the investigation.
    RESULTS: There was a significant increase in the salivary concentrations of Al, Fe, Li, Mg, Na, and V, higher prevalence of caries (p < 0.001), periodontal disease (p < 0.001), and reduced salivary flow rate (p = 0.019) in SCZ patients. Also, samples were grouped into six clusters. As, Co, Cr, Cu, Mn, Mo, Ni, Se, and Sr were correlated with each other, while Fe, K, Li, Ti, and V showed the highest concentrations in the samples distributed in the clusters with the highest association between SCZ patients and controls.
    CONCLUSIONS: The results obtained indicate changes in salivary flow, organic composition, and levels of macro- and microelements in SCZ patients. Salivary concentrations of Fe, Mg, and Na may be related to oral conditions, higher prevalence of caries, and periodontal disease. The exploratory analysis showed different patterns in the salivary composition of SCZ patients impacted by associations between oral health conditions and the use of medications. Future studies are encouraged to confirm the results investigated in this study.

  • Article type: Journal Article
    Sensor technologies allow ethologists to continuously monitor the behaviors of large numbers of animals over extended periods of time. This creates new opportunities to study livestock behavior in commercial settings, but also new methodological challenges. Densely sampled behavioral data from large heterogeneous groups can contain a range of complex patterns and stochastic structures that may be difficult to visualize using conventional exploratory data analysis techniques. The goal of this research was to assess the efficacy of unsupervised machine learning tools in recovering complex behavioral patterns from such datasets to better inform subsequent statistical modeling. This methodological case study was carried out using records on milking order, or the sequence in which cows arrange themselves as they enter the milking parlor. Data were collected over a 6-month period from a closed group of 200 mixed-parity Holstein cattle on an organic dairy. Cows at the front and rear of the queue proved more consistent in their entry position than animals at the center of the queue, a systematic pattern of heterogeneity more clearly visualized using entropy estimates, a scale- and distribution-free alternative to variance that is robust to outliers. Dimension reduction techniques were then used to visualize relationships between cows. No evidence of social cohesion was recovered, but Diffusion Map embeddings proved more adept than PCA at revealing the underlying linear geometry of this data. Median parlor entry positions from the pre- and post-pasture subperiods were highly correlated (R = 0.91), suggesting a surprising degree of temporal stationarity. Data Mechanics visualizations, however, revealed heterogeneous non-stationarity among subgroups of animals in the center of the group and herd-level temporal outliers. A repeated measures model recovered inconsistent evidence of relationships between entry position and cow attributes. Mutual conditional entropy tests, a permutation-based approach to assessing bivariate correlations robust to non-independence, confirmed a significant but non-linear association with peak milk yield, but revealed the age effect to be potentially confounded by health status. Finally, queueing records were related back to behaviors recorded via ear tag accelerometers using linear models and mutual conditional entropy tests. Both approaches recovered consistent evidence of differences in home pen behaviors across subsections of the queue.
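
    To make the entropy idea in this abstract concrete, the sketch below computes a Shannon entropy over binned parlor entry positions for a single cow. The abstract does not specify the estimator or the binning, so the ten equal-width bins, the function names, and the two example cows are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def bin_positions(positions, herd_size, n_bins):
    """Map 1-based queue positions onto equal-width bins 0..n_bins-1."""
    return [min((p - 1) * n_bins // herd_size, n_bins - 1) for p in positions]


def shannon_entropy(observations):
    """Shannon entropy (in bits) of the empirical distribution of observations."""
    counts = Counter(observations)
    n = len(observations)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical entry positions over six milkings in a herd of 200 cows.
front_cow = [1, 2, 1, 3, 2, 1]             # always near the front of the queue
middle_cow = [40, 95, 120, 160, 70, 110]   # wanders through the middle

for name, positions in [("front", front_cow), ("middle", middle_cow)]:
    bins = bin_positions(positions, herd_size=200, n_bins=10)
    print(name, round(shannon_entropy(bins), 3))
```

    A cow that always enters in the same section of the queue scores zero, while one spread evenly over all bins scores the maximum of log2(n_bins). A single extreme milking moves the entropy far less than it would move a variance, which matches the robustness property the abstract describes.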

  • Article type: Journal Article
    The European Food Safety Authority (EFSA) guidance (EFSA, 2017) for dermal absorption (DA) studies recommends stringent mass balance (MB) limits of 95-105%. EFSA suggested that test material can be lost after penetration and requires that, for chemicals with <5% absorption, the non-recovered material must be added to the absorbed dose if MB is <95%. This has huge consequences for low-absorption pesticides. Indeed, one third of the MBs in the EFSA DA database are outside the refined criteria. This is also true for DA data generated by Cosmetics Europe (Gregoire et al., 2019), indicating that this criterion is often not achieved even when using highly standardized protocols. While EFSA hypothesizes that modern analytical and pipetting techniques would make it possible to meet this criterion, no scientific basis was provided. We describe how protocol procedures impact MB and evaluate the EFSA DA database to demonstrate that MB is subject to random variation. Generic application of "the addition rule" skews the measured data and increases the DA estimate, which results in unnecessary risk assessment failure. In conclusion, "missing material" is just a random negative deviation from the nominal dose. We propose a data-driven MB criterion of 90-110%, fully in line with OECD recommendations.
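
    The effect of the addition rule on a low-absorption result can be seen with simple arithmetic. The sketch below encodes only the rule as worded in this abstract (absorption below 5% and MB below 95%); the function name and the example values are hypothetical, and every other case is returned unchanged as a simplification, not as a statement of what the EFSA guidance prescribes.

```python
def apply_addition_rule(absorbed_pct, mass_balance_pct):
    """Add non-recovered material to the absorbed dose when absorption is
    below 5% and mass balance is below 95% (as worded in the abstract).
    All other cases are returned unchanged in this simplified sketch."""
    if absorbed_pct < 5.0 and mass_balance_pct < 95.0:
        missing_pct = 100.0 - mass_balance_pct
        return absorbed_pct + missing_pct
    return absorbed_pct


# Hypothetical study: 2% measured absorption with 92% recovery.
measured = 2.0
adjusted = apply_addition_rule(measured, 92.0)
print(measured, adjusted)
```

    In this made-up case, 8% of unrecovered material turns a 2% measurement into a 10% estimate, which illustrates why the authors argue the rule weighs so heavily on low-absorption compounds.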

  • Article type: Journal Article
    Estimates from individual studies included in a meta-analysis often are not in agreement, giving rise to statistical heterogeneity. In such cases, exploration of the causes of heterogeneity can advance knowledge by formulating novel hypotheses. We present a new method for visualizing between-study heterogeneity using combinatorial meta-analysis. The method is based on performing separate meta-analyses on all possible subsets of studies in a meta-analysis. We use the summary effect sizes and other statistics produced by the all-subsets meta-analyses to generate graphs that can be used to investigate heterogeneity, identify influential studies, and explore subgroup effects. This graphical approach complements alternative graphical explorations of data. We apply the method to numerous biomedical examples, to allow readers to develop intuition on the interpretation of the all-subsets graphical display. The proposed graphical approach may be useful for exploratory data analysis in systematic reviews. Copyright © 2012 John Wiley & Sons, Ltd.
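
    The all-subsets idea can be sketched by enumerating every subset of studies and pooling each one. The abstract does not state which pooling model or heterogeneity statistics the authors used; the fixed-effect inverse-variance estimate, Cochran's Q, and I² below are assumptions, as are the function names and the four example studies.

```python
from itertools import combinations


def fixed_effect(effects, variances):
    """Inverse-variance pooled estimate with Cochran's Q and I^2."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


def all_subsets_meta(effects, variances, min_size=2):
    """Run a separate meta-analysis on every subset of at least min_size studies."""
    results = []
    for k in range(min_size, len(effects) + 1):
        for idx in combinations(range(len(effects)), k):
            pooled, q, i2 = fixed_effect(
                [effects[i] for i in idx], [variances[i] for i in idx]
            )
            results.append((idx, pooled, q, i2))
    return results


# Hypothetical effect sizes and variances for four studies.
effects = [0.20, 0.25, 0.30, 0.90]
variances = [0.01, 0.02, 0.015, 0.02]
results = all_subsets_meta(effects, variances)
for idx, pooled, q, i2 in results:
    print(idx, round(pooled, 3), round(i2, 2))
```

    Plotting pooled estimate against I² for every subset gives one version of the display described in the abstract: subsets containing an outlying study separate from the rest. The number of subsets grows as 2^n, so exhaustive enumeration is practical only for small meta-analyses.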