Exploratory data analysis

  • Article type: Journal Article
    Multivariate data analysis is of great importance in catalysis research. The aim of the MIRA21 (MIskolc RAnking 21) model is to characterize heterogeneous catalysts with bias-free, quantifiable data from 15 different variables, in order to standardize catalyst characterization and provide an easy tool to compare, rank, and classify catalysts. The present work introduces and mathematically validates the MIRA21 model by identifying the fundamentals that affect catalyst comparison, and it provides support for catalyst design. Literature data on 2,4-dinitrotoluene hydrogenation catalysts for toluene diamine synthesis were analyzed using the descriptor system of MIRA21. In this study, exploratory data analysis (EDA) was used to understand the relationships between individual variables such as catalyst performance, reaction conditions, catalyst composition, and sustainability parameters. The results will be applicable to catalyst design and also make it possible to use machine learning tools.

  • Article type: Journal Article
    The survival of mankind cannot be imagined without air. Continued development in almost all realms of modern human society has adversely affected air quality. Daily industrial, transport, and domestic activities release hazardous pollutants into our environment. Monitoring and predicting air quality have become essential in this era, especially in developing countries like India. In contrast to traditional methods, prediction technologies based on machine learning have proved to be the most efficient tools for studying such modern hazards. The present work investigates six years of air pollution data from 23 Indian cities for air quality analysis and prediction. The dataset is preprocessed and key features are selected through correlation analysis. An exploratory data analysis is performed to gain insight into hidden patterns in the dataset, and the pollutants that directly affect the air quality index are identified. A significant fall in almost all pollutants is observed in the pandemic year, 2020. The data imbalance problem is addressed with a resampling technique, and five machine learning models are employed to predict air quality. The results of these models are compared using standard metrics. The Gaussian Naive Bayes model achieves the highest accuracy, while the Support Vector Machine model exhibits the lowest accuracy. The performance of these models is evaluated and compared through established performance parameters. The XGBoost model performed best among the models and achieved the highest linearity between the predicted and actual data.
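The correlation-based feature selection mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state which correlation measure or cutoff the authors used, so the Pearson coefficient, the 0.5 threshold, the function names, and the sample readings below are all assumptions made for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, target, threshold=0.5):
    """Keep features whose |r| with the target meets the (assumed) threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}

# Made-up illustrative readings, not data from the study.
columns = {
    "pm25": [40.0, 55.0, 80.0, 120.0, 150.0],
    "no2": [30.0, 20.0, 35.0, 25.0, 30.0],
}
aqi = [90.0, 110.0, 160.0, 240.0, 300.0]

selected = select_features(columns, aqi)
print(sorted(selected))
```

With these made-up numbers only the pollutant that tracks the index closely survives the cutoff. In practice the threshold is a tuning choice, and a rank-based coefficient may be preferable when the relationship is monotonic but not linear.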

  • Article type: Journal Article
    BACKGROUND: Schizophrenia (SCZ) presents complex challenges related to diagnosis and clinical monitoring. The study of conditions associated with SCZ can be facilitated by using potential markers and patterns that provide information to support the diagnosis and oral health.
    METHODS: The salivary composition of patients diagnosed with SCZ (n = 50) was evaluated and compared to that of controls (n = 50). Saliva samples from male patients were collected and clinical parameters were evaluated. The concentrations of total protein and amylase were determined, and salivary macro- and microelements were quantified by ICP-OES and ICP-MS. Exploratory data analysis based on artificial intelligence tools was used in the investigation.
    RESULTS: SCZ patients showed a significant increase in the salivary concentrations of Al, Fe, Li, Mg, Na, and V, a higher prevalence of caries (p < 0.001) and periodontal disease (p < 0.001), and a reduced salivary flow rate (p = 0.019). Samples were also grouped into six clusters. As, Co, Cr, Cu, Mn, Mo, Ni, Se, and Sr were correlated with each other, while Fe, K, Li, Ti, and V showed the highest concentrations in the samples distributed in the clusters with the highest association between SCZ patients and controls.
    CONCLUSIONS: The results indicate changes in salivary flow, organic composition, and levels of macro- and microelements in SCZ patients. Salivary concentrations of Fe, Mg, and Na may be related to oral conditions, a higher prevalence of caries, and periodontal disease. The exploratory analysis showed different patterns in the salivary composition of SCZ patients, influenced by associations between oral health conditions and the use of medications. Future studies are encouraged to confirm the results of this study.

  • Article type: Journal Article
    Sensor technologies allow ethologists to continuously monitor the behaviors of large numbers of animals over extended periods of time. This creates new opportunities to study livestock behavior in commercial settings, but also new methodological challenges. Densely sampled behavioral data from large heterogeneous groups can contain a range of complex patterns and stochastic structures that may be difficult to visualize using conventional exploratory data analysis techniques. The goal of this research was to assess the efficacy of unsupervised machine learning tools in recovering complex behavioral patterns from such datasets to better inform subsequent statistical modeling. This methodological case study was carried out using records on milking order, or the sequence in which cows arrange themselves as they enter the milking parlor. Data were collected over a 6-month period from a closed group of 200 mixed-parity Holstein cattle on an organic dairy. Cows at the front and rear of the queue proved more consistent in their entry position than animals at the center of the queue, a systematic pattern of heterogeneity more clearly visualized using entropy estimates, a scale- and distribution-free alternative to variance that is robust to outliers. Dimension reduction techniques were then used to visualize relationships between cows. No evidence of social cohesion was recovered, but Diffusion Map embeddings proved more adept than PCA at revealing the underlying linear geometry of this data. Median parlor entry positions from the pre- and post-pasture subperiods were highly correlated (R = 0.91), suggesting a surprising degree of temporal stationarity. Data Mechanics visualizations, however, revealed heterogeneous non-stationarity among subgroups of animals in the center of the group and herd-level temporal outliers. A repeated measures model recovered inconsistent evidence of relationships between entry position and cow attributes. Mutual conditional entropy tests, a permutation-based approach to assessing bivariate correlations that is robust to non-independence, confirmed a significant but non-linear association with peak milk yield, but revealed the age effect to be potentially confounded by health status. Finally, queueing records were related back to behaviors recorded via ear tag accelerometers using linear models and mutual conditional entropy tests. Both approaches recovered consistent evidence of differences in home pen behaviors across subsections of the queue.
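The idea of using entropy as a measure of how consistently a cow enters the parlor can be sketched as follows. The abstract does not describe the estimator the authors used, so the equal-width binning of queue positions, the bin count, the function name, and the sample records are assumptions for illustration, not the study's method or data.

```python
import math
from collections import Counter

def position_entropy(positions, herd_size, n_bins=10):
    """Shannon entropy (bits) of one cow's binned entry positions.

    Low entropy means the cow enters in roughly the same part of the
    queue every milking; high entropy means her position varies widely.
    """
    bin_width = herd_size / n_bins
    # Positions are 1-based queue ranks; map each to a bin index 0..n_bins-1.
    bins = [min(int((p - 1) / bin_width), n_bins - 1) for p in positions]
    counts = Counter(bins)
    total = len(bins)
    # Sum of p * log2(1/p) over the occupied bins.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Made-up illustrative records, not data from the study.
front_cow = [3, 5, 2, 8, 4, 6, 1, 7]              # always near the front
middle_cow = [40, 95, 130, 70, 160, 110, 55, 85]  # wanders through the queue

h_front = position_entropy(front_cow, herd_size=200)
h_middle = position_entropy(middle_cow, herd_size=200)
print(h_front, h_middle)
```

Because the measure depends only on how often each bin is occupied, a single extreme entry position shifts it far less than it would shift a variance estimate, which is the robustness property the abstract points to. The bin count is a free parameter here and would need to be chosen with the number of observations per cow in mind.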