outliers

  • Article type: Journal Article
    This paper aims to evaluate the statistical association between exposure to air pollution and forced expiratory volume in the first second (FEV1) in both asthmatic and non-asthmatic children and teenagers, in which the response variable FEV1 was repeatedly measured on a monthly basis, characterizing a longitudinal experiment. Due to the nature of the data, a robust linear mixed model (RLMM), combined with a robust principal component analysis (RPCA), is proposed to handle the multicollinearity among the covariates and the impact of extreme observations (high levels of air contaminants) on the estimates. The Huber and Tukey loss functions are considered to obtain robust estimators of the parameters in the linear mixed model (LMM). A finite sample size investigation is conducted under the scenario where the covariates follow linear time series models with and without additive outliers (AO). The impact of the time-correlation and the outliers on the estimates of the fixed effect parameters in the LMM is investigated. In the real data analysis, the robust modeling strategy showed that RPCA yields three principal components (PCs), mainly related to relative humidity (Hmd), particulate matter with a diameter smaller than 10 μm (PM10), and particulate matter with a diameter smaller than 2.5 μm (PM2.5).
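
    The Huber and Tukey loss functions named in this abstract can be sketched as follows. This is only an illustration of the two textbook definitions, not the authors' RLMM code; the function names are hypothetical, and the default tuning constants (1.345 and 4.685) are the conventional choices for roughly 95% efficiency under normal errors, assumed here because the abstract does not state which values were used.

```python
def huber_loss(r, c=1.345):
    """Huber loss: quadratic for |r| <= c, linear beyond that."""
    a = abs(r)
    if a <= c:
        return 0.5 * r * r
    return c * (a - 0.5 * c)


def tukey_loss(r, c=4.685):
    """Tukey biweight loss: bounded, constant at c**2 / 6 once |r| > c."""
    if abs(r) > c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


# Compare with the squared-error loss 0.5 * r**2 for small and large residuals.
for r in (0.5, 2.0, 10.0):
    print(r, 0.5 * r * r, huber_loss(r), tukey_loss(r))
```

    For a large residual the squared-error loss keeps growing quadratically, the Huber loss grows only linearly, and the Tukey loss stops growing altogether, which is why extreme observations have limited influence on the resulting estimates.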

  • Article type: Journal Article
    In the mechanical cutting industry, trial production is used for predicting and evaluating the quality of product processes before batch production, and it can be expressed through the qualification rate. However, the qualification rate cannot objectively and comprehensively evaluate the quality of product processes. This study optimizes the analysis of outliers and stability in mathematical statistics so that it can be better applied in the mechanical cutting industry; then, it combines them with process capability analysis. Simultaneously, considering the non-normal distribution of process parameters, a batch production-prediction model is proposed. The reliability of the batch production-prediction model is verified using the diameter, roundness, and roughness of common structural samples. For other mechanical parts in the mechanical cutting industry, the model proposed in this paper can likewise be used to quickly and accurately predict and evaluate batch production.

  • Article type: Journal Article
    In a recent study employing time production, a number of participants presented aberrant data, which normally would have marked them as being outliers. Given the ongoing discussion in the literature regarding the illusory nature of the flow of time, in this paper we consider whether their data may indicate discontinuity in time perception. We analyze the log-log plots for these outliers, investigating to what degree linearity is preserved for all the data points, as opposed to achieving a better fit using bisegmental regression. The current results, though preliminary, can contribute to the debate regarding the non-linearity of subjective time. It would seem that with longer target durations, the ongoing experience of time can be either one of a subjective slowing down of time (longer time units, increase in slope), or of a subjective speeding up of time (shorter time units, decrease in slope).
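
    The comparison this abstract describes, a single straight line versus a bisegmental fit on a log-log plot, can be sketched as below. This is a minimal illustration under stated assumptions: the data are synthetic (not from the study), the helper names `ols` and `bisegmental` are hypothetical, and the two segments are fitted independently by trying every split point, which may differ from the procedure the authors used.

```python
import math


def ols(xs, ys):
    """Return (intercept, slope, sse) of a least-squares straight line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def bisegmental(xs, ys, min_pts=3):
    """Try every split of x-sorted data; return (sse, split, left_fit, right_fit)."""
    best = None
    for k in range(min_pts, len(xs) - min_pts + 1):
        left = ols(xs[:k], ys[:k])
        right = ols(xs[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, k, left, right)
    return best


# Synthetic example: produced durations track the target up to 8 s,
# then grow with exponent 1.5 (a steeper slope on the log-log plot).
targets = [1, 2, 4, 8, 16, 32, 64, 128]
produced = [t if t <= 8 else 8 * (t / 8) ** 1.5 for t in targets]
xs = [math.log(t) for t in targets]
ys = [math.log(p) for p in produced]

single = ols(xs, ys)
best = bisegmental(xs, ys)
print("single-line slope:", round(single[1], 3), "SSE:", round(single[2], 4))
print("segment slopes:", round(best[2][1], 3), round(best[3][1], 3),
      "SSE:", round(best[0], 4))
```

    In this synthetic case the two-segment fit recovers the slopes 1 and 1.5 with essentially zero residual error, whereas the single line leaves systematic residuals. With real data, the extra parameters of the bisegmental model would need to be weighed against the improvement in fit.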

  • Article type: Journal Article
    Linear regression is critical for data modelling, especially for scientists. Nevertheless, with the abundance of high-dimensional data, some datasets have more explanatory variables than observations. In such circumstances, traditional approaches fail. This paper proposes a modified sparse regression model that solves the problem of heterogeneity, using seaweed big data as a use case. The modified heterogeneity models for ridge, LASSO, and Elastic net were used to model the data. The robust estimators M Bi-Square, M Hampel, M Huber, MM, and S were used. Based on the results, the hybrid model of sparse regression for before, after, and modified heterogeneity robust regression with the 45 high ranking variables and a 2-sigma limit can be used efficiently and effectively to reduce the outliers. The obtained results confirm that the hybrid model of the modified sparse LASSO with the M Bi-Square estimator for the 45 high ranking parameters performed better than the other existing methods.

  • Article type: Journal Article
    When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features to the data such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to collectively address these features can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a, Journal of Econometrics, 128(2), 301-23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated the superiority of the model in predicting empirical phenomena over the normal LMM, and its ability to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting the empirical phenomena.

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    Meta-analysis is an essential tool to comprehensively synthesize and quantitatively evaluate results of multiple clinical studies in evidence-based medicine. In many meta-analyses, the characteristics of some studies might markedly differ from those of the others, and these outlying studies can generate biases and potentially yield misleading results. In this article, we provide effective robust statistical inference methods using generalized likelihoods based on the density power divergence. The robust inference methods are designed to adjust the influences of outliers through the use of modified estimating equations based on a robust criterion, even when multiple and serious influential outliers are present. We provide the robust estimators, statistical tests, and confidence intervals via the generalized likelihoods for the fixed-effect and random-effects models of meta-analysis. We also assess the contribution rates of individual studies to the robust overall estimators, which indicate how the influences of outlying studies are adjusted. Through simulations and applications to two recently published systematic reviews, we demonstrate that the overall conclusions and interpretations of meta-analyses can be markedly changed if the robust inference methods are applied and that relying only on the conventional inference methods might produce misleading evidence. These methods are recommended at least as a sensitivity analysis in the practice of meta-analysis. We have also developed an R package, robustmeta, that implements the robust inference methods.

  • Article type: Journal Article
    In real-life situations, we have to analyze data that contain atypical observations, and the presence of outliers has adverse effects on the performance of ordinary least squares estimates. In this situation, redescending M-estimators, proposed by Huber (1964), are used to tackle the effects of outliers and increase the efficiency of least squares estimates. In this study, we introduce a redescending M-estimator designed to generate robust estimates by mitigating the influence of outlier observations, even when the tuning constant is set to low values. This innovative estimator exhibits enhanced linearity at its core and maintains continuity throughout its range. Our proposed estimator stands out for its novelty, simplicity, differentiability, and practical applicability across real-world scenarios. The results of the proposed redescending M-estimator are compared with existing robust estimators using an extensive simulation study. Two examples based on real-life data are also added to validate the performance of the suggested function. The formulated redescending M-estimator produced efficient results compared with all the other redescending M-estimators considered.
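
    To make the idea of a redescending M-estimator concrete, here is a minimal sketch of a location estimate computed by iteratively reweighted averaging. It does not implement the estimator proposed in this paper, whose formula is not given in the abstract; it substitutes Tukey's biweight, a well-known redescending function. The function name, the fixed MAD-based scale, and the sample values are all assumptions made for illustration.

```python
import statistics


def biweight_location(data, c=4.685, tol=1e-8, max_iter=100):
    """Location M-estimate with Tukey's biweight (a redescending function)."""
    mu = statistics.median(data)
    # Robust scale: median absolute deviation, rescaled for normal data.
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad
    if scale == 0:
        return mu
    for _ in range(max_iter):
        weights = []
        for x in data:
            u = (x - mu) / (c * scale)
            # Weight falls smoothly to exactly zero once |u| >= 1.
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        total = sum(weights)
        if total == 0:
            break
        new_mu = sum(w * x for w, x in zip(weights, data)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]  # hypothetical data, one gross outlier
print("mean:", statistics.mean(sample))
print("biweight location:", biweight_location(sample))
```

    The sample mean is pulled well above 16 by the single value of 50, while the biweight estimate stays close to 10 because the outlier receives a weight of exactly zero. Smaller values of the tuning constant `c` reject more observations at the cost of efficiency on clean data.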

  • Article type: Journal Article
    A methodological problem in most reaction time (RT) studies is that some measured RTs may be outliers; that is, they may be very fast or very slow for reasons unconnected to the task-related processing of interest. Numerous ad hoc methods have been suggested to discriminate between such outliers and the valid RTs of interest, but it is extremely difficult to determine how well these methods work in practice because virtually nothing is known about the actual characteristics of outliers in real RT datasets. This article proposes a new method of pooling cumulative distribution function values for examining empirical RT distributions to assess both the proportions of outliers and their latencies relative to those of the valid RTs. As the method is developed, its strengths and weaknesses are examined using simulations based on previously suggested ad hoc models for RT outliers with particular assumed proportions and distributions of valid RTs and outliers. The method is then applied to several large RT datasets from lexical decision tasks, and the results provide the first empirically based description of outlier RTs. For these datasets, fewer than 1% of the RTs seem to be outliers, and the median outlier latency appears to be approximately 4-6 standard deviations of RT above the mean of the valid RT distribution.
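
    One of the common ad hoc rules alluded to above excludes RTs that lie more than k standard deviations from the mean. The sketch below illustrates that rule only; it is not the pooled-CDF method the article proposes. The function name, the cutoff k = 2.5, and the RT values are hypothetical.

```python
import statistics


def sd_cutoff(rts, k=2.5):
    """Split RTs into (kept, flagged) using a mean +/- k * SD rule."""
    mean = statistics.mean(rts)
    sd = statistics.stdev(rts)
    lo, hi = mean - k * sd, mean + k * sd
    kept = [rt for rt in rts if lo <= rt <= hi]
    flagged = [rt for rt in rts if rt < lo or rt > hi]
    return kept, flagged


# Hypothetical RTs in milliseconds; the last trial is a lapse of attention.
rts = [520, 480, 510, 495, 530, 505, 490, 515, 500, 2400]
print(sd_cutoff(rts, k=2.5)[1])  # the 2400 ms trial is flagged
print(sd_cutoff(rts, k=3.0)[1])  # nothing is flagged
```

    The second call shows one reason such rules are hard to evaluate: the outlier inflates both the mean and the standard deviation it is judged against. With only 10 observations, no value can lie more than (n - 1) / sqrt(n), about 2.85, sample standard deviations from the mean, so a cutoff of k = 3 can never flag anything here.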

  • Article type: Journal Article
    Angiotensin-converting enzyme (ACE) metabolizes a number of important peptides participating in blood pressure regulation and vascular remodeling. Elevated ACE expression in tissues (which is generally reflected by blood ACE levels) is associated with an increased risk of cardiovascular diseases. Elevated blood ACE is also a marker for granulomatous diseases. Decreased blood ACE activity is becoming a new risk factor for Alzheimer's disease. We applied our novel approach, ACE phenotyping, to characterize pairs of tissues (lung, heart, lymph nodes) and serum ACE in 50 patients. ACE phenotyping includes (1) measurement of ACE activity with two substrates (ZPHL and HHL); (2) calculation of the ratio of hydrolysis of these substrates (ZPHL/HHL ratio); (3) determination of ACE immunoreactive protein levels using mAbs to ACE; and (4) ACE conformation with a set of mAbs to ACE. The ACE phenotyping approach in screening format with special attention to outliers, combined with analysis of sequencing data, allowed us to identify a patient with a unique ACE phenotype related to decreased ability of inhibition of ACE activity by albumin, likely due to competition with high CCL18 in this patient for binding to ACE. We also confirmed recently discovered gender differences in sialylation of some glycosylation sites of ACE. ACE phenotyping is a promising new approach for the identification of ACE phenotype outliers with potential clinical significance, making it useful for screening in a personalized medicine approach.