latent variable

  • Article type: Journal Article
    For multivariate non-Gaussian models based on copulas, likelihood inference is dominated by the data in the middle of the distribution, so fitted models may not perform well for joint tail inference, such as assessing the strength of tail dependence. When preliminary data and likelihood analyses suggest asymmetric tail dependence, a method is proposed to improve extreme-value inferences based on the joint lower and upper tails. A prior that encodes previous information on tail dependence can be combined with the likelihood. Combining the prior and the likelihood (which in practice is misspecified to some degree) yields a tilted log-likelihood; inferences on suitably transformed parameters can then be based on Bayesian computing methods or on numerical optimization of the tilted log-likelihood to obtain the posterior mode and the Hessian at this mode.
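The tilted log-likelihood described above can be made concrete with a small numerical sketch. It is not the authors' implementation: it assumes a bivariate Clayton copula (lower tail dependence λ_L = 2^(-1/θ), no upper tail dependence, so only one tail is covered), a hypothetical Beta(2, 2) prior on λ_L, and the transformed parameter η = log θ. The posterior mode is located by a grid search followed by golden-section refinement, and the Hessian at the mode is a central finite difference.

```python
import math
import random


def clayton_logpdf(u, v, theta):
    """Log-density of the bivariate Clayton copula (theta > 0)."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (math.log1p(theta)
            - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def sample_clayton(n, theta, seed=0):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], so u ** (-theta) is finite
        w = 1.0 - rng.random()
        t = (w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0
        pairs.append((u, t ** (-1.0 / theta)))
    return pairs


def tilted_loglik(eta, data, a=2.0, b=2.0):
    """Copula log-likelihood plus log-prior, on the scale eta = log(theta).

    Hypothetical prior: lambda_L = 2 ** (-1 / theta) ~ Beta(a, b), up to a
    constant; log|d lambda_L / d eta| is the change-of-variables term.
    """
    theta = math.exp(eta)
    loglik = sum(clayton_logpdf(u, v, theta) for u, v in data)
    lam = 2.0 ** (-1.0 / theta)
    log_prior = (a - 1.0) * math.log(lam) + (b - 1.0) * math.log1p(-lam)
    log_jac = math.log(lam * math.log(2.0) / theta)
    return loglik + log_prior + log_jac


def maximize_1d(f, lo, hi, grid=81, tol=1e-8):
    """Coarse grid search, then golden-section refinement around the best point."""
    xs = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    best = max(range(grid), key=lambda i: f(xs[i]))
    a, b = xs[max(best - 1, 0)], xs[min(best + 1, grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


data = sample_clayton(500, theta=2.0, seed=1)


def objective(eta):
    return tilted_loglik(eta, data)


eta_hat = maximize_1d(objective, -2.0, 2.0)
hessian = second_derivative(objective, eta_hat)
theta_hat = math.exp(eta_hat)
se_eta = math.sqrt(-1.0 / hessian)  # normal-approximation standard error of eta
print(f"posterior mode theta = {theta_hat:.3f}, SE(eta) = {se_eta:.3f}")
```

Because the prior is placed on λ_L while the optimization runs on η, the log-Jacobian term is required; dropping it would give the mode of a different density. The inverse of the negative Hessian serves as a normal-approximation variance for η at the mode.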

  • Article type: Journal Article
    Measurement errors are very common in practice. After a model is fitted, influence diagnostics are an important step in statistical data analysis. The most frequently used diagnostic method for measurement error models is local influence. However, this methodology may fail to detect masked influential observations. To overcome this limitation, we propose using the conformal normal curvature together with the forward search algorithm. The results are presented through easy-to-interpret plots under different perturbation schemes. The proposed methodology is illustrated with three real data sets and one simulated data set; two of the real data sets have been analyzed previously in the literature. The third deals with the stability of a hygroscopic solid dosage in pharmaceutical processes, where the goal is to maintain product safety and quality. In this application, the analytical mass balance is subject to measurement errors, which must be accounted for in both the modeling process and the diagnostic analysis.
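The forward search mentioned above can be illustrated on a much simpler model. This sketch is not the authors' method: it uses ordinary simple linear regression instead of a measurement error model, and it monitors the scaled minimum residual among units outside the current subset instead of the conformal normal curvature. The starting pair is chosen by least median of squares, and the data, including a planted cluster of three outliers that could mask one another, are hypothetical.

```python
import math


def ols(xs, ys):
    """Intercept and slope of a least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def forward_search(x, y):
    """Grow a clean subset one unit at a time and monitor outside residuals."""
    n = len(x)
    # Start: the pair whose line has the least median squared residual.
    best_med, subset = math.inf, set()
    for i in range(n):
        for j in range(i + 1, n):
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            med = sorted((y[k] - a - b * x[k]) ** 2 for k in range(n))[n // 2]
            if med < best_med:
                best_med, subset = med, {i, j}
    subsets, monitor = {}, {}
    for m in range(2, n):
        a, b = ols([x[k] for k in subset], [y[k] for k in subset])
        res = [abs(y[k] - a - b * x[k]) for k in range(n)]
        subsets[m] = set(subset)
        if m >= 3:  # the residual scale needs m - 2 > 0 degrees of freedom
            s = math.sqrt(sum(res[k] ** 2 for k in subset) / (m - 2))
            monitor[m] = min(res[k] for k in range(n) if k not in subset) / s
        # Next subset: the m + 1 units closest to the current fit.
        subset = set(sorted(range(n), key=lambda k: res[k])[: m + 1])
    return subsets, monitor


# Hypothetical data: y = 1 + 2x plus small noise; units 17-19 are shifted outliers.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi + 0.5 * math.sin(1.7 * xi * xi) for xi in x]
for k in (17, 18, 19):
    y[k] += 25.0

subsets, monitor = forward_search(x, y)
for m in sorted(monitor):
    print(m, round(monitor[m], 2))
```

The monitored statistic stays small while only clean units are in the subset and jumps sharply at the step where the remaining units are all outliers, which is the kind of signal a single all-data fit can miss when outliers mask each other.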

  • Article type: Journal Article
    The estimation and visualization of process states are important for process control in chemical and industrial plants. Because industrial processes are theoretically related to Gaussian distributions, this study focused on Gaussian process latent variable models. Process state estimation and visualization methods using two latent variables are proposed based on the Bayesian Gaussian process latent variable model (BGPLVM), the infinite warped mixture model (iWMM), and Gaussian process dynamical models (GPDM). The Tennessee Eastman process dataset was analyzed, and process state estimation performance was highest for GPDM, followed by iWMM and then BGPLVM. Moreover, adding time-delayed process variables to the process variables to account for process dynamics further improved estimation performance. In particular, GPDM with only two latent variables estimated four process states with approximately 100% accuracy. Even 10 process states could be estimated with approximately 90% accuracy, confirming that process state estimation and process state visualization can be achieved simultaneously.
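The time-delay augmentation mentioned above amounts to appending lagged copies of the process variables to each sample. The sketch below assumes a plain list-of-rows layout and a hypothetical lag count `n_lags`; the latent variable model (such as GPDM) that would consume the augmented rows is not shown.

```python
def add_time_delays(rows, n_lags):
    """Append lagged copies of each process variable to every time step.

    rows: list of samples, each a list of process-variable values at time t.
    Returns rows [x_t, x_{t-1}, ..., x_{t-n_lags}] for t >= n_lags; the first
    n_lags samples are dropped because their history is incomplete.
    """
    augmented = []
    for t in range(n_lags, len(rows)):
        row = []
        for lag in range(n_lags + 1):
            row.extend(rows[t - lag])
        augmented.append(row)
    return augmented


# Hypothetical data: 5 time steps of 2 process variables.
data = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
print(add_time_delays(data, n_lags=2))
# [[3, 30, 2, 20, 1, 10], [4, 40, 3, 30, 2, 20], [5, 50, 4, 40, 3, 30]]
```

Each extra lag multiplies the input dimension, so the number of lags trades richer dynamics against a larger model input.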

  • Article type: Journal Article
    The hypothesis implicit in rating scale design is that the categories reflect increasing levels of the latent variable. Rasch models for ordered polytomous items include parameters, called thresholds, that allow this hypothesis to be tested empirically. Failure of the thresholds to advance monotonically with the categories (a condition referred to as "threshold disordering") provides evidence that the rating scale is not functioning as intended. This work focuses on scales with rather large numbers of categories, whose use is often recommended in the literature. Threshold disordering is observed both in an extended 8-point scale developed specifically for the Patient Health Questionnaire-9 and in the original 10-point scale of the Behavioral Religiosity Scale. These results should prompt practitioners not to take the functioning of a rating scale for granted, but to verify it empirically.
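Why disordered thresholds matter can be seen from the category probabilities of a polytomous Rasch model. This sketch assumes a partial-credit parameterization with item location δ and thresholds τ_1..τ_m, where P(X = k | θ) is proportional to exp(Σ_{j≤k}(θ - δ - τ_j)); the threshold values are hypothetical, with τ_2 > τ_3 deliberately disordered. A category whose upper threshold lies below its lower threshold is never the most probable response at any latent level.

```python
import math


def category_probs(theta, delta, taus):
    """Partial-credit Rasch model: P(X = k | theta) for k = 0..len(taus)."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def disordered_pairs(taus):
    """1-based threshold pairs (k, k+1) where threshold k+1 does not exceed threshold k."""
    return [(k + 1, k + 2) for k in range(len(taus) - 1) if taus[k + 1] <= taus[k]]


def modal_categories(delta, taus, lo=-6.0, hi=6.0, steps=1201):
    """Categories that are the most probable response somewhere on a theta grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = category_probs(theta, delta, taus)
        modal.add(max(range(len(probs)), key=probs.__getitem__))
    return modal


taus = [-2.0, 1.0, 0.0, 2.0]  # hypothetical; tau_2 > tau_3 is disordered
print(disordered_pairs(taus))  # [(2, 3)]
print(sorted(modal_categories(0.0, taus)))  # [0, 1, 3, 4]
```

Category 2 would need θ above τ_2 = 1 to beat category 1 and below τ_3 = 0 to beat category 3, which is impossible, so it never appears as a modal response.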

  • Article type: Journal Article
    The sum score on a psychological test is, and should continue to be, a central tool in psychometric practice. This position runs counter to the belief of several psychometricians that the sum score represents a pre-scientific conception that must be abandoned in favor of latent variables. First, we reiterate that the sum score stochastically orders the latent variable in a wide variety of commonly used item response models. In fact, item response theory provides a mathematical justification for the ordinal use of the sum score. Second, because discussions about the sum score often also involve its reliability and estimation methods, we show that, under very general assumptions, classical test theory provides a family of lower bounds, several of which are close to the true reliability under reasonable conditions. Finally, we argue that sum scores ultimately derive their value from the degree to which they enable the prediction of practically relevant events and behaviors. None of our discussion is meant to discredit modern measurement models; they have merits unattainable by classical test theory. However, classical test theory makes impressive contributions to psychometrics based on very few assumptions, contributions that seem to have become obscured over the past few decades. Their generality and practical usefulness add to the accomplishments of more recent approaches.
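The family of classical-test-theory lower bounds mentioned above includes Guttman's λ2 and λ3, where λ3 equals Cronbach's alpha and λ2 ≥ λ3 always holds. The sketch below computes both from a person-by-item score matrix using sample covariances; the score matrix is hypothetical.

```python
import math


def covariance_matrix(scores):
    """Sample covariance matrix of the columns of scores (rows = persons)."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n - 1)
             for j in range(k)] for i in range(k)]


def guttman_bounds(scores):
    """Return (lambda2, alpha): lower bounds to the reliability of the sum score."""
    cov = covariance_matrix(scores)
    k = len(cov)
    total_var = sum(sum(row) for row in cov)  # variance of the sum score
    item_var = sum(cov[i][i] for i in range(k))
    off_sq = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    lambda1 = 1.0 - item_var / total_var
    lambda2 = lambda1 + math.sqrt(k / (k - 1) * off_sq) / total_var
    alpha = k / (k - 1) * lambda1  # Guttman's lambda3
    return lambda2, alpha


# Hypothetical 0-4 ratings of six persons on four items.
scores = [[4, 3, 4, 3], [2, 2, 1, 2], [3, 3, 3, 4], [1, 0, 1, 1], [4, 4, 3, 4], [2, 1, 2, 2]]
lam2, alpha = guttman_bounds(scores)
print(f"lambda2 = {lam2:.3f}, alpha = {alpha:.3f}")
```

Both quantities use only item variances and covariances, which reflects the point above that these bounds rest on very few assumptions.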

  • Article type: Journal Article
    In 2015 the US federal government sponsored a dengue forecasting competition using historical case data from Iquitos, Peru, and San Juan, Puerto Rico. Competitors were evaluated on several aspects of out-of-sample forecasts, including the targets of peak week, peak incidence during that week, and total season incidence across each of several seasons. Our team was one of the winners of that competition, outperforming other teams in multiple targets/locales. In this paper we report on our methodology, a large component of which, surprisingly, ignores the known biology of epidemics at large (for example, relationships between dengue transmission and environmental factors) and instead relies on flexible nonparametric nonlinear Gaussian process (GP) regression fits that "memorize" the trajectories of past seasons and then "match" the dynamics of the unfolding season to past ones in real time. Our phenomenological approach has advantages in situations where disease dynamics are less well understood, where measurements and forecasts of ancillary covariates such as precipitation are unavailable, and/or where the strength of their association with cases is as yet unknown. In particular, we show that the GP approach generally outperforms a more classical generalized linear (autoregressive) model (GLM) that we developed to use abundant covariate information. We illustrate variations of our method(s) on the two benchmark locales, alongside a full summary of results submitted by other contest competitors.
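The Gaussian process regression at the core of the approach above can be illustrated in its most basic form. This sketch is not the authors' forecasting model: it assumes a zero-mean GP with a squared-exponential kernel, fixed hypothetical hyperparameters (`length`, `var`, `noise`), a single scalar input, and a hypothetical smooth series, and it computes the posterior mean and variance with a hand-written Cholesky factorization so that only the standard library is needed.

```python
import math


def sq_exp(a, b, length, var):
    """Squared-exponential (RBF) covariance between scalar inputs a and b."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i, bi in enumerate(b):
        z.append((bi - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, length=1.0, var=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    K = [[sq_exp(a, b, length, var) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    weights = solve_upper_t(L, solve_lower(L, y_train))  # K^{-1} y
    means, variances = [], []
    for x in x_test:
        k = [sq_exp(x, a, length, var) for a in x_train]
        means.append(sum(ki * wi for ki, wi in zip(k, weights)))
        v = solve_lower(L, k)
        variances.append(sq_exp(x, x, length, var) - sum(vi * vi for vi in v))
    return means, variances


# Hypothetical smooth series observed at weeks 0..7.
x_train = [float(w) for w in range(8)]
y_train = [math.sin(0.8 * w) for w in x_train]
means, variances = gp_predict(x_train, y_train, [3.0, 3.5, 40.0])
for m, v in zip(means, variances):
    print(round(m, 4), round(v, 6))
```

Near the observed weeks the posterior mean reproduces the data and the variance is small; far from them (week 40) the prediction reverts to the prior mean of zero with the full prior variance, which is why a GP alone extrapolates conservatively.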

  • Article type: Journal Article
    In the context of functional data analysis, we developed and applied a new Bayesian approach, via the Gibbs sampler, to select basis functions for a finite representation of functional data. The proposed methodology uses Bernoulli latent variables to set some of the basis function coefficients to zero with positive probability. This procedure allows adaptive basis selection, since it determines both the number of bases and which ones should be selected to represent the functional data. Moreover, the proposed procedure quantifies the uncertainty of the selection process and can be applied to multiple curves simultaneously. The methodology can handle observed curves that differ owing to experimental error and random individual differences between subjects, as seen in a real-data application involving daily numbers of COVID-19 cases in Brazil. Simulation studies show the main properties of the proposed method, such as its accuracy in estimating the coefficients and its ability to recover the true set of basis functions. Although the model was developed for functional data analysis, we also compared it via simulation with the well-established LASSO and Bayesian LASSO, which are methods developed for non-functional data.
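The role of the Bernoulli latent variables can be illustrated with a small spike-and-slab Gibbs sampler for a single curve. This is a simplified sketch, not the authors' model: it assumes a known noise standard deviation `sigma`, a hypothetical N(0, `tau2`) slab, a prior inclusion probability `pi`, a Fourier basis, and simulated data. Each step draws the indicator γ_j with its coefficient integrated out, then draws the coefficient given γ_j.

```python
import math
import random


def spike_slab_gibbs(X, y, sigma, tau2=100.0, pi=0.5, iters=1000, burn=200, seed=0):
    """Gibbs sampler for y = sum_j gamma_j * beta_j * X[j] + Gaussian noise.

    X is a list of basis columns. Returns posterior inclusion probabilities
    and posterior means of the coefficients (zero when excluded).
    """
    rng = random.Random(seed)
    p, n = len(X), len(y)
    beta = [0.0] * p
    resid = list(y)  # y - sum_j beta_j * X[j]
    incl, total = [0.0] * p, [0.0] * p
    prior_logit = math.log(pi) - math.log(1.0 - pi)
    for it in range(iters):
        for j in range(p):
            xj = X[j]
            for i in range(n):  # remove basis j from the current fit
                resid[i] += beta[j] * xj[i]
            a = sum(v * v for v in xj) / sigma ** 2
            c = sum(v * r for v, r in zip(xj, resid)) / sigma ** 2
            var = 1.0 / (a + 1.0 / tau2)
            mean = var * c
            # log Bayes factor of "basis j included" versus "excluded"
            log_bf = 0.5 * math.log(var / tau2) + 0.5 * mean * mean / var
            z = prior_logit + log_bf
            prob = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            gamma = rng.random() < prob
            beta[j] = rng.gauss(mean, math.sqrt(var)) if gamma else 0.0
            for i in range(n):  # add basis j back with its new coefficient
                resid[i] -= beta[j] * xj[i]
            if it >= burn:
                incl[j] += gamma
                total[j] += beta[j]
    kept = iters - burn
    return [s / kept for s in incl], [s / kept for s in total]


# Hypothetical curve on 50 equally spaced points; basis = constant, cos/sin for k = 1..3.
n = 50
t = [i / n for i in range(n)]
X = [[1.0] * n]
for k in (1, 2, 3):
    X.append([math.cos(2 * math.pi * k * ti) for ti in t])
    X.append([math.sin(2 * math.pi * k * ti) for ti in t])
rng = random.Random(42)
y = [2.0 * math.sin(2 * math.pi * ti) + 1.5 * math.cos(6 * math.pi * ti) + rng.gauss(0.0, 0.3)
     for ti in t]
pip, post_mean = spike_slab_gibbs(X, y, sigma=0.3)
for j, (p_j, b_j) in enumerate(zip(pip, post_mean)):
    print(j, round(p_j, 3), round(b_j, 3))
```

The inclusion frequencies give both the selected basis (here the sin(2πt) and cos(6πt) columns, indices 2 and 5) and a measure of selection uncertainty, mirroring the two roles of the latent indicators described above.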

  • Article type: Journal Article
    Evaluating correlations between disease biomarkers and clinical outcomes is crucial in biomedical research. During the early stages of many chronic diseases, changes in biomarkers and clinical outcomes are often subtle. A major challenge in detecting subtle correlations is that studies with large sample sizes are usually needed to achieve sufficient statistical power. This challenge is even greater for biofluid and imaging biomarker data, because the required procedures are burdensome, perceived as invasive, and/or expensive, which limits sample sizes in individual studies. Combining data across multiple studies may increase statistical power, but biomarker data may be generated using different assay platforms, scanner types, or processing protocols, which may affect measured biomarker values. Therefore, harmonizing biomarker data is essential for combining data across studies. Bridging studies involve re-processing a subset of samples or imaging scans to evaluate how biomarker values vary across studies. This raises the analytic challenge of how best to harmonize biomarker data across studies so as to allow unbiased and optimal estimates of their correlations with standardized clinical outcomes. We posit that a latent biomarker underlies the observed biomarkers across studies, and we propose a novel approach that integrates the data in the bridging study with the study-specific biomarker data to estimate the biological correlations between biomarkers and clinical outcomes. Through extensive simulations, we compare our method with several alternative methods/algorithms often used to estimate these correlations. Finally, we demonstrate the application of this methodology to a real-world multi-center Alzheimer's disease biomarker study, correlating cerebrospinal fluid biomarker concentrations with cognitive outcomes.
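Why harmonization matters before pooling can be shown with one simple use of a bridging study: a linear calibration from a study-specific platform onto a reference scale. This is a naive baseline, not the latent-biomarker method proposed above, and it ignores measurement error in the bridging data. All values are hypothetical, and the bridging pairs are made exactly linear so the effect is easy to see.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data. Study A uses the reference assay; study B uses another platform.
biomarker_a, outcome_a = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]
biomarker_b, outcome_b = [12.0, 14.0, 16.0, 18.0], [1.0, 2.0, 3.0, 4.0]

# Bridging study: the same samples measured on platform B and on the reference assay.
bridge_b, bridge_ref = [12.0, 16.0, 20.0, 24.0], [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(bridge_b, bridge_ref)
harmonized_b = [intercept + slope * v for v in biomarker_b]

outcomes = outcome_a + outcome_b
r_raw = pearson(biomarker_a + biomarker_b, outcomes)
r_harmonized = pearson(biomarker_a + harmonized_b, outcomes)
print(f"pooled r, raw = {r_raw:.3f}, harmonized = {r_harmonized:.3f}")
# pooled r, raw = 0.258, harmonized = 1.000
```

Within each study the correlation is unaffected by a positive linear rescaling; the distortion arises only when studies on different scales are pooled, which is exactly where the bridging data are needed.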

  • Article type: Journal Article
    Recent approaches aim to represent the dimensional structure of psychopathology, but relatively little research has rigorously tested sub-dimensions within internalizing psychopathology. This study tests pre-registered models of the dimensional structure of internalizing psychopathology, and their relations with diagnostic data on current and lifetime depressive and anxiety disorders, in adult samples harmonized across three sites (n=427). Across S-1 bifactor and hierarchical models, we found converging evidence for both general and specific internalizing dimensions. Depression, generalized anxiety disorder (GAD), social anxiety disorder (SAD), and panic attacks were all associated with a general internalizing factor that we posit primarily represents motivational anhedonia. GAD was also associated with a specific anxious apprehension factor, and SAD with specific anxious apprehension and low positive affect factors. We suggest that dimensional approaches capturing shared and specific internalizing symptom facets describe the structure of internalizing psychopathology more accurately and provide useful alternatives to categorical diagnoses for advancing clinical science.

  • Article type: Journal Article
    Response process data from computer-based problem-solving items describe respondents' problem-solving processes as sequences of actions. Such data provide a valuable source for understanding respondents' problem-solving behaviors. Recently, data-driven feature extraction methods have been developed to compress the information in unstructured process data into relatively low-dimensional features. Although the extracted features can be used as covariates in regression or other models to understand respondents' response behaviors, the results are often not easy to interpret, because the relationship between the extracted features and the original response process is often not explicitly defined. In this paper, we propose a statistical model for describing response processes and how they vary across respondents. The proposed model assumes that a response process follows a hidden Markov model given the respondent's latent traits. The structure of hidden Markov models resembles problem-solving processes, with the hidden states interpreted as problem-solving subtasks or stages. Incorporating the latent traits in hidden Markov models enables us to characterize the heterogeneity of response processes across respondents in a parsimonious and interpretable way. We demonstrate the performance of the proposed model through simulation experiments and case studies of PISA process data.
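The idea of a hidden Markov model whose dynamics depend on a respondent's latent trait can be sketched with the scaled forward algorithm. The parameterization is hypothetical and much simpler than the proposed model: hidden states are ordered problem-solving stages starting at stage 0, a respondent with trait θ leaves stage s with probability logistic(θ - d_s), the last stage is absorbing, and each stage emits actions from a fixed categorical distribution.

```python
import math
from itertools import product


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def transition_matrix(theta, difficulties):
    """Left-to-right stages: leave stage s with probability logistic(theta - d_s)."""
    k = len(difficulties) + 1
    A = [[0.0] * k for _ in range(k)]
    for s, d in enumerate(difficulties):
        p = logistic(theta - d)
        A[s][s], A[s][s + 1] = 1.0 - p, p
    A[k - 1][k - 1] = 1.0  # the final stage is absorbing
    return A


def log_likelihood(actions, theta, difficulties, emissions):
    """Scaled forward algorithm; emissions[s][a] = P(action a | stage s).

    The process is assumed to start in stage 0.
    """
    A = transition_matrix(theta, difficulties)
    k = len(A)
    alpha = [emissions[0][actions[0]]] + [0.0] * (k - 1)
    loglik = 0.0
    for t in range(len(actions)):
        if t > 0:
            o = actions[t]
            alpha = [sum(alpha[s] * A[s][r] for s in range(k)) * emissions[r][o]
                     for r in range(k)]
        scale = sum(alpha)  # P(o_t | o_1..o_{t-1})
        loglik += math.log(scale)
        alpha = [x / scale for x in alpha]
    return loglik


# Hypothetical setup: 3 stages, 3 action types (0 = explore, 1 = try, 2 = submit).
difficulties = [0.0, 1.0]
emissions = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
seq = [0, 1, 2, 2]
for theta in (-2.0, 3.0):
    print(theta, round(log_likelihood(seq, theta, difficulties, emissions), 3))

# Sanity check: probabilities of all length-4 action sequences sum to one.
total = sum(math.exp(log_likelihood(s, 0.5, difficulties, emissions))
            for s in product(range(3), repeat=4))
print(round(total, 10))
```

A quickly progressing sequence such as [0, 1, 2, 2] is far more likely under a high trait value, which is how a trait-dependent transition structure captures between-respondent heterogeneity with only a few parameters.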
