Factor models

  • Article type: Journal Article
    This study develops a model-based index approach called the Generalised Shared Component Model (GSCM) by drawing on the large field of factor models. The proposed fully Bayesian approach accommodates heteroscedastic model error, multiple shared factors and flexible spatial priors. Moreover, unlike previous index approaches, our model provides indices with uncertainty. Focusing on unhealthy behaviors that increase the risk of cancer, the proposed GSCM is used to develop the Area Indices of Behaviors Impacting Cancer product, which represents the first area-level cancer risk factor index in Australia. This advancement aids in identifying communities with elevated cancer risk, facilitating targeted health interventions.

  • Article type: Journal Article
    Elucidating the neural mechanisms of general cognitive ability (GCA) is an important mission of cognitive neuroscience. Recent large-sample cohort studies measured GCA through multiple cognitive tasks and explored its neural basis, but they did not investigate how task number, factor models, and neural data type affect the estimation of GCA and its neural correlates. To address these issues, we tested 1,605 Chinese young adults with 19 cognitive tasks and Raven's Advanced Progressive Matrices (RAPM) and collected resting state and n-back task fMRI data from a subsample of 683 individuals. Results showed that GCA could be reliably estimated by multiple tasks. Increasing task number enhances both reliability and validity of GCA estimates and reliably strengthens their correlations with brain data. The Spearman model and hierarchical bifactor model yield similar GCA estimates. The bifactor model has better model fit and stronger correlation with RAPM but explains less variance and shows weaker correlations with brain data than does the Spearman model. Notably, the n-back task-based functional connectivity patterns outperform resting-state fMRI in predicting GCA. These results suggest that GCA derived from a multitude of cognitive tasks serves as a valid measure of general intelligence and that its neural correlates could be better characterized by task fMRI than resting-state fMRI data.

  • Article type: Journal Article
    In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency of our proposed estimators, and show that they have better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.

  • Article type: Journal Article
    Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.

  • Article type: Journal Article
    Factor analysis is a widely used tool for unsupervised dimensionality reduction of high-throughput datasets in molecular biology, with recently proposed extensions designed specifically for spatial transcriptomics data. However, these methods expect (count) matrices as data input and are therefore not directly applicable to single molecule resolution data, which are in the form of coordinate lists annotated with genes and provide insight into subcellular spatial expression patterns. To address this, we here propose FISHFactor, a probabilistic factor model that combines the benefits of spatial, non-negative factor analysis with a Poisson point process likelihood to explicitly model and account for the nature of single molecule resolution data. In addition, FISHFactor shares information across a potentially large number of cells in a common weight matrix, allowing consistent interpretation of factors across cells and yielding improved latent variable estimates.
    We compare FISHFactor to existing methods that rely on aggregating information through spatial binning and cannot combine information from multiple cells and show that our method leads to more accurate results on simulated data. We show that our method is scalable and can be readily applied to large datasets. Finally, we demonstrate on a real dataset that FISHFactor is able to identify major subcellular expression patterns and spatial gene clusters in a data-driven manner.
    The model implementation, data simulation and experiment scripts are available at https://www.github.com/bioFAM/FISHFactor.
    Supplementary data are available at Bioinformatics online.

  • Article type: Journal Article
    This study empirically analyzes time series momentum (TSM) in the European equity market between 2000 and 2020. The study produces additional evidence on TSM, where a significant and persistent market price anomaly enables investors to earn abnormal returns. To achieve this goal, the present study implements a pooled autoregressive model to test the power of European equity indices to predict future returns. The results indicate that strategies based on TSM are in line with the discussed literature and enable market agents to earn returns above the market (0.71% per month) by using a six-factor model.
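
    As a rough illustration of what a pooled autoregressive predictability test can look like, the sketch below stacks lagged and current returns from several indices and fits a single slope by ordinary least squares. The function name `pooled_ar_slope`, the one-period lag, and the synthetic returns are assumptions made for illustration; the abstract does not give the study's actual specification, data, or six-factor adjustment.

```python
import numpy as np

def pooled_ar_slope(returns, lag=1):
    """Pooled OLS of r_t on r_{t-lag} across all series.

    `returns` is a (T, N) array: T periods, N indices (hypothetical layout).
    Returns (intercept, slope) shared by every index.
    """
    r = np.asarray(returns, dtype=float)
    y = r[lag:].ravel()        # current returns, stacked over indices
    x = r[:-lag].ravel()       # lagged returns, aligned with y
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])

# Synthetic example: each series follows r_t = 0.5 * r_{t-1} exactly.
T, N = 50, 3
synthetic = np.empty((T, N))
synthetic[0] = [1.0, -2.0, 0.5]
for t in range(1, T):
    synthetic[t] = 0.5 * synthetic[t - 1]

intercept, slope = pooled_ar_slope(synthetic, lag=1)
```

    A positive, statistically significant slope in such a regression is the usual sign of return continuation; a real analysis would also need standard errors that account for cross-sectional dependence, which this sketch omits.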

  • Article type: Journal Article
    Neural activity is often described in terms of population-level factors extracted from the responses of many neurons. Factors provide a lower-dimensional description with the aim of shedding light on network computations. Yet, mechanistically, computations are performed not by continuously valued factors but by interactions among neurons that spike discretely and variably. Models provide a means of bridging these levels of description. We developed a general method for training model networks of spiking neurons by leveraging factors extracted from either data or firing-rate-based networks. In addition to providing a useful model-building framework, this formalism illustrates how reliable and continuously valued factors can arise from seemingly stochastic spiking. Our framework establishes procedures for embedding this property in network models with different levels of realism. The relationship between spikes and factors in such networks provides a foundation for interpreting (and subtly redefining) commonly used quantities such as firing rates.

  • Article type: Journal Article
    We demonstrate how a linear factor model with latent variables can be used to estimate correlations between the outcomes of clinical trials. These correlations are needed for many policy questions of drug/vaccine development (such as calculating the optimal size of financial incentives) and the literature so far has relied on expert opinions. We apply our methodology to the case of vaccines and show that the estimated correlations are highly significant. We also illustrate how the estimated correlations can be used to find the probability of obtaining a successful vaccine out of a certain number of candidates and to determine optimal investment in vaccine development.
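
    To make the idea concrete, the sketch below uses a one-factor Gaussian model in which each trial's latent outcome loads on a single shared factor, so the implied correlation between two trials is the product of their loadings. It then estimates by simulation the probability that at least one candidate succeeds. The threshold-crossing definition of success, the loadings, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def implied_correlation(loadings):
    """Correlation matrix implied by a one-factor model with unit-variance outcomes."""
    lam = np.asarray(loadings, dtype=float)
    corr = np.outer(lam, lam)          # off-diagonal: lam_i * lam_j
    np.fill_diagonal(corr, 1.0)        # each outcome has unit variance
    return corr

def prob_at_least_one_success(loadings, thresholds, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(at least one latent outcome exceeds its threshold)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    thr = np.asarray(thresholds, dtype=float)
    z = rng.standard_normal((n_draws, 1))             # shared latent factor
    eps = rng.standard_normal((n_draws, lam.size))    # trial-specific noise
    y = lam * z + np.sqrt(1.0 - lam**2) * eps         # unit-variance latent outcomes
    return float(np.mean((y > thr).any(axis=1)))

corr = implied_correlation([0.8, 0.5])
p_independent = prob_at_least_one_success([0.0, 0.0], [0.0, 0.0])
p_correlated = prob_at_least_one_success([0.9, 0.9], [0.0, 0.0])
```

    With the same individual success probabilities, positively correlated candidates give a lower chance of at least one success than independent candidates, which is why the correlations matter for portfolio-style investment decisions.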

  • Article type: Journal Article
    Current diagnosis of neurological disorders often relies on late-stage clinical symptoms, which poses barriers to developing effective interventions at the premanifest stage. Recent research suggests that biomarkers and subtle changes in clinical markers may occur in a time-ordered fashion and can be used as indicators of early disease. In this article, we tackle the challenge of leveraging multidomain markers to learn early disease progression of neurological disorders. We propose to integrate heterogeneous types of measures from multiple domains (e.g., discrete clinical symptoms, ordinal cognitive markers, continuous neuroimaging, and blood biomarkers) using a hierarchical Multilayer Exponential Family Factor (MEFF) model, where the observations follow exponential family distributions with lower-dimensional latent factors. The latent factors are decomposed into shared factors across multiple domains and domain-specific factors, where the shared factors provide robust information to perform extensive phenotyping and partition patients into clinically meaningful and biologically homogeneous subgroups. Domain-specific factors capture remaining unique variations for each domain. The MEFF model also captures the nonlinear trajectory of disease progression and orders critical events of neurodegeneration measured by each marker. To overcome computational challenges, we fit our model by approximate inference techniques for large-scale data. We apply the developed method to Parkinson's Progression Markers Initiative data to integrate biological, clinical, and cognitive markers arising from heterogeneous distributions. The model learns lower-dimensional representations of Parkinson's disease (PD) and the temporal ordering of the neurodegeneration of PD.

  • Article type: Journal Article
    This paper expands the analysis of the cyclical characteristics of social spending by providing information on its joint behaviour across OECD countries. With this aim we propose the use of dynamic factor analysis and recursive models to estimate synchronization and cyclicality of social policies within a broad perspective. By considering the synchronization of social spending it is possible to assess the short-run characteristics of the joint response to changes in the economic cycle. We find that synchronization of social spending was only possible for advanced economies, achieving the highest countercyclical stabilization effect during the Global Financial Crisis. Emerging market economies are not able to join the synchronized response, maintaining independent and, in most cases, procyclical stances in the behaviour of their social policies.
    The online version contains supplementary material available at 10.1007/s10663-022-09545-w.