Summary statistics

  • Article type: Journal Article
    The relationship between socioeconomic status and frailty has been extensively investigated in the literature, but it remains unclear whether a causal relationship exists. Our goal is to evaluate the causal relationship between six socioeconomic traits and the frailty index using summary-level data for single nucleotide polymorphisms from large genome-wide association studies of individuals of European ancestry.
    A two-sample Mendelian randomization (MR) analysis was performed. We applied the inverse variance weighted (IVW) method for the primary estimate, with sensitivity analyses using alternative MR methods to evaluate the robustness of the findings. A subsequent multivariable MR analysis was undertaken to adjust for the effects of body mass index (BMI). Finally, the MR Steiger directionality test was performed to confirm the causal direction.
    The IVW MR analysis revealed significant associations between various socioeconomic factors and the frailty index. Specifically, genetically predicted age at completion of full-time education (β = -0.477, 95% confidence interval [CI]: -0.634 to -0.319) and average total household income before tax (β = -0.321, 95% CI: -0.410 to -0.232) were negatively associated with the frailty index. Conversely, genetically predicted "job involves heavy manual or physical work" (β = 0.298, 95% CI: 0.113 to 0.484), "job involves mainly walking or standing" (β = 0.179, 95% CI: 0.013 to 0.345), Townsend deprivation index at recruitment (β = 0.535, 95% CI: 0.285 to 0.785), and social isolation/loneliness (β = 1.344, 95% CI: 0.834 to 1.853) were positively associated with the frailty index. Sensitivity analyses using other MR methods and the multivariable MR analysis adjusting for BMI yielded stable results. The MR Steiger directionality test confirmed the causal direction.
    Our findings highlight the importance of socioeconomic factors in affecting frailty risk. Future research should focus on unraveling the pathways through which these socioeconomic factors affect frailty, with the ultimate goal of developing targeted strategies to mitigate frailty risk.
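    The primary IVW estimate named above can be illustrated with the textbook fixed-effect formula. That formula is a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin, with weights 1/se_y^2. The Python sketch below assumes first-order weights, so uncertainty in the SNP-exposure effects is ignored. It uses hypothetical summary statistics and a hypothetical function name, ivw_estimate. It is not the authors' analysis code, and their software may use multiplicative random effects instead.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y, z=1.96):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate (sketch).

    beta_x: SNP-exposure effects; beta_y, se_y: SNP-outcome effects and SEs.
    Uses first-order weights 1/se_y**2 (uncertainty in beta_x is ignored).
    Returns (beta, se, (ci_low, ci_high)).
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    w = 1.0 / np.asarray(se_y, dtype=float) ** 2
    # Weighted regression of beta_y on beta_x through the origin.
    denom = np.sum(w * beta_x ** 2)
    beta = np.sum(w * beta_x * beta_y) / denom
    se = np.sqrt(1.0 / denom)
    return beta, se, (beta - z * se, beta + z * se)

# Hypothetical summary statistics for three instruments.
beta, se, ci = ivw_estimate([0.10, 0.05, 0.08], [0.050, 0.025, 0.040], [0.01, 0.01, 0.02])
print(f"beta={beta:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

Each SNP's Wald ratio beta_y/beta_x here is 0.5, so the pooled estimate is 0.5. The IVW estimate is equivalently a weighted average of those per-SNP ratios.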

  • Article type: Journal Article
    Ensemble coding allows observers to form an average to represent a set of elements. However, it is unclear whether observers can extract an average from a cross-category set. Previous investigations of this issue using low-level stimuli yielded contradictory results. The current study addressed this issue by presenting high-level stimuli (i.e., a crowd of facial expressions) simultaneously (Experiment 1) or sequentially (Experiment 2) and asking participants to complete a member judgment task. The results showed that participants could extract average information from a group of cross-category facial expressions with a short perceptual distance. These findings demonstrate cross-category ensemble coding of high-level stimuli, contributing to the understanding of ensemble coding and informing future research.

  • Article type: Journal Article
    LD score regression (LDSC) is a method to estimate narrow-sense heritability from genome-wide association study (GWAS) summary statistics alone, making it a fast and popular approach. In this work, we present interaction-LD score (i-LDSC) regression: an extension of the original LDSC framework that accounts for interactions between genetic variants. By studying a wide range of generative models in simulations, and by re-analyzing 25 well-studied quantitative phenotypes from 349,468 individuals in the UK Biobank and up to 159,095 individuals in BioBank Japan, we show that including a cis-interaction score (i.e., interactions between a focal variant and proximal variants) recovers genetic variance that is not captured by LDSC. For each of the 25 traits analyzed in the UK Biobank and BioBank Japan, i-LDSC detects additional variation contributed by genetic interactions. The i-LDSC software and its application to these biobanks represent a step towards resolving further contributions of non-additive genetic effects to complex trait variation.
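    The core LDSC idea can be sketched as a least-squares fit of per-SNP chi-square statistics on LD scores. The fit uses the standard expectation E[chi2_j] = 1 + N*a + (N*h2/M)*l_j, so the slope times M/N estimates h2. The Python sketch below is unweighted and omits the block jackknife that real LDSC uses for standard errors. The optional extra_scores argument is a hypothetical stand-in for an additional per-SNP regressor such as a cis-interaction score. The exact form used by i-LDSC may differ, and ld_score_regression is a hypothetical name.

```python
import numpy as np

def ld_score_regression(chi2, ld_scores, n, m, extra_scores=None):
    """Unweighted LD score regression sketch.

    Model: E[chi2_j] = 1 + N*a + (N * h2 / M) * l_j (+ optional extra terms).
    extra_scores: hypothetical per-SNP scores, shape (n_snps,) or (k, n_snps).
    Returns (h2, intercept, coefficients_of_extra_scores).
    """
    chi2 = np.asarray(chi2, dtype=float)
    cols = [np.ones_like(chi2), np.asarray(ld_scores, dtype=float)]
    if extra_scores is not None:
        # Each row of the 2-D array becomes one extra regressor column.
        cols.extend(np.atleast_2d(np.asarray(extra_scores, dtype=float)))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, chi2, rcond=None)
    intercept, slope = coef[0], coef[1]
    h2 = slope * m / n  # slope = N * h2 / M
    return h2, intercept, coef[2:]

# Noise-free synthetic data (hypothetical): h2 = 0.3, extra-score coefficient 0.05.
rng = np.random.default_rng(0)
ld = rng.uniform(1, 100, 5000)
score = rng.uniform(0, 10, 5000)
chi2 = 1 + 100_000 * 0.3 * ld / 1_000_000 + 0.05 * score
print(ld_score_regression(chi2, ld, n=100_000, m=1_000_000, extra_scores=score))
```

With real data, the intercept above 1 absorbs confounding such as population stratification. LDSC also down-weights SNPs with high LD scores, which this sketch does not do.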

  • Article type: Journal Article
    Instrumental variable (IV) analysis has been widely applied in epidemiology to infer causal relationships from observational data. Genetic variants can also be viewed as valid IVs in Mendelian randomization and transcriptome-wide association studies. However, most multivariate IV approaches cannot scale to high-throughput experimental data. Here, we build on the flexibility of our previous work, hJAM (a hierarchical model that jointly analyzes marginal summary statistics), to develop a scalable framework (SHA-JAM). SHA-JAM can be applied to a large number of intermediates and a large number of correlated genetic variants, situations often encountered in modern experiments leveraging omic technologies. SHA-JAM aims to estimate the conditional effects of high-dimensional risk factors on an outcome. It does so by incorporating estimates from single-nucleotide polymorphism (SNP)-intermediate or SNP-gene expression association analyses as prior information in a hierarchical model. Results from extensive simulation studies demonstrate that SHA-JAM yields a higher area under the receiver operating characteristic curve (AUC), a lower mean-squared error of the estimates, and a much faster computation speed than an existing approach for similar analyses. In two applied examples for prostate cancer, we investigated metabolite and transcriptome associations, respectively. Both used summary statistics from a prostate cancer GWAS of more than 140,000 men together with high-dimensional, publicly available summary data for metabolites and transcriptomes.

  • Article type: Journal Article
    Graphs in research articles can increase the comprehension of statistical data but may mislead readers if poorly designed. We propose a new plot type, the sea stack plot, which combines vertical histograms and summary statistics to represent large univariate datasets accurately, usefully, and efficiently. We compare five commonly used plot types (dot and whisker plots, boxplots, density plots, univariate scatter plots, and dot plots) to assess their relative strengths and weaknesses when representing distributions of data commonly observed in biological studies. We find that the assessed plot types are either difficult to read at large sample sizes or have the potential to misrepresent certain distributions of data, indicating the need for an improved method of data visualisation. We present an analysis of the plot types used in four ecology and conservation journals covering multiple areas of these research fields. It shows widespread use of uninformative bar charts and dot and whisker plots (60% of all panels showing univariate data from multiple groups for the purpose of comparison). Some articles presented more informative figures by combining plot types (16% of panels), generally boxplots with a second layer such as a flat density plot, to better display the data. This shows an appetite for more effective plot types within conservation and ecology, which may grow further if accurate and user-friendly plot types are made available. Finally, we describe sea stack plots and explain how they overcome the weaknesses of other alternatives to uninformative plots when used for large and/or unevenly distributed data. We provide a tool to create sea stack plots with our R package 'seastackplot', available through GitHub.
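    A sea stack plot, as described above, pairs a vertical histogram for each group with summary statistics. The Python sketch below only computes those quantities with NumPy: shared bin edges, per-group counts, median, quartiles, and mean. It is not the 'seastackplot' R package. The function name sea_stack_data and the particular choice of summary statistics are assumptions for illustration.

```python
import numpy as np

def sea_stack_data(groups, bins=10):
    """Per-group histogram counts on shared bins plus summary statistics.

    groups: dict mapping group name -> 1-D array of values.
    Returns (edges, {name: {"counts", "median", "q1", "q3", "mean"}}).
    """
    all_values = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
    # Shared bin edges so that the stacked histograms are comparable across groups.
    edges = np.histogram_bin_edges(all_values, bins=bins)
    out = {}
    for name, values in groups.items():
        values = np.asarray(values, dtype=float)
        counts, _ = np.histogram(values, bins=edges)
        q1, median, q3 = np.percentile(values, [25, 50, 75])
        out[name] = {"counts": counts, "median": median, "q1": q1, "q3": q3,
                     "mean": values.mean()}
    return edges, out

# Hypothetical data: a symmetric group and a strongly skewed group.
edges, stats = sea_stack_data({"a": [1, 2, 3, 4, 5], "b": [2, 2, 2, 9]}, bins=4)
print(edges, stats["b"]["counts"], stats["b"]["median"], stats["b"]["mean"])
```

Each group's counts could then be drawn as horizontal bars centred on the group's x position (for example with matplotlib's barh), with the median and quartiles overlaid. In group "b" the mean (3.75) sits away from the bulk of the data at 2, which is the kind of distribution a bar chart of means would misrepresent.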

  • Article type: Journal Article
    Statistical estimation of parameters in large models of evolutionary processes is often too computationally inefficient to pursue using exact model likelihoods. This holds even with single-nucleotide polymorphism (SNP) data, which offer a way to reduce the size of genetic data while retaining relevant information. Approximate Bayesian Computation (ABC) performs statistical inference about the parameters of large models by using simulations to bypass direct evaluation of model likelihoods. We develop a mechanistic model to simulate forward-in-time divergent selection with variable migration rates, modes of reproduction (sexual, asexual), and lengths and numbers of migration-selection cycles. We investigate the computational feasibility of using ABC for statistical inference and study the quality of the estimates of the positions of loci under selection and of the strength of selection. To expand the parameter space of positions under selection, we enhance the model by implementing an outlier scan on summarized observed data. We evaluate the usefulness of summary statistics well known to capture the strength of selection and assess their informativeness under divergent selection. We also evaluate the effect of genetic drift relative to an idealized deterministic model with single-locus selection. We discuss the role of the recombination rate as a confounding factor in estimating the strength of divergent selection and emphasize its importance in the breakdown of linkage disequilibrium (LD). We identify the part of the model's parameter space in which we recover a strong signal for estimating selection, and determine whether population differentiation-based or LD-based summary statistics perform well in estimating selection.
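    The way ABC "uses simulations to bypass direct evaluation of model likelihoods" is easiest to see in plain rejection ABC. In that scheme you draw parameters from the prior, simulate data, reduce the data to a summary statistic, and keep the draws whose statistic is close to the observed one. The Python sketch below applies this to a deliberately tiny, hypothetical problem: inferring an allele frequency from a binomial sample. It is not the paper's forward-in-time selection model. The names, tolerance, and sample sizes are assumptions.

```python
import numpy as np

def abc_rejection(observed_stat, simulate, prior_sample, distance_tol, n_sims, rng):
    """Basic rejection ABC.

    Keeps prior draws whose simulated summary statistic lies within
    distance_tol of the observed summary statistic; the kept draws
    approximate the posterior.
    """
    theta = prior_sample(rng, n_sims)
    sim_stats = simulate(rng, theta)
    keep = np.abs(sim_stats - observed_stat) < distance_tol
    return theta[keep]

# Toy example (hypothetical): infer an allele frequency p from n = 100 sampled
# gene copies, 30 of which carry the allele; summary statistic = sample frequency.
n_copies = 100
observed = 30 / n_copies
rng = np.random.default_rng(1)
posterior = abc_rejection(
    observed_stat=observed,
    simulate=lambda rng, p: rng.binomial(n_copies, p) / n_copies,
    prior_sample=lambda rng, size: rng.uniform(0.0, 1.0, size),
    distance_tol=0.015,
    n_sims=200_000,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

Here no likelihood is ever evaluated: only the simulator and a distance on summary statistics are needed. For that reason, the choice of informative summary statistics (for example differentiation-based versus LD-based ones) drives the quality of the approximation.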

  • Article type: Journal Article
    Recent advances in genome-wide association studies (GWAS) come not only from increasingly large sample sizes but also from a shift in focus towards underrepresented populations. Multipopulation GWAS increase power to detect novel risk variants and improve fine-mapping resolution by leveraging evidence and differences in linkage disequilibrium (LD) across diverse populations. Here, we extend our previous approach for single-population fine-mapping through Joint Analysis of Marginal SNP Effects (JAM) to a multipopulation analysis (mJAM). Under the assumption that true causal variants are common across studies, we implement a hierarchical model framework that conditions on multiple SNPs while explicitly incorporating the different LD structures across populations. The mJAM framework can first be used to select index variants using the mJAM likelihood with different feature selection approaches. In addition, we present a novel approach that leverages ideas from mediation to construct credible sets for these index variants. Such credible sets can be constructed given any existing index variants. We illustrate the mJAM likelihood through two implementations: mJAM-SuSiE (a Bayesian approach) and mJAM-Forward selection. Through simulation studies based on realistic effect sizes and levels of LD, we demonstrate that mJAM performs well in constructing concise credible sets that include the underlying causal variants. In real data examples taken from the most recent multipopulation prostate cancer GWAS, we show several practical advantages of mJAM over other existing multipopulation methods.
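    JAM-style methods rest on recovering joint (conditional) SNP effects from marginal summary statistics plus an LD matrix. For standardized genotypes and phenotype, X'X is approximately N*R and X'y is approximately N*b. The Python sketch below pools these quantities across populations under the shared-causal-effect assumption stated above, giving beta = (sum_k N_k R_k)^-1 (sum_k N_k b_k). This is a simplified least-squares illustration, not the mJAM likelihood, mJAM-SuSiE, forward selection, or the mediation-based credible sets. The function name and inputs are hypothetical.

```python
import numpy as np

def pooled_joint_effects(marginal_betas, ld_matrices, sample_sizes):
    """Least-squares joint SNP effects pooled across populations (sketch).

    Assumes standardized genotypes and phenotype, so that for population k
    X_k'X_k ~ N_k * R_k and X_k'y_k ~ N_k * b_k, and that joint effects are
    shared across populations while LD matrices R_k differ.
    """
    p = len(marginal_betas[0])
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for b, R, n in zip(marginal_betas, ld_matrices, sample_sizes):
        lhs += n * np.asarray(R, dtype=float)
        rhs += n * np.asarray(b, dtype=float)
    return np.linalg.solve(lhs, rhs)

# Hypothetical example: SNP 1 is causal and SNP 2 is in strong LD with it in
# population 1 but only weak LD in population 2.
beta_true = np.array([0.1, 0.0, 0.0])
R1 = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.2], [0.2, 0.2, 1.0]])
R2 = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.1], [0.1, 0.1, 1.0]])
b1, b2 = R1 @ beta_true, R2 @ beta_true  # marginal effects implied by LD
print(b1, b2)
print(pooled_joint_effects([b1, b2], [R1, R2], [50_000, 20_000]))
```

The marginal effect of SNP 2 (0.09 in population 1) is pure LD tagging, and conditioning on the LD removes it. Differing LD across populations is what sharpens this separation in noisy real data.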

  • DOI:
    Article type: Preprint
    Identifying which variables truly influence a response while controlling false positives is a pervasive problem in statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the marginal empirical correlations between each explanatory variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid releasing sensitive genetic information. We extend GhostKnockoffs (He et al. [2022]) and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results from extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease and demonstrate a significant improvement in power.
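    Knockoff-based methods such as GhostKnockoffs achieve FDR control in a final step that thresholds feature statistics W_j. These statistics contrast each original variable with its knockoff, for example via penalized-regression coefficients. The sign of W_j is a fair coin flip for null variables. The Python sketch below shows only the standard single-knockoff "knockoff+" threshold (Barber and Candès). It assumes the W statistics were already computed elsewhere. It does not reproduce how GhostKnockoffs builds W from summary statistics, and multiple-knockoff variants use a modified threshold. The function name is hypothetical.

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Knockoff+ selection at target FDR q from feature statistics W.

    Large positive W_j is evidence that variable j is non-null. Returns
    (threshold, indices of selected variables).
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        # Negative statistics beyond -t estimate the number of false positives
        # among the positive statistics beyond t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t, np.flatnonzero(W >= t)
    return np.inf, np.array([], dtype=int)

# Hypothetical statistics for six variables.
print(knockoff_plus_select([5.0, 4.0, 3.0, 2.0, 1.0, -0.5], q=0.25))
```

At t = 0.5 the estimated FDP is (1 + 1)/5 = 0.4, which is too high. At t = 1 it drops to (1 + 0)/5 = 0.2 ≤ 0.25, so the first five variables are selected.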

  • Article type: Journal Article
    BACKGROUND: The human brain can rapidly represent sets of similar stimuli by their ensemble summary statistics, such as the average orientation or size. Classic models assume that ensemble statistics are computed by integrating all elements with equal weight. Challenging this view, we show that ensemble statistics are estimated by combining parafoveal and foveal statistics in proportion to their reliability. In a series of experiments, observers reproduced the average orientation of an ensemble of stimuli under varying levels of visual uncertainty.
    RESULTS: Ensemble statistics were affected by multiple spatial biases, in particular, a strong and persistent bias towards the center of the visual field. This bias, evident in the majority of subjects and in all experiments, scaled with uncertainty: the higher the uncertainty in the ensemble statistics, the larger the bias towards the element shown at the fovea.
    CONCLUSIONS: Our findings indicate that ensemble perception cannot be explained by simple uniform pooling. The visual system weights information anisotropically from both the parafovea and the fovea, taking the intrinsic spatial anisotropies of vision into account to compensate for visual uncertainty.
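    The contrast between uniform pooling and reliability-weighted pooling can be made concrete. The Python sketch below assumes inverse-variance weights (1/sigma^2) and hypothetical per-element noise levels. Because orientation has a period of 180 degrees, it averages on the doubled-angle circle. It illustrates the general idea of reliability weighting, not the authors' fitted model.

```python
import numpy as np

def weighted_mean_orientation(orientations_deg, sigmas_deg):
    """Reliability-weighted average of orientations (period 180 degrees).

    Each element is weighted by 1 / sigma**2 (its reliability). Angles are
    doubled before averaging so that, e.g., 179 deg and 1 deg average to
    0 deg rather than 90 deg.
    """
    theta = np.deg2rad(2 * np.asarray(orientations_deg, dtype=float))
    w = 1.0 / np.asarray(sigmas_deg, dtype=float) ** 2
    mean_doubled = np.arctan2(np.sum(w * np.sin(theta)), np.sum(w * np.cos(theta)))
    return np.rad2deg(mean_doubled) / 2 % 180

# Uniform pooling: equal reliabilities give the plain circular mean (20 deg).
print(weighted_mean_orientation([10, 30], [5, 5]))
# Hypothetical foveal element (0 deg, low noise) and parafoveal element
# (40 deg, higher noise): the estimate is pulled toward the foveal element.
print(weighted_mean_orientation([0, 40], [5, 10]))
```

Raising the parafoveal noise further (larger sigma) pulls the estimate even closer to the foveal orientation. This mirrors the reported finding that the foveal bias scales with uncertainty.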

  • Article type: Journal Article
    This research focuses on interval estimation of the causal effect of an exposure on an outcome using the summary data-based Mendelian randomization (SMR) method, while accounting for the winner's curse caused by the selection of single nucleotide polymorphism instruments. This issue is understudied yet important, because the point estimate is biased. Since Fieller's theorem and its variations are not suitable for constructing a confidence interval, we use the box method. The box method is known to be conservative and thus provides a lower bound on the coverage level. To assess its performance, we use simulation studies and compare it with the support interval we proposed earlier and with the Wald interval derived from the SMR method. All three methods are applied to a study of causal genes for Alzheimer's disease. Overall, the box method offers an alternative approach to constructing interval estimates for a causal effect while addressing the winner's curse.
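    The SMR causal estimate is a ratio, b_xy = b_zy / b_zx, of SNP-outcome and SNP-exposure effects. The general idea of a box interval can be sketched as follows. Give each coefficient its own interval at level sqrt(1 - alpha), so that under independence the rectangle ("box") covers both true values with probability 1 - alpha. Then take the range of the ratio over that box. That range is attained at the corners whenever the denominator interval excludes zero, and it is conservative. The Python sketch below assumes independent normal estimates and does not include the paper's adjustment for the winner's curse. It also shows a delta-method Wald interval for comparison. Both function names are hypothetical.

```python
from statistics import NormalDist

def wald_ratio_interval(b_zy, se_zy, b_zx, se_zx, alpha=0.05):
    """Delta-method (Wald) interval for the ratio b_zy / b_zx."""
    ratio = b_zy / b_zx
    se = (se_zy**2 / b_zx**2 + b_zy**2 * se_zx**2 / b_zx**4) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ratio - z * se, ratio + z * se

def box_ratio_interval(b_zy, se_zy, b_zx, se_zx, alpha=0.05):
    """Box interval for b_zy / b_zx from two independent normal estimates.

    Each coefficient gets its own interval at level sqrt(1 - alpha), so the
    box covers both true values jointly with probability 1 - alpha; the
    ratio over the box is extremal at its corners if the denominator
    interval excludes zero.
    """
    level = (1 - alpha) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    num = (b_zy - z * se_zy, b_zy + z * se_zy)
    den = (b_zx - z * se_zx, b_zx + z * se_zx)
    if den[0] <= 0 <= den[1]:
        raise ValueError("denominator interval contains zero; box interval is unbounded")
    corners = [n / d for n in num for d in den]
    return min(corners), max(corners)

# Hypothetical estimates: b_zy = 0.2 (SE 0.05), b_zx = 0.5 (SE 0.05).
print(box_ratio_interval(0.2, 0.05, 0.5, 0.05))
print(wald_ratio_interval(0.2, 0.05, 0.5, 0.05))
```

For these inputs the box interval is noticeably wider than the Wald interval, which reflects the conservativeness noted above. The box interval also becomes unbounded when the instrument is weak enough that the SNP-exposure interval crosses zero.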
