R package

  • Article type: Journal Article
    In classic semiquantitative metabolomics, metabolite intensities are affected both by biological factors and by other, unwanted sources of variation. A systematic evaluation of data processing methods is crucial for identifying adequate processing procedures for a given experimental setup. Current comparative studies focus mostly on peak area data rather than on absolute concentrations. In this study, we evaluated which data processing methods produce outputs most similar to the corresponding absolute quantified data. We examined data distribution characteristics, fold difference patterns between two metabolites, and sample variance, using two metabolomic datasets, one from a retail milk study and one from a lupus nephritis cohort, as test cases. When studying the impact of data normalization, transformation, scaling, and combinations of these methods, we found that the cross-contribution compensating multiple standard normalization (ccmn) method, followed by square root data transformation, was most appropriate for a well-controlled study such as the milk study dataset. For the lupus nephritis cohort study, only ccmn normalization slightly improved the data quality of this noisy cohort. Because the assessment was based on the resemblance between processed data and the corresponding absolute quantified data, our results provide a helpful guideline for processing metabolomic datasets in similar contexts (food and clinical metabolomics). Finally, we introduce Metabox 2.0, which enables thorough analysis of metabolomic data, including data processing, biomarker analysis, integrative analysis, and data interpretation. It was used to process and analyze the data in this study. An online web version is available at http://metsysbio.com/metabox.
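
    The processing order evaluated above (normalize, then transform, then optionally scale) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Metabox 2.0 code: total-sum normalization stands in for ccmn, which relies on multiple internal standards and is not reproduced here; Pareto scaling is just one possible scaling step; and all function names are hypothetical. Rows are samples, columns are metabolites, and intensities are assumed strictly positive.

```python
import math
from statistics import mean, median, stdev


def total_sum_normalize(matrix):
    """Rescale each sample (row) so its total equals the median row total.

    Stand-in only: ccmn instead uses multiple internal standards.
    """
    totals = [sum(row) for row in matrix]
    target = median(totals)
    return [[v * target / t for v in row] for row, t in zip(matrix, totals)]


def sqrt_transform(matrix):
    """Element-wise square root transformation."""
    return [[math.sqrt(v) for v in row] for row in matrix]


def pareto_scale(matrix):
    """Mean-center each metabolite (column) and divide by the square root of its SD."""
    params = [(mean(col), math.sqrt(stdev(col))) for col in zip(*matrix)]
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in matrix]


def process(matrix, scale=False):
    """Normalization followed by square root transformation, with optional scaling."""
    out = sqrt_transform(total_sum_normalize(matrix))
    return pareto_scale(out) if scale else out
```

    Calling `process(m)` returns normalized, square-root-transformed intensities; `process(m, scale=True)` adds a scaling step, which lets the same skeleton express other normalization, transformation, and scaling combinations by swapping functions.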

  • Article type: Journal Article
    Here we detail the use of an R package, 'EcoCountHelper', and an associated analytical pipeline aimed at making generalized linear mixed-effects model (GLMM)-based analysis of ecological count data more accessible. We recommend a GLMM-based analysis workflow that allows the user to (1) select the distributional form (Poisson vs. negative binomial) and whether to model zero-inflation (ZIP and ZINB, respectively) using AIC and variance-mean plots, (2) check model goodness-of-fit using simulated residual diagnostics, (3) interpret model results through easy-to-understand outputs of changes in predicted responses, and (4) compare the magnitude of predictor variable effects using effects plots. The package provides a series of easy-to-use functions that accept both wide- and long-form multi-taxa count data and require no programming experience. To demonstrate the utility of this approach, we use the package to model acoustic bat activity data relative to multiple landscape characteristics in a protected area (Grand Teton National Park) that is threatened by an encroaching disease, white-nose syndrome. Global threats to bat conservation such as disease and deforestation have prompted extensive research to better understand bat ecology. Notwithstanding these efforts, managers of lands crucial to the persistence of bat populations often have too little information about local bat activity to make informed land-management decisions. In our case study in the Tetons, we found that an increased prevalence of porous buildings increases activity levels of Eptesicus fuscus and Myotis volans; Myotis lucifugus activity decreases as distance to water increases; and Myotis volans activity increases with the amount of forested area. By using GLMMs in tandem with 'EcoCountHelper', managers without advanced programming or statistical expertise can assess the effects of landscape characteristics on wildlife in a statistically robust framework.
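
    Step (1) of the workflow, choosing between Poisson and negative binomial distributions, rests on two checks: whether the variance grows faster than the mean, and which model has the lower AIC. The Python sketch below shows that logic for a single intercept-only model. It is not part of EcoCountHelper (an R package); it omits random effects, covariates, and the zero-inflated variants, and its function names are hypothetical. The negative binomial uses the common NB2 form, Var(Y) = mu + mu^2/k, with the size parameter k profiled over a grid.

```python
import math


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest overdispersion."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((y - mu) ** 2 for y in counts) / (n - 1)
    return var / mu


def poisson_loglik(counts, mu):
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def nb_loglik(counts, mu, k):
    # NB2 parameterization: mean mu, size k, Var(Y) = mu + mu**2 / k
    return sum(
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu))
        + y * math.log(mu / (k + mu))
        for y in counts
    )


def compare_aic(counts):
    """Return (AIC_Poisson, AIC_NB) for intercept-only fits (requires a nonzero mean)."""
    mu = sum(counts) / len(counts)  # the sample mean is the MLE of mu under both models
    aic_pois = 2 * 1 - 2 * poisson_loglik(counts, mu)
    # Profile the NB size parameter over a log-spaced grid, k in [e^-5, e^10]
    best_nb = max(nb_loglik(counts, mu, math.exp(-5 + 0.01 * i)) for i in range(1501))
    aic_nb = 2 * 2 - 2 * best_nb  # one extra parameter (k)
    return aic_pois, aic_nb


if __name__ == "__main__":
    # Hypothetical nightly call counts at one detector
    calls = [0, 0, 0, 1, 0, 12, 0, 3, 0, 25, 1, 0]
    print(f"variance/mean = {dispersion_ratio(calls):.1f}")
    print("AIC (Poisson, NB) = (%.1f, %.1f)" % compare_aic(calls))
```

    A variance-to-mean ratio far above 1 together with a clearly lower negative binomial AIC points toward the negative binomial; plotting per-group variance against per-group mean is the graphical version of the same check.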

  • Article type: Journal Article
    Phase II trials that evaluate targeted therapies based on a biomarker must be well designed to assess both anti-tumor activity and the clinical utility of the biomarker. Classical phase II designs do not account for molecular heterogeneity and can lead to an erroneous conclusion for the whole population even though a subgroup of patients may well benefit from the new therapy. Moreover, the target population to be evaluated in a phase III trial may be incorrectly specified. Alternative approaches proposed in the literature make it possible to include two subgroups, defined by biomarker status (negative/positive), in the same study. Jones, Parashar, and Tournoux et al. have proposed different stratified adaptive two-stage designs that identify, at the end of the first or second stage, a subgroup of interest within a heterogeneous population that could benefit from the experimental treatment. Nevertheless, these designs are rarely used in oncology research. After introducing these stratified adaptive designs, we present an R package (ph2hetero) that implements them. A case study illustrates both the designs and the use of the R package. These stratified adaptive designs provide a useful alternative to classical two-stage designs and may also be applicable in contexts other than biomarker studies.
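
    The stratified designs build on the classical single-arm two-stage rule, so a sketch of that rule's operating characteristics may help in following them. The Python code below computes, for a true response rate p, the probability of early termination, the probability of declaring the treatment promising, and the expected sample size; evaluating it separately in each biomarker stratum gives only a rough picture of a stratified trial. It does not implement the Jones, Parashar, or Tournoux decision rules or any ph2hetero function. The boundaries r1/n1 = 1/10 and r/n = 5/29 are Simon's optimal design for p0 = 0.10 versus p1 = 0.30 (alpha = 0.05, power = 0.80), used here purely as an example.

```python
from math import comb


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_sf(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(max(x, 0), n + 1))


def two_stage_oc(r1, n1, r, n, p):
    """Operating characteristics of a single-arm two-stage rule.

    Stop after stage 1 if responses <= r1 out of n1; otherwise enroll up to n
    patients and call the treatment promising if total responses > r.
    Returns (P(early termination), P(promising), expected sample size).
    """
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    p_promising = sum(
        binom_pmf(x1, n1, p) * binom_sf(r + 1 - x1, n - n1, p)
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1 - pet) * (n - n1)
    return pet, p_promising, expected_n


if __name__ == "__main__":
    # Hypothetical true response rates in each biomarker stratum, same rule in both
    for stratum, p_true in {"biomarker-negative": 0.10, "biomarker-positive": 0.30}.items():
        pet, prom, en = two_stage_oc(1, 10, 5, 29, p_true)
        print(f"{stratum}: PET={pet:.3f}, P(promising)={prom:.3f}, E[N]={en:.1f}")
```

    Evaluated at p = p0, `P(promising)` is the type I error; at p = p1 it is the power. The stratified adaptive designs add rules for combining or dropping strata between stages, which this sketch leaves out.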

  • Article type: Journal Article
    The attributable fraction (AF, also called attributable risk) is a widely used measure that quantifies the public health impact of an exposure on an outcome. Although the theory of AF estimation is well developed, up-to-date software implementations have been lacking. This article presents a new R package for AF estimation with binary exposures. The package AF allows confounder-adjusted estimation of the AF for the three major study designs: cross-sectional, (possibly matched) case-control, and cohort. The article is divided into theoretical and applied sections. In the theoretical sections, we describe how the confounder-adjusted AF is estimated for each study design; these sections serve as a brief but self-contained tutorial on AF estimation. In the applied sections, we use real data examples to illustrate how the AF package is used. All datasets in these examples are publicly available and included in the AF package, so readers can easily replicate all analyses.
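
    For cohort or cross-sectional data, the confounder-adjusted AF can be written as AF = 1 - Pr(Y0 = 1)/Pr(Y = 1), where Y0 is the outcome that would be observed if nobody were exposed. With no unmeasured confounding, Pr(Y0 = 1) equals the unexposed risk within each confounder stratum, averaged over the confounder distribution of the whole sample. The Python sketch below computes that standardized estimate for a single categorical confounder using stratum-specific proportions. It is not the AF package's implementation, it does not handle case-control or matched designs, and the function names and record format are hypothetical.

```python
from collections import defaultdict


def standardized_af(records):
    """Confounder-adjusted attributable fraction by standardization.

    records: iterable of (exposed, outcome, stratum), with exposed/outcome as 0/1
    (or bool) and stratum a hashable confounder level. Every stratum must
    contain unexposed subjects.
    """
    records = list(records)
    n = len(records)
    p_y = sum(y for _, y, _ in records) / n  # observed risk Pr(Y = 1)

    # stratum -> [subjects, unexposed subjects, cases among unexposed]
    tally = defaultdict(lambda: [0, 0, 0])
    for x, y, c in records:
        tally[c][0] += 1
        if not x:
            tally[c][1] += 1
            tally[c][2] += y

    # Pr(Y0 = 1): unexposed risk per stratum, weighted by the stratum's share of the sample
    p_y0 = 0.0
    for stratum, (size, n_unexp, cases_unexp) in tally.items():
        if n_unexp == 0:
            raise ValueError(f"stratum {stratum!r} has no unexposed subjects")
        p_y0 += (size / n) * (cases_unexp / n_unexp)
    return 1 - p_y0 / p_y


def expand(stratum, exposed, n, cases):
    """Build n individual records, the first `cases` of which are cases (example helper)."""
    return [(exposed, int(i < cases), stratum) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical data: stratum A has both more exposure and a higher baseline risk
    data = (expand("A", 1, 80, 40) + expand("A", 0, 20, 8)
            + expand("B", 1, 20, 4) + expand("B", 0, 80, 8))
    print(f"adjusted AF = {standardized_af(data):.3f}")
```

    For these hypothetical counts the adjusted AF is 1/6 (about 0.167), whereas ignoring the confounder gives a crude AF of 1 - 0.16/0.30 (about 0.467), because stratum A combines more exposure with a higher baseline risk.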

  • Article type: Journal Article
    Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) is an integrated data-analysis pipeline designed for the standardized analysis of highly polymorphic genetic data, specifically for the HLA and KIR genetic systems. Most modern genetic analysis programs are designed for the analysis of single nucleotide polymorphisms, but the highly polymorphic nature of HLA and KIR data requires specialized methods of data analysis. BIGDAWG performs case-control analyses of the highly polymorphic genotype data characteristic of the HLA and KIR loci. It tests for Hardy-Weinberg equilibrium, calculates allele frequencies, bins low-frequency alleles for k×2 and 2×2 chi-squared tests, and calculates odds ratios, confidence intervals, and p-values for each allele. When multi-locus genotype data are available, BIGDAWG estimates user-specified haplotypes and performs the same binning and statistical calculations for each haplotype. For the HLA loci, BIGDAWG performs the same analyses at the individual amino-acid level. Finally, BIGDAWG generates figures and tables for each of these comparisons. BIGDAWG eliminates the error-prone reformatting needed to transfer data between multiple programs, and it streamlines and standardizes the data-analysis process for case-control studies of highly polymorphic data. BIGDAWG has been implemented as the bigdawg R package and as a free web application at bigdawg.immunogenomics.org.
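
    The per-allele statistics described above reduce to a 2×2 table for each allele (that allele vs. all others, cases vs. controls). The Python sketch below counts alleles from genotypes, pools rare alleles into one category, and computes an odds ratio with a Woolf (log-scale) 95% confidence interval. It is not BIGDAWG code: the pooling threshold `min_count`, the zero-cell correction, and the function names are assumptions, and BIGDAWG's own binning rule may differ.

```python
import math
from collections import Counter


def allele_counts(genotypes):
    """Count alleles from (allele1, allele2) genotype tuples at one locus."""
    return Counter(allele for genotype in genotypes for allele in genotype)


def bin_rare(case_counts, control_counts, min_count=5):
    """Pool alleles whose combined case+control count is below min_count
    into a single 'binned' category (the threshold is a hypothetical choice)."""
    binned_cases, binned_controls = Counter(), Counter()
    for allele in set(case_counts) | set(control_counts):
        total = case_counts[allele] + control_counts[allele]
        key = allele if total >= min_count else "binned"
        binned_cases[key] += case_counts[allele]
        binned_controls[key] += control_counts[allele]
    return binned_cases, binned_controls


def allele_odds_ratio(allele, case_counts, control_counts, z=1.96):
    """Odds ratio (this allele vs. all others) with a Woolf 95% CI."""
    a = case_counts[allele]
    b = sum(case_counts.values()) - a
    c = control_counts[allele]
    d = sum(control_counts.values()) - c
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction for zero cells (an assumption of this sketch)
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical genotypes at one locus
    cases = [("A", "A")] * 15 + [("B", "C")] * 35
    controls = [("A", "B")] * 15 + [("B", "C")] * 35
    case_counts, control_counts = bin_rare(allele_counts(cases), allele_counts(controls))
    for allele in sorted(case_counts):
        or_, lo, hi = allele_odds_ratio(allele, case_counts, control_counts)
        print(f"{allele}: OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

    Pooling before testing keeps sparse cells from dominating the chi-squared and odds-ratio calculations; the same functions could be applied to haplotype or amino-acid labels instead of allele names.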