R package

  • Article type: Journal Article
    Immunofluorescent staining is commonly used to generate images to characterize cytological phenotypes. The manual quantification of DNA double-strand breaks and their repair intermediates during meiosis using image data requires a series of subjective steps, from image selection to the counting of particular events per nucleus. Here we describe "synapsis," a Bioconductor package that includes a set of functions to automate the process of identifying meiotic nuclei and quantifying key double-strand break formation and repair events in a rapid, scalable, and reproducible workflow, and compare it to manual user quantification. The software can be extended for other applications in meiosis research, such as incorporating machine learning approaches to categorize meiotic substages.

  • Article type: Journal Article
    BACKGROUND: Although aggregate data (AD) from randomised clinical trials (RCTs) are used in the majority of network meta-analyses (NMAs), other study designs (e.g., cohort studies and other non-randomised studies, NRS) can be informative about relative treatment effects. The individual participant data (IPD) of a study, when available, are preferred to AD for adjusting for important participant characteristics and for better handling heterogeneity and inconsistency in the network.
    RESULTS: We developed the R package crossnma to perform cross-format (IPD and AD) and cross-design (RCT and NRS) NMA and network meta-regression (NMR). The models are implemented as Bayesian three-level hierarchical models using Just Another Gibbs Sampler (JAGS) software within the R environment. The R package crossnma includes functions to automatically create the JAGS model, reformat the data (based on user input), assess convergence, and summarize the results. We demonstrate the workflow within crossnma by using a network of six trials comparing four treatments.
    CONCLUSIONS: The R package crossnma enables the user to perform NMA and NMR with different data types in a Bayesian framework and facilitates the inclusion of all types of evidence while recognising differences in risk of bias.

  • Article type: Journal Article
    Estimating sample size and statistical power is an essential part of a good epidemiological study design. Closed-form formulas exist for simple hypothesis tests but not for advanced statistical methods designed for exposure mixture studies. Estimating power with Monte Carlo simulations is flexible and applicable to these methods. However, coding a simulation is not straightforward for inexperienced programmers, and it is often hard for a researcher to manually specify multivariate associations among exposure mixtures to set up a simulation. To simplify this process, we present the R package mpower for power analysis of observational studies of environmental exposure mixtures involving recently developed mixtures analysis methods. The components within mpower are also versatile enough to accommodate any mixtures methods that will be developed in the future. The package allows users to simulate realistic exposure data and mixed-type covariates based on public data sets such as the National Health and Nutrition Examination Survey or other existing data sets from prior studies. Users can generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This paper presents tutorials and examples of power analysis using mpower.
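
The idea of estimating power by Monte Carlo simulation can be illustrated with a minimal Python sketch. This is not the mpower API and it does not use a mixtures method: it assumes a simple two-group comparison with unit-variance normal outcomes and a two-sided z-test, and the function name `estimate_power` and all parameter values are hypothetical choices made for illustration. Power is estimated as the fraction of simulated studies in which the test rejects the null hypothesis.

```python
import random
from statistics import NormalDist, fmean


def estimate_power(n_per_group, effect_size, n_sims=2000, alpha=0.05, seed=1):
    """Monte Carlo power of a two-sided two-sample z-test.

    Assumption for illustration: both groups are normal with known unit
    variance, and `effect_size` is the true difference in means.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        exposed = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (fmean(exposed) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# A small "power curve": estimated power as a function of sample size.
power_curve = {n: estimate_power(n, effect_size=0.5) for n in (20, 50, 100)}
for n, power in power_curve.items():
    print(f"n per group = {n:3d}  estimated power = {power:.2f}")
```

The same loop structure would apply to a more complex analysis: replace the data-generating step and the test with the model of interest and count how often the effect is detected.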

  • Article type: Journal Article
    The widespread use of high-throughput sequencing technologies has revolutionized the understanding of biology and cancer heterogeneity. Recently, several machine-learning models based on transcriptional data have been developed to accurately predict patients' outcomes and clinical responses. However, an open-source R package covering state-of-the-art machine-learning algorithms for user-friendly access has yet to be developed. Thus, we proposed a flexible computational framework to construct a machine learning-based integration model with elegant performance (Mime). Mime streamlines the process of developing predictive models with high accuracy, leveraging complex datasets to identify critical genes associated with prognosis. An in silico combined model based on de novo PIEZO1-associated signatures constructed by Mime demonstrated high accuracy in predicting the outcomes of patients compared with other published models. Furthermore, the PIEZO1-associated signatures could also precisely infer immunotherapy response by applying different algorithms in Mime. Finally, SDC1 selected from the PIEZO1-associated signatures demonstrated high potential as a glioma target. Taken together, our package provides a user-friendly solution for constructing machine learning-based integration models and will be greatly expanded to provide valuable insights into current fields. The Mime package is available on GitHub (https://github.com/l-magnificence/Mime).

  • Article type: Journal Article
    OBJECTIVE: Despite easy-to-use tools like the Cohort Builder, using All of Us Research Program data for complex research questions requires a relatively high level of technical expertise. We aimed to increase research and training capacity and reduce barriers to entry for the All of Us community through an R package, allofus. In this article, we describe functions that address common challenges we encountered while working with All of Us Research Program data, and we demonstrate this functionality with an example of creating a cohort of All of Us participants by synthesizing electronic health record and survey data with time dependencies.
    BACKGROUND: All of Us Research Program data are widely available to health researchers. The allofus R package is aimed at a wide range of researchers who wish to conduct complex analyses using best practices for reproducibility and transparency, and who have a range of experience using R. Because the All of Us data are transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), researchers familiar with existing OMOP CDM tools or who wish to conduct network studies in conjunction with other OMOP CDM data will also find value in the package.
    METHODS: We developed an initial set of functions that solve problems we experienced across survey and electronic health record data in our own research and in mentoring student projects. The package will continue to grow and develop with the All of Us Research Program. The allofus R package can help build community research capacity by increasing access to the All of Us Research Program data, the efficiency of its use, and the rigor and reproducibility of the resulting research.

  • Article type: Journal Article
    The analysis of photosynthetic traits has become an integral part of plant (eco-)physiology. Many of these characteristics are not directly measured, but calculated from combinations of several, more direct, measurements. The calculations of such derived variables are based on underlying physical models and may use additional constants or assumed values. Commercially available gas-exchange instruments typically report such derived variables, but the available implementations use different definitions and assumptions. Moreover, no software is currently available to allow a fully scripted and reproducible workflow that includes importing data, pre-processing, and recalculating derived quantities. The R package gasanalyzer aims to address these issues by providing methods to import data from different instruments, by translating photosynthetic variables to a standardized nomenclature, and by optionally recalculating derived quantities using standardized equations. In addition, the package facilitates performing sensitivity analyses on variables or assumptions used in the calculations to allow researchers to better assess the robustness of the results. The use of the package and how to perform sensitivity analyses are demonstrated using three different examples.

  • Article type: Journal Article
    High-throughput transcriptome RNA sequencing is a powerful tool for understanding dynamic biological processes. Here, we present a computational framework, implemented in the R package QDSWorkflow, to characterize heterogeneous cellular dormancy depth using RNA-sequencing data from bulk samples and single cells.

  • Article type: Journal Article
    BACKGROUND: Inbreeding and relationship coefficients are essential for conservation and breeding programs. Whether dealing with a small conserved population or a large commercial population, monitoring the inbreeding rate and designing mating plans that minimize the inbreeding rate and maximize the effective population size is important. Free, open-source, and efficient software may greatly contribute to conservation and breeding programs and help students and researchers. Efficient methods exist for calculating inbreeding coefficients. Therefore, an efficient way of calculating the numerator relationship coefficients is via the inbreeding coefficients, i.e., the relationship coefficient between parents is twice the inbreeding coefficient of their progeny. A dummy progeny is introduced where no progeny exists for a pair of individuals. Calculating inbreeding coefficients is very fast, whereas finding whether a pair of individuals has a progeny and picking one from multiple progenies is computationally more demanding. Therefore, the R package introduces a dummy progeny for any pair of individuals whose relationship coefficient is of interest, whether they have a progeny or not.
    RESULTS: Runtime and peak memory usage were benchmarked for calculating relationship coefficients between two sets of 250 and 800 animals (200,000 dummy progenies) from a pedigree of 2,721,252 animals. The program performed efficiently, computing the 200,000 relationship coefficients (which involved calculating 2,721,252 + 200,000 inbreeding coefficients) within 3:45 (mm:ss). When the inbreeding coefficients of the real animals were provided, the runtime was reduced to 1:08. Furthermore, when the diagonal elements of D in A = TDT' (d) were also provided, the runtime was reduced to 54 s. All the analyses were performed on a machine with a total memory size of 1 GB.
    CONCLUSIONS: The R package FnR is free and open-source software with implications in conservation and breeding programs. It proved to be time and memory efficient for large populations and many dummy progenies. Calculation of inbreeding coefficients can be resumed for new animals in the pedigree. Thus, saving the latest inbreeding coefficient estimates is recommended. Calculation of d coefficients (from scratch) was very fast, and there was limited value in storing those for future use.
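
The dummy-progeny identity described in this abstract (the relationship coefficient between two individuals equals twice the inbreeding coefficient of their progeny) can be checked on a toy pedigree. The Python sketch below is not the FnR implementation: it assumes the simple tabular method for the numerator relationship matrix, which is only practical for small pedigrees, and the function names, the `"dummy"` identifier, and the example pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Numerator relationship matrix by the tabular method.

    `pedigree` maps id -> (sire, dam); unknown parents are None and
    parents must be listed before their progeny.
    """
    ids = list(pedigree)
    a = {i: {} for i in ids}
    for pos, i in enumerate(ids):
        sire, dam = pedigree[i]
        for j in ids[:pos]:
            # Relationship with an earlier animal: mean of the parents' relationships.
            value = 0.5 * sum(a[j][p] for p in (sire, dam) if p is not None)
            a[i][j] = a[j][i] = value
        both_known = sire is not None and dam is not None
        # Diagonal element is 1 + F, where F is the inbreeding coefficient.
        a[i][i] = 1.0 + (0.5 * a[sire][dam] if both_known else 0.0)
    return a


def relationship_via_dummy(pedigree, x, y):
    """Relationship between x and y as twice the inbreeding of a dummy progeny."""
    extended = dict(pedigree)
    extended["dummy"] = (x, y)  # assumes "dummy" is not already an id
    a = relationship_matrix(extended)
    inbreeding = a["dummy"]["dummy"] - 1.0
    return 2.0 * inbreeding


# Toy pedigree: A and B are unrelated founders; C and D are full sibs.
pedigree = {"A": (None, None), "B": (None, None), "C": ("A", "B"), "D": ("A", "B")}
print(relationship_via_dummy(pedigree, "C", "D"))  # full sibs
print(relationship_via_dummy(pedigree, "A", "C"))  # parent and offspring
```

In this sketch the full matrix is rebuilt for every query, which is wasteful; it only serves to show that the dummy progeny's inbreeding coefficient carries the relationship between its parents.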

  • Article type: Journal Article
    BACKGROUND: Decision analytic models and meta-analyses often rely on survival probabilities that are digitized from published Kaplan-Meier (KM) curves. However, manually extracting these probabilities from KM curves is time-consuming, expensive, and error-prone. We developed an efficient and accurate algorithm that automates extraction of survival probabilities from KM curves.
    METHODS: The automated digitization algorithm processes images in JPG or PNG format, converts them to the hue, saturation, and lightness scale, and uses optical character recognition to detect axis location and labels. It also uses a k-medoids clustering algorithm to separate multiple overlapping curves on the same figure. To validate performance, we generated survival plots from random time-to-event data with sample sizes of 25, 50, 150, 250, and 1000 individuals split into 1, 2, or 3 treatment arms. We assumed an exponential distribution and applied random censoring. We compared automated digitization and manual digitization performed by well-trained researchers. We calculated the root mean squared error (RMSE) at 100 time points for both methods. The algorithm's performance was also evaluated by Bland-Altman analysis for the agreement between automated and manual digitization on a real-world set of published KM curves.
    RESULTS: The automated digitizer accurately identified survival probabilities over time in the simulated KM curves. The average RMSE for automated digitization was 0.012, while manual digitization had an average RMSE of 0.014. Its performance was negatively correlated with the number of curves in a figure and the presence of censoring markers. In real-world scenarios, automated digitization and manual digitization showed very close agreement.
    CONCLUSIONS: The algorithm streamlines the digitization process and requires minimal user input. It effectively digitized KM curves in simulated and real-world scenarios, demonstrating accuracy comparable to conventional manual digitization. The algorithm has been developed as an open-source R package and as a Shiny application and is available on GitHub: https://github.com/Pechli-Lab/SurvdigitizeR and https://pechlilab.shinyapps.io/SurvdigitizeR/.
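
The RMSE comparison at 100 time points can be sketched in Python as follows. This is not code from SurvdigitizeR: it assumes that both the reference curve and the digitized curve are available as time-sorted lists of (time, survival probability) steps, that the evaluation grid is evenly spaced from 0 to a chosen maximum time, and the function names and example values are hypothetical.

```python
import math


def step_survival(curve, t):
    """Survival probability read off a step curve at time t.

    `curve` is a list of (time, probability) pairs sorted by time; the value
    at t is the probability of the last step with time <= t (1.0 before any step).
    """
    surv = 1.0
    for time, prob in curve:
        if time > t:
            break
        surv = prob
    return surv


def rmse_between(reference, digitized, t_max, n_points=100):
    """Root mean squared error between two step curves on an even time grid."""
    times = [t_max * k / (n_points - 1) for k in range(n_points)]
    squared_errors = [
        (step_survival(reference, t) - step_survival(digitized, t)) ** 2
        for t in times
    ]
    return math.sqrt(sum(squared_errors) / n_points)


# Made-up example: the digitized curve sits 0.02 below the reference everywhere.
reference = [(0.0, 1.0), (2.0, 0.8), (5.0, 0.5), (8.0, 0.3)]
digitized = [(t, p - 0.02) for t, p in reference]
print(rmse_between(reference, digitized, t_max=10.0))
```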

  • Article type: Journal Article
    Proteomics offers a robust method for quantifying proteins and elucidating their roles in cellular functions, surpassing the insights provided by transcriptomics. The Clinical Proteomic Tumor Analysis Consortium database, enriched with comprehensive cancer proteomics data including phosphorylation and ubiquitination profiles, alongside transcriptomics data from the Genomic Data Commons, allows for integrative molecular studies of cancer. The ProteoCancer Analysis Suite (PCAS), our newly developed R package and Shiny app, leverages these resources to facilitate in-depth analyses of proteomics, phosphoproteomics, and transcriptomics, enhancing our understanding of the tumor microenvironment through features like immune infiltration and drug sensitivity analysis. This tool aids in identifying critical signaling pathways and therapeutic targets, particularly through its detailed phosphoproteomic analysis. To demonstrate the functionality of the PCAS, we conducted an analysis of GAPDH across multiple cancer types, revealing a significant upregulation of protein levels, which is consistent with its biological and clinical significance in tumors, as indicated in our prior research. Further experiments were used to validate the findings obtained using the tool. In conclusion, the PCAS is a powerful and valuable tool for conducting comprehensive proteomic analyses, significantly enhancing our ability to uncover oncogenic mechanisms and identify potential therapeutic targets in cancer research.