Bioconductor

  • Article type: Journal Article
    Immunofluorescent staining is commonly used to generate images to characterize cytological phenotypes. The manual quantification of DNA double-strand breaks and their repair intermediates during meiosis using image data requires a series of subjective steps, from image selection to the counting of particular events per nucleus. Here we describe "synapsis," a Bioconductor package, which includes a set of functions to automate the process of identifying meiotic nuclei and quantifying key double-strand break formation and repair events in a rapid, scalable, and reproducible workflow, and compare it to manual user quantification. The software can be extended for other applications in meiosis research, such as incorporating machine learning approaches to categorize meiotic substages.
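
The per-nucleus counting step mentioned in this abstract can be illustrated with a minimal sketch. This is not the synapsis API (synapsis is an R package). It assumes the image has already been thresholded into a binary mask for a single nucleus, and it counts 4-connected foci with a flood fill. The function name `count_foci` and the toy mask are hypothetical.

```python
from collections import deque


def count_foci(mask):
    """Count 4-connected groups of 1-pixels in a binary image (list of rows)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # A new, unvisited foreground pixel starts a new focus.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical thresholded nucleus: three separate foci.
toy_nucleus = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
print(count_foci(toy_nucleus))  # prints 3
```

A real pipeline would also need nucleus segmentation and a principled threshold, both of which are omitted here.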

  • Article type: Journal Article
    With the increased reliance on multi-omics data for bulk and single cell analyses, the availability of robust approaches to perform unsupervised analysis for clustering, visualization, and feature selection is imperative. Joint dimensionality reduction methods can be applied to multi-omics datasets to derive a global sample embedding analogous to single-omic techniques such as Principal Components Analysis (PCA). Multiple co-inertia analysis (MCIA) is a method for joint dimensionality reduction that maximizes the covariance between block- and global-level embeddings. Current implementations for MCIA are not optimized for large datasets such as those arising from single cell studies, and lack capabilities with respect to embedding new data.
    We introduce nipalsMCIA, an MCIA implementation that solves the objective function using an extension to Non-linear Iterative Partial Least Squares (NIPALS), and shows significant speed-up over earlier implementations that rely on eigendecompositions for single cell multi-omics data. It also removes the dependence on an eigendecomposition for calculating the variance explained, and allows users to perform out-of-sample embedding for new data. nipalsMCIA provides users with a variety of pre-processing and parameter options, as well as ease of functionality for down-stream analysis of single-omic and global-embedding factors.
    nipalsMCIA is available as a BioConductor package at https://bioconductor.org/packages/release/bioc/html/nipalsMCIA.html, and includes detailed documentation and application vignettes. Supplementary Materials are available online.
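
To make the NIPALS idea concrete, here is a minimal sketch of the classic single-block iteration, which finds the first principal component without an eigendecomposition. It is not the multi-block MCIA extension described in the abstract and not the nipalsMCIA API. The function name, the tolerance, and the toy matrix are assumptions, and the input is assumed to be column-centered and non-zero.

```python
import math


def nipals_first_component(X, tol=1e-10, max_iter=500):
    """Return (scores t, loadings p) of the first component of X.

    X is a list of rows and is assumed to be column-centered and non-zero.
    """
    n, m = len(X), len(X[0])
    # Start from the column with the largest sum of squares.
    start = max(range(m), key=lambda j: sum(X[i][j] ** 2 for i in range(n)))
    t = [X[i][start] for i in range(n)]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: p = X^T t / (t^T t), rescaled to unit length.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: t = X p.
        t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol * sum(v * v for v in t):
            break
    return t, p


# Toy centered data lying on the line y = 2x.
X = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
scores, loadings = nipals_first_component(X)
total_ss = sum(v * v for row in X for v in row)
explained = sum(v * v for v in scores) / total_ss
print(loadings, explained)
```

The variance explained falls out of the scores directly, as the ratio of their sum of squares to the total sum of squares. Further components would be obtained by subtracting the outer product of `t` and `p` from `X` (deflation) and repeating.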

  • Article type: Journal Article
    Mass-spectrometry (MS)-based single-cell proteomics (SCP) explores cellular heterogeneity by focusing on the functional effectors of cells: proteins. However, extracting meaningful biological information from MS data is far from trivial, especially with single cells. Currently, data analysis workflows differ substantially from one research team to another. Moreover, it is difficult to evaluate pipelines because ground truths are missing. Our team has developed the R/Bioconductor package called scp to provide a standardized framework for SCP data analysis. It relies on the widely used QFeatures and SingleCellExperiment data structures. In addition, we used a design containing cell lines mixed in known proportions to generate controlled variability for data analysis benchmarking. In this chapter, we provide a flexible data analysis protocol for SCP data using the scp package together with comprehensive explanations at each step of the processing. Our main steps are quality control on the feature and cell level, aggregation of the raw data into peptides and proteins, normalization, and batch correction. We validate our workflow using our ground truth data set. We illustrate how to use this modular, standardized framework and highlight some crucial steps.
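
Two of the steps listed in this abstract, aggregation and normalization, can be sketched in a few lines. This is an illustration only and does not use the scp package, which is written in R and has its own functions. The choice of the median for both steps, the function names, and the toy intensities are assumptions. Quality control and batch correction are left out.

```python
from collections import defaultdict
from statistics import median


def aggregate_to_peptides(psms):
    """Combine PSM-level intensities into one value per peptide and cell.

    psms is a list of (peptide, {cell: intensity or None}) pairs.
    The median across PSMs is used here as an assumed aggregation rule.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for peptide, intensities in psms:
        for cell, value in intensities.items():
            if value is not None:  # skip missing measurements
                grouped[peptide][cell].append(value)
    return {
        peptide: {cell: median(values) for cell, values in cells.items()}
        for peptide, cells in grouped.items()
    }


def median_normalize(table):
    """Divide every value by the median of its cell across all features."""
    per_cell = defaultdict(list)
    for cells in table.values():
        for cell, value in cells.items():
            per_cell[cell].append(value)
    factors = {cell: median(values) for cell, values in per_cell.items()}
    return {
        feature: {cell: value / factors[cell] for cell, value in cells.items()}
        for feature, cells in table.items()
    }


# Hypothetical PSM table for two cells.
psms = [
    ("PEPA", {"c1": 10.0, "c2": 20.0}),
    ("PEPA", {"c1": 30.0, "c2": None}),
    ("PEPB", {"c1": 40.0, "c2": 10.0}),
]
peptides = aggregate_to_peptides(psms)
normalized = median_normalize(peptides)
print(peptides)
print(normalized)
```

The same aggregation function could be applied a second time to roll peptides up into proteins, given a peptide-to-protein mapping.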

  • Article type: Journal Article
    Heatmap is a widely used statistical visualization method on matrix-like data to reveal similar patterns shared by subsets of rows and columns. In the R programming language, there are many packages that make heatmaps. Among them, the ComplexHeatmap package provides the richest toolset for constructing highly customizable heatmaps. ComplexHeatmap can easily establish connections between multisource information by automatically concatenating and adjusting a list of heatmaps as well as complex annotations, which makes it widely applied in data analysis in many fields, especially in bioinformatics, to find hidden structures in the data. In this article, we give a comprehensive introduction to the current state of ComplexHeatmap, including its modular design, its rich functionalities, and its broad applications.

  • Article type: Journal Article
    Advances in computer science, in combination with next-generation sequencing, have introduced a new era in biology, enabling state-of-the-art analysis of complex biological data. Bioinformatics is evolving as a field that unites computer science and biology, enabling the representation, storage, management, analysis, and exploration of many types of data with a plethora of machine learning algorithms and computing tools. In this study, we used machine learning algorithms to detect differentially expressed genes between different types of cancer and showed that the genes they identify overlap with the final results of RNA-sequencing analysis. The datasets were obtained from the National Center for Biotechnology Information resource, specifically dataset GSE68086, which corresponds to PMID:200,068,086. This dataset consists of 171 blood platelet samples collected from patients with six different tumors and from healthy individuals. All steps of RNA-sequencing analysis (preprocessing, read alignment, transcriptome reconstruction, expression quantification, and differential expression analysis) were followed. Machine learning-based Random Forest and Gradient Boosting algorithms were applied to predict significant genes. The RStudio statistical tool was used for the analysis.

  • Article type: Journal Article
    High-throughput sequencing technologies have enabled cross-species comparative transcriptomic studies; however, there are numerous challenges for these studies due to biological and technical factors. We developed CoSIA (Cross-Species Investigation and Analysis), a Bioconductor R package and Shiny app that provides an alternative framework for cross-species transcriptomic comparison of non-diseased wild-type RNA sequencing gene expression data from Bgee across tissues and species (human, mouse, rat, zebrafish, fly, and nematode) through visualization of variability, diversity, and specificity metrics.
    https://github.com/lasseignelab/CoSIA.
    See supplementary material.

  • Article type: Journal Article
    Background: Expression proteomics involves the global evaluation of protein abundances within a system. In turn, differential expression analysis can be used to investigate changes in protein abundance upon perturbation to such a system. Methods: Here, we provide a workflow for the processing, analysis and interpretation of quantitative mass spectrometry-based expression proteomics data. This workflow utilizes open-source R software packages from the Bioconductor project and guides users end-to-end and step-by-step through every stage of the analyses. As a use-case we generated expression proteomics data from HEK293 cells with and without a treatment. Of note, the experiment included cellular proteins labelled using tandem mass tag (TMT) technology and secreted proteins quantified using label-free quantitation (LFQ). Results: The workflow explains the software infrastructure before focusing on data import, pre-processing and quality control. This is done individually for TMT and LFQ datasets. The application of statistical differential expression analysis is demonstrated, followed by interpretation via gene ontology enrichment analysis. Conclusions: A comprehensive workflow for the processing, analysis and interpretation of expression proteomics is presented. The workflow is a valuable resource for the proteomics community, and specifically for beginners who are at least familiar with R and who wish to understand and make data-driven decisions with regard to their analyses.

  • Article type: Journal Article
    Kataegis refers to the occurrence of regional genomic hypermutation in cancer and is a phenomenon that has been observed in a wide range of malignancies. A kataegis locus constitutes a genomic region with a high mutation rate (i.e., a higher frequency of closely interspersed somatic variants than the overall mutational background). It has been shown that kataegis is of biological significance and possibly clinically relevant. Therefore, an accurate and robust workflow for kataegis detection is paramount.
    Here we present Katdetectr, an open-source R/Bioconductor-based package for the robust yet flexible and fast detection of kataegis loci in genomic data. In addition, Katdetectr houses functionalities to characterize and visualize kataegis and provides results in a standardized format useful for subsequent analysis. In brief, Katdetectr imports industry-standard formats (MAF, VCF, and VRanges), determines the intermutation distance of the genomic variants, and performs unsupervised changepoint analysis utilizing the Pruned Exact Linear Time search algorithm followed by kataegis calling according to user-defined parameters. We used synthetic data and an a priori labeled pan-cancer dataset of whole-genome sequenced malignancies for the performance evaluation of Katdetectr and 5 publicly available kataegis detection packages. Our performance evaluation shows that Katdetectr is robust regarding tumor mutational burden and shows the fastest mean computation time. Additionally, Katdetectr reveals the highest accuracy (0.99, 0.99) and normalized Matthews correlation coefficient (0.98, 0.92) of all evaluated tools for both datasets.
    Katdetectr is a robust workflow for the detection, characterization, and visualization of kataegis and is available on Bioconductor: https://doi.org/doi:10.18129/B9.bioc.katdetectr.
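
The intermutation distance at the start of this workflow is simple to compute, and a rough sketch helps show what a kataegis call looks like. The code below is not Katdetectr and does not implement the Pruned Exact Linear Time changepoint step. It swaps in a plain fixed-threshold scan instead. The thresholds (`max_imd=1000` bp, `min_variants=6`), the function names, and the toy positions are all assumptions chosen for illustration.

```python
def intermutation_distances(positions):
    """Distances in bp between consecutive variants on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def call_dense_runs(positions, max_imd=1000, min_variants=6):
    """Return (start, end, n_variants) for each dense run of variants.

    A run is a maximal stretch in which every neighbouring pair of variants
    is at most max_imd apart; runs with fewer than min_variants are dropped.
    """
    ordered = sorted(positions)
    runs = []
    run_start = 0
    for i in range(1, len(ordered) + 1):
        # Close the current run at the end of the data or at a large gap.
        if i == len(ordered) or ordered[i] - ordered[i - 1] > max_imd:
            n_variants = i - run_start
            if n_variants >= min_variants:
                runs.append((ordered[run_start], ordered[i - 1], n_variants))
            run_start = i
    return runs


# Hypothetical variant positions with one dense cluster near 50 kb.
positions = [100, 50_000, 50_200, 50_450, 50_500, 50_900, 51_300, 51_350, 900_000]
print(intermutation_distances(positions))
print(call_dense_runs(positions))  # prints [(50000, 51350, 7)]
```

A changepoint method segments the distance series by fitting piecewise-constant rates, so it can adapt to the sample's background mutation rate, whereas the fixed threshold above cannot.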

  • Article type: Journal Article
    We define and identify a new class of control genes for next-generation sequencing called total RNA expression genes (TREGs), which correlate with total RNA abundance in cell types of different sizes and transcriptional activity. We provide a data-driven method to identify TREGs from single-cell RNA sequencing data, allowing the estimation of total amount of RNA when restricted to quantifying a limited number of genes. We demonstrate our method in postmortem human brain using multiplex single-molecule fluorescent in situ hybridization and compare candidate TREGs against classic housekeeping genes. We identify AKT3 as a top TREG across five brain regions.

  • Article type: Journal Article
    BACKGROUND: Bisulfite sequencing is a powerful tool for profiling genomic methylation, an epigenetic modification critical in the understanding of cancer, psychiatric disorders, and many other conditions. Raw data generated by whole genome bisulfite sequencing (WGBS) requires several computational steps before it is ready for statistical analysis, and particular care is required to process data in a timely and memory-efficient manner. Alignment to a reference genome is one of the most computationally demanding steps in a WGBS workflow, taking several hours or even days with commonly used WGBS-specific alignment software. This naturally motivates the creation of computational workflows that can utilize GPU-based alignment software to greatly speed up the bottleneck step. In addition, WGBS produces raw data that is large and often unwieldy; a lack of memory-efficient representation of data by existing pipelines renders WGBS impractical or impossible to many researchers.
    RESULTS: We present BiocMAP, a Bioconductor-friendly methylation analysis pipeline consisting of two modules, to address the above concerns. The first module performs computationally-intensive read alignment using Arioc, a GPU-accelerated short-read aligner. Since GPUs are not always available on the same computing environments where traditional CPU-based analyses are convenient, the second module may be run in a GPU-free environment. This module extracts and merges DNA methylation proportions, that is, the fractions of methylated cytosines across all cells in a sample at a given genomic site. Bioconductor-based output objects in R utilize an on-disk data representation to drastically reduce required main memory and make WGBS projects computationally feasible to more researchers.
    CONCLUSIONS: BiocMAP is implemented using Nextflow and available at http://research.libd.org/BiocMAP/. To enable reproducible analysis across a variety of typical computing environments, BiocMAP can be containerized with Docker or Singularity, and executed locally or with the SLURM or SGE scheduling engines. By providing Bioconductor objects, BiocMAP's output can be integrated with powerful analytical open source software for analyzing methylation data.
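
The methylation proportion defined in the RESULTS paragraph is a per-site ratio, which the following sketch computes and merges across samples. It is not BiocMAP code. It works in memory rather than on disk, and the function names, the `(methylated, unmethylated)` count layout, and the toy sites are assumptions.

```python
def methylation_proportion(methylated, unmethylated):
    """Fraction of methylated reads at one site; None when coverage is zero."""
    coverage = methylated + unmethylated
    return methylated / coverage if coverage > 0 else None


def merge_samples(per_sample):
    """Merge per-sample counts into one table of proportions per site.

    per_sample maps sample -> {site: (methylated, unmethylated)}.
    The result maps site -> {sample: proportion or None}.
    """
    merged = {}
    for sample, sites in per_sample.items():
        for site, (methylated, unmethylated) in sites.items():
            merged.setdefault(site, {})[sample] = methylation_proportion(
                methylated, unmethylated
            )
    return merged


# Hypothetical read counts at two cytosines in two samples.
counts = {
    "sample_a": {"chr1:1000": (3, 1), "chr1:2000": (0, 0)},
    "sample_b": {"chr1:1000": (5, 5)},
}
table = merge_samples(counts)
print(table)
```

Sites with zero coverage are kept as `None` rather than 0, so that missing data is not mistaken for an unmethylated site.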