Keywords: RNA sequencing; analysis workflow; cloud computing; gene expression; microbial genomics; training

MeSH: Cloud Computing; Computational Biology / methods; Sequence Analysis, RNA / methods; Software; Gene Expression Regulation, Bacterial

Source: DOI:10.1093/bib/bbae301

Abstract:
This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial "NIGMS Sandbox" at the beginning of this Supplement. This module delivers learning materials on RNA sequencing (RNAseq) data analysis in an interactive format that uses appropriate cloud resources for data access and analyses. Biomedical research is increasingly data-driven and dependent upon data management and analysis methods that facilitate rigorous, robust, and reproducible research. Cloud-based computing resources provide opportunities to broaden the application of bioinformatics and data science in research. Two obstacles for researchers, particularly those at small institutions, are: (i) access to bioinformatics analysis environments tailored to their research; and (ii) training in how to use cloud-based computing resources. We developed five reusable tutorials for bulk RNAseq data analysis to address these obstacles. Using Jupyter notebooks run on the Google Cloud Platform, the tutorials guide the user through a workflow featuring an RNAseq dataset from a study of prophage-altered drug resistance in Mycobacterium chelonae. The first tutorial uses a subset of the data so users can learn analysis steps rapidly, and the second uses the entire dataset. Next, a tutorial demonstrates how to analyze the read count data to generate lists of differentially expressed genes using R/DESeq2. Additional tutorials generate read counts using the Snakemake workflow manager and Nextflow with Google Batch. All tutorials are open-source and can be used as templates for other analyses.