Keywords: DNA methylation; Google Cloud computing; R Bioconductor; epigenomics; multi-omics integration; transcriptomics

MeSH: Humans; Cloud Computing; Epigenomics / methods; Epigenesis, Genetic; Transcriptome; Computational Biology / methods; Gene Expression Profiling / methods; Software; Data Mining / methods

Source: DOI:10.1093/bib/bbae352

Abstract:
Multi-omics (genomics, transcriptomics, epigenomics, proteomics, metabolomics, etc.) research approaches are vital for understanding the hierarchical complexity of human biology and have proven to be extremely valuable in cancer research and precision medicine. Recent scientific advances have made high-throughput genome-wide sequencing a central focus in molecular research by allowing for the collective analysis of various kinds of molecular biological data from different types of specimens in a single tissue or even at the level of a single cell. Additionally, with the help of improved computational resources and data mining, researchers are able to integrate data from different multi-omics regimes to identify new prognostic, diagnostic, or predictive biomarkers, uncover novel therapeutic targets, and develop more personalized treatment protocols for patients. For the research community to extract scientifically and clinically meaningful information from the biological data generated each day more efficiently and with fewer wasted resources, familiarity and comfort with advanced analytical tools, such as Google Cloud Platform, become imperative. This project is an interdisciplinary, cross-organizational effort to provide a guided learning module for integrating transcriptomics and epigenetics data analysis protocols into a comprehensive analysis pipeline for users to implement in their own work, utilizing the cloud computing infrastructure on Google Cloud. The learning module consists of three submodules that guide the user through tutorial examples that illustrate the analysis of RNA sequencing (RNA-seq) and reduced-representation bisulfite sequencing (RRBS) data. The examples are in the form of breast cancer case studies, and the data sets were procured from the public repository Gene Expression Omnibus.
The first submodule is devoted to transcriptomics analysis with the RNA sequencing data, the second submodule focuses on epigenetics analysis using the DNA methylation data, and the third submodule integrates the two methods for a deeper biological understanding. The modules begin with data collection and preprocessing, with further downstream analysis performed in a Vertex AI Jupyter notebook instance with an R kernel. Analysis results are returned to Google Cloud buckets for storage and visualization, removing the computational strain from local resources. The final product is a start-to-finish tutorial for researchers with limited experience in multi-omics to integrate transcriptomics and epigenetics data analysis into a comprehensive pipeline to perform their own biological research. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [16] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.
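
The abstract does not say how the third submodule combines the two data types, and the module itself runs in R. As a rough illustration only, one common way to integrate per-gene expression and methylation results is to keep genes whose changes run in opposite directions (for example, higher promoter methylation together with lower expression). The sketch below is in Python, not the module's R code; the function name, the cutoffs, and the gene labels are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCall:
    gene: str
    log2_fold_change: float  # expression change, case vs. control
    meth_diff: float         # methylation change in percentage points


def integrate(expression, methylation, lfc_cutoff=1.0, meth_cutoff=25.0):
    """Join two per-gene result tables and keep inversely changing genes.

    `expression` maps gene -> log2 fold change; `methylation` maps
    gene -> methylation difference. Both cutoffs are illustrative
    assumptions, not values taken from the module.
    """
    hits = []
    # Only genes measured in both assays can be compared.
    for gene in sorted(expression.keys() & methylation.keys()):
        lfc = expression[gene]
        diff = methylation[gene]
        # Skip genes whose change is too small in either assay.
        if abs(lfc) < lfc_cutoff or abs(diff) < meth_cutoff:
            continue
        # Keep pairs moving in opposite directions:
        # down-regulated + hypermethylated, or up-regulated + hypomethylated.
        if (lfc < 0) == (diff > 0):
            hits.append(GeneCall(gene, lfc, diff))
    return hits


if __name__ == "__main__":
    # Placeholder values, not real data.
    expression = {"GENE_A": -2.1, "GENE_B": 1.8, "GENE_C": -1.5, "GENE_D": 0.2}
    methylation = {"GENE_A": 40.0, "GENE_B": -30.0, "GENE_C": -35.0, "GENE_E": 50.0}
    for hit in integrate(expression, methylation):
        print(hit.gene, hit.log2_fold_change, hit.meth_diff)
```

In a real analysis, the two inputs would come from the differential expression and differential methylation steps of the first two submodules, and the resulting table could then be written to a Cloud Storage bucket as the abstract describes.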