Keywords: open science; workflow; workflow execution service; workflow language

MeSH: Workflow; Computational Biology / methods; Software; Programming Languages

Source: DOI:10.12688/f1000research.122924.2

Abstract:
The increasing demand for efficient computation in data analysis encourages researchers in biomedical science to use workflow systems. Workflow systems, also called workflow languages, are used to describe and execute a series of data analysis steps. They increase researcher productivity, particularly in fields that rely on high-throughput DNA sequencing, where scalable computation is required. As these systems have improved the portability of data analysis workflows, research communities can share workflows to reduce the cost of building routine analysis procedures. However, the coexistence of multiple workflow systems in one research field has split effort across separate workflow system communities. Because each workflow system has its own characteristics, it is not feasible to learn every system just to use publicly shared workflows. We therefore developed Sapporo, an application that provides a unified workflow execution layer over the differences between workflow systems. Sapporo has two components: an application programming interface (API) that receives workflow run requests, and a browser-based client for that API. The API follows the Workflow Execution Service (WES) API standard proposed by the Global Alliance for Genomics and Health (GA4GH). The current implementation supports the execution of workflows written in four languages: Common Workflow Language, Workflow Description Language, Snakemake, and Nextflow. With its extensible and scalable design, Sapporo can help the research community make use of valuable resources for data analysis.
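As a rough illustration of what a WES-style workflow run request to such an API might look like, here is a minimal sketch that builds, but does not send, a `POST /runs` request using only the Python standard library. The form fields `workflow_url`, `workflow_type`, `workflow_type_version`, and `workflow_params` and the `GET /runs/{run_id}/status` path come from the GA4GH WES specification, not from this abstract. The `workflow_engine_name` field is an assumption about a Sapporo-specific extension, and the endpoint, workflow URL, and parameters in the usage part are hypothetical.

```python
import json
import urllib.request
import uuid


def build_run_request(base_url, workflow_url, workflow_type,
                      workflow_type_version, workflow_params,
                      workflow_engine_name=None):
    """Build (without sending) a WES-style POST /runs request as multipart/form-data."""
    fields = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES expects workflow_params as a JSON-encoded string.
        "workflow_params": json.dumps(workflow_params),
    }
    if workflow_engine_name is not None:
        # Assumption: a Sapporo-specific field selecting the execution engine;
        # it is not part of the core WES specification.
        fields["workflow_engine_name"] = workflow_engine_name

    # Encode each field as one part of a multipart/form-data body.
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")  # yields the trailing CRLF after the closing boundary
    body = "\r\n".join(lines).encode("utf-8")

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/runs",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def status_url(base_url, run_id):
    """URL of the WES endpoint that reports the state of one run."""
    return f"{base_url.rstrip('/')}/runs/{run_id}/status"


if __name__ == "__main__":
    req = build_run_request(
        "http://localhost:1122",  # hypothetical Sapporo endpoint
        workflow_url="https://example.org/workflows/trimming.cwl",  # hypothetical
        workflow_type="CWL",
        workflow_type_version="v1.0",
        workflow_params={"fastq_1": {"class": "File",
                                     "location": "https://example.org/r1.fq.gz"}},
        workflow_engine_name="cwltool",  # assumed engine name
    )
    print(req.get_method(), req.full_url)  # → POST http://localhost:1122/runs
```

Passing the request to `urllib.request.urlopen(req)` would submit it. Under the WES specification, the response is a JSON object carrying a `run_id`, which a client can then poll at the status URL. Keeping request construction separate from sending makes the sketch testable without a running server.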