Keywords: Curation tools; Data integration; ENA; Metadata; Omics data; Sequencing data

MeSH: Workflow; Software; Data Curation / methods; Metadata; Databases, Genetic; Genomics / methods; Computational Biology / methods

Source: DOI: 10.1186/s12859-024-05803-9

Abstract:
BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, despite these efforts, working with, and especially curating, public omics datasets remains challenging. While a growing number of initiatives aim to re-use previous results, they have limitations that often make further in-house curation and processing necessary.
RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide researchers through the curation of metadata and FASTQ files from public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) that eases the arduous task of curating public sequencing data projects. Although it is centered on the European Nucleotide Archive (ENA), most of the provided tools are generic and can be used to curate datasets from other sources.
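The abstract does not describe the toolkit's API, so the following is only a hypothetical sketch of what the "collection" and "control check" steps can look like for ENA data; it is not OMD Curation Toolkit code, and every function name in it is illustrative. It assumes an ENA Portal API `read_run` filereport saved as tab-separated text with the fields `run_accession`, `fastq_ftp` and `fastq_md5` (paired-end files joined by `;`), and that the FASTQ files were already downloaded into one local directory under their original file names.

```python
import csv
import gzip
import hashlib
import io
from pathlib import Path


def parse_filereport(tsv_text):
    """Map each run accession to a list of (file_name, expected_md5) pairs.

    Assumption: ENA read_run filereport with run_accession, fastq_ftp and
    fastq_md5 columns; paired-end entries are separated by ';'.
    """
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        md5s = [m for m in row["fastq_md5"].split(";") if m]
        if len(urls) != len(md5s):
            raise ValueError(f"{row['run_accession']}: URL/MD5 count mismatch")
        # Keep only the file name; the local copy is looked up by name.
        runs[row["run_accession"]] = [
            (url.rsplit("/", 1)[-1], md5) for url, md5 in zip(urls, md5s)
        ]
    return runs


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fastq_records(path):
    """Return the number of reads; raise ValueError on a malformed record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:  # clean end of file
                break
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"{path}: malformed record {n_reads + 1}")
            if len(seq) != len(qual):
                raise ValueError(
                    f"{path}: sequence/quality length mismatch in record {n_reads + 1}"
                )
            n_reads += 1
    return n_reads


def control_check(runs, data_dir):
    """Return {run_accession: [problems]}; an empty list means the run passed."""
    report = {}
    for run, files in runs.items():
        problems = []
        for name, expected_md5 in files:
            path = Path(data_dir) / name
            if not path.exists():
                problems.append(f"missing {name}")
                continue
            if md5_of(path) != expected_md5:
                problems.append(f"md5 mismatch {name}")
                continue
            try:
                check_fastq_records(path)
            except ValueError as err:
                problems.append(str(err))
        report[run] = problems
    return report
```

Checking the published MD5 before parsing the reads separates transfer errors (truncated or corrupted downloads) from files that were already malformed at the source; runs with a non-empty problem list would then be re-downloaded or excluded before any treatment or integration step.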
CONCLUSIONS: The toolkit thus offers valuable tools for the in-house curation that re-using public omics data has so far required. Its workflow structure and capabilities make it easy to use, and it can help investigators develop novel omics meta-analyses based on sequencing data.