Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets.

Abstract:

To explore complex biological questions, it is often necessary to access various data types from public data repositories. As the volume and complexity of biological sequence data grow, public repositories face significant challenges in ensuring that the data is easily discoverable and usable by the biological research community. To address these challenges, the National Center for Biotechnology Information (NCBI) has created NCBI Datasets. This resource provides straightforward, comprehensive, and scalable access to biological sequences, annotations, and metadata for a wide range of taxa. Following the FAIR (Findable, Accessible, Interoperable, and Reusable) data management principles, NCBI Datasets offers user-friendly web interfaces, command-line tools, and documented APIs, empowering researchers to access NCBI data seamlessly. The data is delivered as packages of sequences and metadata, thus facilitating improved data retrieval, sharing, and usability in research. Moreover, this data delivery method fosters effective data attribution and promotes its further reuse. This paper outlines the current scope of data accessible through NCBI Datasets and explains various options for exploring and downloading the data.
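To make the idea of data delivered "as packages of sequences and metadata" concrete, here is a minimal sketch of reading the metadata from such a package. It assumes the package is a zip archive whose metadata reports are JSON Lines files. The member paths, the field names (`accession`, `organism`), and the helpers `read_jsonl_reports` and `build_example_package` are illustrative assumptions, not taken from the paper. The example builds a tiny stand-in archive in memory rather than downloading real data.

```python
import io
import json
import zipfile


def read_jsonl_reports(package_bytes, suffix=".jsonl"):
    """Return {member_name: [records]} for every JSON Lines file in a zip archive."""
    reports = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(suffix):
                continue  # skip sequence files, README, and so on
            with archive.open(name) as handle:
                text = handle.read().decode("utf-8")
            # One JSON object per non-empty line.
            reports[name] = [
                json.loads(line) for line in text.splitlines() if line.strip()
            ]
    return reports


def build_example_package():
    """Create a tiny in-memory zip imitating a data package (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("README.md", "example package\n")
        # Assumed layout and field names, for illustration only.
        archive.writestr(
            "ncbi_dataset/data/assembly_data_report.jsonl",
            json.dumps({"accession": "GCF_EXAMPLE.1", "organism": "Example species"})
            + "\n",
        )
        archive.writestr(
            "ncbi_dataset/data/GCF_EXAMPLE.1/sequence.fna", ">seq1\nACGT\n"
        )
    return buffer.getvalue()


if __name__ == "__main__":
    for name, records in read_jsonl_reports(build_example_package()).items():
        print(name, len(records))
```

Keeping the metadata in the same archive as the sequences means one file can be shared or cited as a unit, which fits the attribution and reuse goals described in the abstract.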
