Keywords: Database; GenBank; GenBase; INSDC; Nucleotide sequences

Source: DOI:10.1093/gpbjnl/qzae047

Abstract:
The rapid advancement of sequencing technologies makes it challenging to manage the large and exponentially growing volume of sequence data efficiently and in a timely manner. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers a bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase had received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2,319 submissions. Of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase's web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has constructed an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.