Keywords: Genome; K-mer; Nullomer; Prime; Proteome; Quasi-prime

Source: DOI:10.1016/j.csbj.2024.04.050

Abstract:
Falling sequencing costs have facilitated the creation of reference genomes and proteomes for a growing number of organisms. To our knowledge, however, no established repository catalogs organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers. In this article, we present kmerDB, a database accessible through an interactive web interface that systematically provides kmer-based information derived from genomic and proteomic sequences. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 reference genomes and 21,865 reference proteomes, as well as 6,905,362 genomic and 149,305,183 proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences that are absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface with a range of search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
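
The definitions in the abstract can be illustrated with a minimal Python sketch. It is not kmerDB's actual pipeline, which the abstract does not describe. The function names, the toy sequences, and the choice of k = 2 are all hypothetical. The sketch assumes that a quasi-prime is a kmer present in exactly one species of the collection, and that a prime is a kmer over the alphabet that is present in none of them.

```python
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(kmers_by_species):
    """Map each species to the kmers found in it and in no other species."""
    result = {}
    for name, own in kmers_by_species.items():
        # Union of the kmer sets of every other species in the collection.
        others = set().union(
            *(s for n, s in kmers_by_species.items() if n != name)
        )
        result[name] = own - others
    return result


def primes(kmers_by_species, alphabet, k):
    """Return every kmer over alphabet that is absent from all species."""
    seen = set().union(*kmers_by_species.values())
    universe = {"".join(p) for p in product(alphabet, repeat=k)}
    return universe - seen


# Hypothetical toy data: two short "genomes", single strand only.
toy_sequences = {"species_a": "ACGTAC", "species_b": "ACGGTT"}
K = 2
toy_kmers = {name: kmers(seq, K) for name, seq in toy_sequences.items()}

toy_quasi_primes = quasi_primes(toy_kmers)
toy_primes = primes(toy_kmers, "ACGT", K)
```

In this toy case, `toy_quasi_primes` holds `{"TA"}` for `species_a` and `{"GG", "TT"}` for `species_b`, and `toy_primes` holds the 10 of the 16 possible 2-mers that occur in neither sequence. Enumerating the full kmer universe is only feasible for small k; the sketch trades scalability for clarity.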