Keywords: data aggregation; genetic variation; high-throughput nucleotide sequencing; molecular sequence annotation; protein annotations

Source: DOI:10.1093/jamiaopen/ooab065   PDF (PubMed)

Abstract:
BACKGROUND: Genomic data are now prevalent, so researchers frequently encounter uninterpreted variants or mutations whose mechanisms of effect are unknown. To understand those mechanisms, they must manually aggregate data from multiple sources and across related proteins, mentally translating effects between the genome and the proteome.
METHODS: P2T2 presents diverse data and annotation types in a unified protein-centric view, facilitating the interpretation of coding variants and hypothesis generation. Information from the primary sequence, domain, motif, and structural levels is presented and also organized into the first Paralog Annotation Analysis across the human proteome.
RESULTS: Our tool assists efforts to interpret genomic variation by aggregating diverse, relevant, proteome-wide information into a unified, interactive web-based interface. We also provide a REST API that enables automated data queries and the repurposing of data for other studies.
CONCLUSIONS: The unified protein-centric interface presented in P2T2 will help researchers interpret novel variants identified through next-generation sequencing. Code and a server link are available at github.com/GenomicInterpretation/p2t2.
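
As an illustration of the genome-to-proteome translation that the abstract says researchers otherwise perform mentally, the following minimal sketch maps a single-base substitution in a coding sequence to its protein-level consequence. It is not taken from P2T2: the function name `coding_variant_to_protein` and the toy sequence are hypothetical, and the sketch assumes an intron-free coding sequence that starts in frame and uses the standard genetic code.

```python
# Illustrative sketch only; not P2T2 code. Assumes an in-frame, intron-free
# coding sequence (CDS) and the standard genetic code.

# Standard genetic code in the conventional TCAG ordering (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def coding_variant_to_protein(cds: str, cds_pos: int, alt_base: str) -> str:
    """Translate a substitution at a 1-based CDS position to a protein change.

    Returns a string such as "A2V": reference residue, 1-based protein
    position, alternate residue ("*" denotes a stop codon).
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if alt_base not in BASES:
        raise ValueError("alt_base must be one of A, C, G, T")
    if not 1 <= cds_pos <= len(cds):
        raise ValueError("cds_pos is outside the coding sequence")

    codon_index = (cds_pos - 1) // 3   # 0-based index of the affected codon
    offset = (cds_pos - 1) % 3         # position of the base within the codon
    ref_codon = cds[3 * codon_index: 3 * codon_index + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")

    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return f"{ref_aa}{codon_index + 1}{alt_aa}"


# Toy CDS encoding Met-Ala-Lys-stop (hypothetical example sequence).
toy_cds = "ATGGCCAAGTAA"
print(coding_variant_to_protein(toy_cds, 5, "T"))  # GCC -> GTC
print(coding_variant_to_protein(toy_cds, 7, "T"))  # AAG -> TAG
```

Real transcripts add complications this sketch ignores, such as splicing, strand orientation, and multiple isoforms, which is part of why a unified protein-centric view is useful.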