variants prioritization

  • Article type: Journal Article
    A comprehensive analysis of germline and somatic variants requires complex computational approaches that combine next-generation sequencing (NGS)-based omics data with curated annotations from public repositories. Here, we describe Structure-PPi, which facilitates the mapping of cancer-related variants onto protein 3D structures, interaction interfaces, and other important functional sites (i.e., catalytic, ligand-binding, and posttranslational modification sites). Our approach relies on features extracted from the Interactome3D, UniProtKB, InterPro, APPRIS, dbNSFP, and COSMIC databases and provides information complementary to pathogenicity prediction methods. Thus, Structure-PPi helps discriminate false-positive predictions and adds both mechanistic and biological insight into the role of variants in a given cancer. An online version of the tools is available at https://rbbt.bsc.es/StructurePPI/.

  • Article type: Journal Article
    The Critical Assessment of Genome Interpretation (CAGI) experiment is the first attempt to evaluate the state of the art in genetic data interpretation. Among the proposed challenges, Crohn disease (CD) risk prediction has become the classic problem, spanning three editions. The scientific question is very hard: can anyone assess the risk of developing CD given exome data alone? This is one of the ultimate goals of genetic analysis, and it motivated most CAGI participants to look for powerful new methods. In the 2016 CD challenge, we implemented all the best methods proposed in past editions. This resulted in 10 algorithms, which were evaluated fairly by the CAGI organizers. We also used all the data available from CAGI 11 and 13 to maximize the number of training samples. The most effective algorithms used genes known from the literature to be associated with CD. No method could effectively evaluate the importance of unannotated variants using heuristics. As a downside, all CD datasets were strongly affected by sample stratification, which affected the performance reported by the assessors. Therefore, we expect that future datasets will be normalized to remove population effects. This will improve the comparison of methods and promote algorithms focused on the discovery of causal variants.