Keywords: CAGI; Peutz-Jeghers syndrome; STK11; cancer; kinase; machine learning; variant effect prediction

Source: DOI:10.21203/rs.3.rs-4587317/v1

Abstract:
Critical evaluation of computational tools for predicting variant effects is important considering their increased use in disease diagnosis and driving molecular discoveries. In the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, a dataset of 28 STK11 rare variants (27 missense, 1 single amino acid deletion), identified in primary non-small cell lung cancer biopsies, was experimentally assayed to characterize computational methods from four participating teams and five publicly available tools. Predictors demonstrated a high level of performance on key evaluation metrics, measuring correlation with the assay outputs and separating loss-of-function (LoF) variants from wildtype-like (WT-like) variants. The best participant model, 3Cnet, performed competitively with well-known tools. Unique to this challenge was that the functional data was generated with both biological and technical replicates, thus allowing the assessors to realistically establish maximum predictive performance based on experimental variability. Three out of the five publicly available tools and 3Cnet approached the performance of the assay replicates in separating LoF variants from WT-like variants. Surprisingly, REVEL, an often-used model, achieved a comparable correlation with the real-valued assay output as that seen for the experimental replicates. Performing variant interpretation by combining the new functional evidence with computational and population data evidence led to 16 new variants receiving a clinically actionable classification of likely pathogenic (LP) or likely benign (LB). Overall, the STK11 challenge highlights the utility of variant effect predictors in biomedical sciences and provides encouraging results for driving research in the field of computational genome interpretation.
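
The two kinds of metric named in the abstract (correlation with the real-valued assay output, and separation of LoF from WT-like variants) together with the idea of a replicate-based performance ceiling can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the numbers are invented toy values rather than challenge data, the names `replicate_a`, `replicate_b`, `predictor` and `is_lof` are hypothetical, and the choice of Pearson correlation and AUROC is one plausible reading of "key evaluation metrics", not necessarily the assessors' exact protocol.

```python
from itertools import product
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def auroc(scores, labels):
    """Probability that a positive outranks a negative (ties count as 0.5)."""
    pos = [s for s, flag in zip(scores, labels) if flag]
    neg = [s for s, flag in zip(scores, labels) if not flag]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy values, invented for illustration only (NOT data from the challenge).
# Assay activity: low values mimic loss of function, values near 1 mimic wildtype.
replicate_a = [0.05, 0.10, 0.20, 0.85, 0.95, 1.05]
replicate_b = [0.08, 0.15, 0.12, 0.90, 1.00, 0.92]
is_lof = [True, True, True, False, False, False]
# Hypothetical predictor output: higher means "more likely damaging".
predictor = [0.95, 0.80, 0.40, 0.55, 0.10, 0.20]

# Ceiling: how well one replicate reproduces the other.
ceiling_corr = pearson(replicate_a, replicate_b)
# Activity is negated so that a higher score means "more LoF-like".
ceiling_auroc = auroc([-v for v in replicate_b], is_lof)

# Predictor: a damaging score should be anti-correlated with activity.
predictor_corr = pearson(predictor, replicate_a)
predictor_auroc = auroc(predictor, is_lof)

print(f"replicate ceiling: r={ceiling_corr:.2f}, AUROC={ceiling_auroc:.2f}")
print(f"predictor:         r={predictor_corr:.2f}, AUROC={predictor_auroc:.2f}")
```

Under this reading, a predictor "approaches the performance of the assay replicates" when its correlation magnitude and AUROC come close to the replicate-versus-replicate values, which act as a practical upper bound given experimental noise.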