Keywords: Machine Learning; Monogenic Disease; Pediatric IBD; Whole-exome Sequencing

Source: DOI:10.1016/j.gastha.2021.11.002

Abstract:
Background and Aims: Diagnosis of monogenic disease is increasingly important for patient care and personalizing therapy. However, the current process is nonstandardized, expensive, and time-consuming. There is currently no accepted strategy to help identify disease-causing variants in monogenic inflammatory bowel disease (IBD). The aim of the study is to develop a prioritization strategy for monogenic IBD variant discovery through detailed analysis of a whole-exome sequencing (WES) data set.
Methods: All consenting pediatric patients with IBD presenting to our tertiary care hospital during the study period were enrolled and underwent WES (n = 1005). Available family members also underwent WES. Variants were analyzed en masse using the GEMINI framework and were further annotated using data from dbNSFP, Combined Annotation Dependent Depletion (CADD), and gnomAD. Known disease-causing variants (n = 36) were used as positive controls. Machine learning algorithms were optimized and then compared to assist with identifying monogenic IBD case characteristics.
Results: Initial gene-level analysis identified 11 genes not previously linked to IBD that could potentially harbor IBD-causing variants. Machine learning algorithms identified 4 primary variant characteristics (CADD score, dbNSFP score, relationship with a known immunodeficiency gene, and alternate allele frequency), and optimal threshold values for each were determined to assist with identifying monogenic IBD variants. Based on these characteristics, an automated variant prioritization pipeline was then created that filters and prioritizes variants from >100,000 variants per patient down to a mean of 15. This pipeline is available online for all to use.
Conclusion: Leveraging a large WES data set, we demonstrate a statistically rigorous strategy for prioritization of variants for monogenic IBD diagnosis.
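As a rough illustration of how threshold-based prioritization on the four reported characteristics might look, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not report the threshold values, so the cutoffs below are placeholders, and the `Variant` fields, the gene names, the treatment of the dbNSFP annotation as a single numeric score, and the ranking by CADD score are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Placeholder cutoffs for illustration only; the abstract does not state
# the thresholds the authors derived.
CADD_MIN = 20.0      # assumed minimum CADD score
DBNSFP_MIN = 0.5     # assumed minimum dbNSFP-derived score (treated as one number)
ALT_AF_MAX = 0.001   # assumed maximum alternate allele frequency


@dataclass
class Variant:
    gene: str                 # hypothetical gene label
    cadd: float               # CADD score
    dbnsfp: float             # single dbNSFP-derived score (an assumption)
    alt_af: float             # alternate allele frequency (e.g. from gnomAD)
    immunodeficiency_related: bool  # linked to a known immunodeficiency gene


def passes_filters(v: Variant) -> bool:
    """Return True if the variant meets all four placeholder criteria."""
    return (
        v.cadd >= CADD_MIN
        and v.dbnsfp >= DBNSFP_MIN
        and v.alt_af <= ALT_AF_MAX
        and v.immunodeficiency_related
    )


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Keep variants that pass every filter, ranked by CADD score (highest first)."""
    kept = [v for v in variants if passes_filters(v)]
    return sorted(kept, key=lambda v: v.cadd, reverse=True)


# Toy input with made-up values; real input would come from annotated WES calls.
example_variants = [
    Variant("GENE_A", cadd=30.0, dbnsfp=0.9, alt_af=0.0001, immunodeficiency_related=True),
    Variant("GENE_B", cadd=10.0, dbnsfp=0.9, alt_af=0.0001, immunodeficiency_related=True),
    Variant("GENE_C", cadd=25.0, dbnsfp=0.8, alt_af=0.0, immunodeficiency_related=True),
    Variant("GENE_D", cadd=35.0, dbnsfp=0.9, alt_af=0.05, immunodeficiency_related=True),
    Variant("GENE_E", cadd=28.0, dbnsfp=0.9, alt_af=0.0001, immunodeficiency_related=False),
]

shortlist = prioritize(example_variants)
print([v.gene for v in shortlist])
```

In this toy input, only the variants that satisfy all four criteria survive, which mirrors the idea of reducing a very large per-patient variant list to a short candidate set for review.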