Keywords: Holstein–Friesian; SNP; artificial intelligence; cattle; clinical mastitis; deep learning; enrichment

MeSH: Cattle; Mastitis, Bovine / genetics; Animals; Polymorphism, Single Nucleotide; Deep Learning; Female; Whole Genome Sequencing / methods; Genetic Predisposition to Disease; Genotype

Source: DOI:10.3390/ijms25094715

Abstract:
A serious obstacle to the biological annotation of whole-genome sequence data is the p >> n problem: the number of polymorphic variants (p) is much larger than the number of available phenotypic records (n). We propose circumventing this problem by combining LASSO logistic regression with deep learning to classify cows as susceptible or resistant to mastitis based on single nucleotide polymorphism (SNP) genotypes. Among several architectures, the one using 204,642 SNPs was selected as the best. It was composed of two layers with 7 and 46 units, with dropout rates of 0.210 and 0.358, respectively. Classification of the test data yielded AUC = 0.750, accuracy = 0.650, sensitivity = 0.600, and specificity = 0.700. Significant SNPs were selected based on SHapley Additive exPlanations (SHAP). Finally, one GO term related to biological process and thirteen GO terms related to molecular function were significantly enriched in the gene set corresponding to the significant SNPs. Our findings show that the best-performing approach correctly predicts susceptibility or resistance status for approximately 65% of cows. Genes marked by the most significant SNPs are related to the immune response and protein synthesis.
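The relationship between the reported accuracy, sensitivity, and specificity can be checked with a minimal sketch. The function name `classification_metrics` and the confusion-matrix counts below are hypothetical and chosen only for illustration; the abstract does not report the study's actual counts or test-set size. Accuracy equals the mean of sensitivity and specificity only when both classes are equally represented, so the reported values (0.650 = (0.600 + 0.700) / 2) are consistent with a balanced test set, although the abstract does not state this.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: susceptible cows classified as susceptible
    fn: susceptible cows classified as resistant
    tn: resistant cows classified as resistant
    fp: resistant cows classified as susceptible
    """
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    return sensitivity, specificity, accuracy


# Hypothetical balanced test set (10 susceptible, 10 resistant cows).
# These counts are an assumption for illustration, NOT data from the study.
sens, spec, acc = classification_metrics(tp=6, fn=4, tn=7, fp=3)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With unbalanced classes the same sensitivity and specificity would give a different accuracy, which is why all three metrics are usually reported together.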