Keywords: differentially expressed genes (DEGs); feature selection (FS); gene ontology; high-throughput technologies; inflammatory bowel disease (IBD); machine learning (ML); pathway enrichment analysis

Source: DOI:10.3390/diagnostics14111182

Abstract:
This study used high-throughput technologies and Machine Learning (ML) to identify gene biomarkers and molecular signatures in Inflammatory Bowel Disease (IBD). Using the GSE75214 microarray dataset, we compared gene expression levels in colonic specimens from 172 IBD patients and 22 healthy individuals and identified genes that were significantly upregulated or downregulated in IBD patients. ML techniques combined with feature selection methods revealed six Differentially Expressed Gene (DEG) biomarkers (VWF, IL1RL1, DENND2B, MMP14, NAAA, and PANK1) with strong diagnostic potential for IBD. The Random Forest (RF) model achieved accuracy, F1-score, and AUC values exceeding 0.98. The findings were validated with independent datasets (GSE36807 and GSE10616), which showed favorable performance metrics (accuracy: 0.841, F1-score: 0.734, AUC: 0.887). Functional annotation and pathway enrichment analysis provided insights into crucial pathways associated with these dysregulated genes. DENND2B and PANK1 were identified as novel IBD biomarkers. Validation in independent cohorts strengthens the reliability of these findings and underscores their potential for early detection and personalized treatment of IBD. Further exploration of these genes is needed to fully understand their roles in IBD pathogenesis and to develop improved diagnostic tools and therapies. The study contributes insights to IBD research that could improve patient care.
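The three metrics reported above (accuracy, F1-score, AUC) measure different things, and the gap between them matters when classes are as unbalanced as 172 patients versus 22 controls. The following is a minimal sketch of how each metric is computed for a binary classifier. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the six toy samples are all assumptions made up for illustration and are unrelated to the GSE datasets.

```python
# Illustrative only: toy labels and scores, not data from the study.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive sample is scored
    higher than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = IBD, 0 = healthy control.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]         # assumed classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC={auc:.3f}")
```

Accuracy and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the samples, which is one reason the three values can diverge on an independent cohort.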