Keywords: Diagnostic model; Gene set enrichment analysis; Immunocytes infiltration; Machine learning algorithms; Ulcerative colitis

MeSH: Humans; Machine Learning; Colitis, Ulcerative / genetics, diagnosis; Algorithms; Gene Expression Profiling / methods; Transcriptome; Interleukin-1 Receptor-Associated Kinases / genetics; Male; Female; Lipocalin-2 / genetics; Case-Control Studies; Biomarkers; Adult

Source: DOI:10.1038/s41598-024-65481-8

Abstract:
Ulcerative colitis (UC) is a chronic inflammatory bowel disease with intricate pathogenesis and varied presentation. Accurate diagnostic tools are imperative to detect and manage UC. This study sought to construct a robust diagnostic model using gene expression profiles and to identify key genes that differentiate UC patients from healthy controls. Gene expression profiles from eight cohorts, encompassing a total of 335 UC patients and 129 healthy controls, were analyzed. A total of 7530 gene sets were computed using the GSEA method. Subsequent batch correction, PCA plots, and intersection analysis identified crucial pathways and genes. Machine learning, incorporating 101 algorithm combinations, was employed to develop diagnostic models. Verification was done using four external cohorts, adding depth to the sample repertoire. Evaluation of immune cell infiltration was undertaken through single-sample GSEA. All statistical analyses were conducted using R (Version: 4.2.2), with significance set at a P value below 0.05. Employing the GSEA method, 7530 gene sets were computed. From this, 19 intersecting pathways were discerned to be consistently upregulated across all cohorts, which pertained to cell adhesion, development, metabolism, immune response, and protein regulation. This corresponded to 83 unique genes. Machine learning insights culminated in the LASSO regression model, which outperformed others with an average AUC of 0.942. This model's efficacy was further ratified across four external cohorts, with AUC values ranging from 0.694 to 0.873 and significant Kappa statistics indicating its predictive accuracy. The LASSO logistic regression model highlighted 13 genes, with LCN2, ASS1, and IRAK3 emerging as pivotal. Notably, LCN2 showcased significantly heightened expression in active UC patients compared to both non-active patients and healthy controls (P < 0.05).
Investigations into the correlation between these genes and immune cell infiltration in UC highlighted activated dendritic cells, with statistically significant positive correlations noted for LCN2 and IRAK3 across multiple datasets. Through comprehensive gene expression analysis and machine learning, a potent LASSO-based diagnostic model for UC was developed. Genes such as LCN2, ASS1, and IRAK3 hold potential as both diagnostic markers and therapeutic targets, offering a promising direction for future UC research and clinical application.
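The abstract reports model performance as AUC and Cohen's Kappa. As a reading aid, the following minimal sketch shows how these two statistics can be computed from case/control labels and model scores. It is not the authors' pipeline (the study used R); the function names, the 0.5 decision threshold, and the toy data are all assumptions made for illustration and are unrelated to the reported cohorts.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case scores above a random control."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Pairwise comparison (Mann-Whitney view of AUC); ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Agreement between true and predicted classes, corrected for chance."""
    n = len(labels)
    if n == 0 or n != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal length")
    observed = sum(1 for y, p in zip(labels, predictions) if y == p) / n
    # Chance agreement from the marginal class frequencies of both raters.
    expected = sum(
        (list(labels).count(c) / n) * (list(predictions).count(c) / n)
        for c in set(labels) | set(predictions)
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): 1 = UC case, 0 = healthy control.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5 for turning scores into class calls.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_kappa = cohen_kappa(toy_labels, toy_predictions)
print(f"AUC = {toy_auc:.4f}, Kappa = {toy_kappa:.4f}")
```

AUC depends only on how the scores rank cases against controls, whereas Kappa depends on the chosen threshold, which is why the two can diverge on the same cohort.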