Keywords: Bioinformatics; Biomarkers; COVID-19; Data mining; RNA; RNA-sequencing; SARS-CoV-2; Virus
Abbreviations: AUC, Area under the curve; COVID-19, Coronavirus Disease of 2019; DEG, Differentially expressed gene; GEO, Gene Expression Omnibus; GO, Gene Ontology; ROC, Receiver-operator characteristic; SARS-CoV-2, Severe Acute Respiratory Syndrome Coronavirus 2

Source: DOI:10.1016/j.csbj.2023.02.003

Abstract:
SARS-CoV-2 is the causative agent of COVID-19, which has greatly affected human health since it first emerged. Defining the human factors and biomarkers that differentiate severe SARS-CoV-2 infection from mild infection has become of increasing interest to clinicians. To help address this need, we retrieved 269 public RNA-seq human transcriptome samples from GEO that had qualitative disease severity metadata. We then subjected these samples to a robust RNA-seq data processing workflow to calculate gene expression in PBMCs, whole blood, and leukocytes, and to predict transcriptional biomarkers in PBMCs and leukocytes. This workflow used Salmon for read mapping, edgeR to calculate significant differential expression, and Camera for gene ontology enrichment. We then performed a random forest machine learning analysis on the read count data to identify the genes that best classified samples by COVID-19 severity phenotype. This approach produced a list of leukocyte genes ranked by their Gini values that includes TGFBI, TTYH2, and CD4, which are associated with both the immune response and inflammation. Our results show that these three genes can potentially classify samples with severe COVID-19 with an accuracy of ∼88% and an area under the receiver operating characteristic curve of 92.6, indicating acceptable specificity and sensitivity. We expect that our findings can contribute to the development of improved diagnostics that may aid in identifying severe COVID-19 cases, guide clinical treatment, and reduce mortality rates.
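
The two quantities the abstract relies on, Gini-based gene ranking and the area under the ROC curve, can be illustrated with a minimal sketch. This is not the authors' pipeline: a random forest averages Gini decreases over many trees and many splits, whereas the sketch below scores each gene by its single best threshold split. The gene names (`gene_A`, `gene_B`, `gene_C`), the expression values, and the helper functions are all hypothetical and purely synthetic.

```python
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (0 = mild, 1 = severe)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_gini_decrease(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest impurity decrease from one threshold split on a single gene."""
    parent = gini_impurity(labels)
    n = len(labels)
    best = 0.0
    # Try every observed value except the largest as a split threshold.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        best = max(best, parent - weighted)
    return best


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance a severe sample outscores a mild one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data (assumption): 4 mild and 4 severe samples, 3 made-up genes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "gene_A": [1.0, 1.2, 0.9, 1.1, 3.0, 2.8, 3.2, 2.9],  # separates the classes
    "gene_B": [1.0, 2.0, 1.5, 2.5, 1.8, 2.2, 3.0, 1.2],  # weakly informative
    "gene_C": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # uninformative
}

# Rank genes by their best single-split Gini decrease, highest first.
ranking = sorted(
    expression,
    key=lambda gene: best_gini_decrease(expression[gene], labels),
    reverse=True,
)
for gene in ranking:
    print(gene,
          round(best_gini_decrease(expression[gene], labels), 4),
          round(roc_auc(expression[gene], labels), 4))
```

In practice a library implementation of random forests would be used on the full read count matrix; the sketch only shows why a gene that cleanly separates severity classes receives a high Gini-based rank and a high AUC.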