Keywords: Differential expression; scDEA; scHD4E; scRNA-seq

Source: DOI:10.1016/j.compbiomed.2024.108769

Abstract:
Differential expression (DE) analysis between cell types that captures the complicated features of scRNA-seq data is crucial. Many methods have recently been developed for scRNA-seq data analysis; they differ in their modeling frameworks, assumptions, strategies, and test statistics, and in which data features they consider. scDEA is a recently developed ensemble learning-based DE analysis method that uses Lancaster's combination to merge the p-values generated by 12 individual DE analysis methods, producing more accurate and stable results than the individual methods. The objective of our study is to propose a new ensemble learning-based DE analysis method, scHD4E, that uses only the 4 top-performing individual methods. These 4 methods were selected through an evaluation process using six real scRNA-seq data sets. We conducted comprehensive experiments on five experimental data sets to evaluate the proposed method in terms of sample size effects, batch effects, type I error control, gene ontology enrichment analysis, runtime, matched DE genes identified, and semantic similarity between methods. We also performed similar analyses (except the last three items) on a simulated data set and computed performance measures such as accuracy, F1 score, and the Matthews correlation coefficient. The results show that scHD4E performs better than all the individual methods and scDEA in all of the above respects. We expect that scHD4E will serve data scientists in detecting differentially expressed genes (DEGs) in scRNA-seq data analysis. To implement the proposed method, an R package, scHD4E, and its Shiny application have been developed and are available on GitHub at https://github.com/bbiswas1989/scHD4E and https://github.com/bbiswas1989/scHD4E-Shiny.
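The p-value combination step mentioned in the abstract can be illustrated with a minimal sketch. Lancaster's combination assigns each p-value a weight that acts as a chi-square degree of freedom; when every weight equals 2 it reduces to Fisher's method, which is what the sketch below implements using only the Python standard library. The function names are hypothetical, and the weights actually used by scDEA or scHD4E are not stated in the abstract, so this should be read as an illustration of the idea rather than as the packages' implementation.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df.

    For df = 2k the closed form is exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    k = df // 2
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half / j   # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def combine_pvalues_fisher(pvalues):
    """Combine independent p-values for one gene (hypothetical helper).

    Equal-weight special case of Lancaster's combination (all weights = 2),
    i.e. Fisher's method: T = -2 * sum(ln p_i) follows chi-square(2k)
    under the null hypothesis.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    # Clip to avoid log(0) when a method reports an exact zero.
    clipped = [min(max(p, 1e-300), 1.0) for p in pvalues]
    statistic = -2.0 * sum(math.log(p) for p in clipped)
    return chi2_sf_even_df(statistic, 2 * len(clipped))
```

For one gene, passing the p-values reported by four individual DE methods, for example `combine_pvalues_fisher([0.01, 0.04, 0.03, 0.20])`, yields a single combined p-value. The general weighted form replaces each `-2 * ln(p_i)` term with the chi-square quantile of `1 - p_i` at `w_i` degrees of freedom, which needs an inverse chi-square CDF from a statistics library.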