Keywords: AUC; RNA-seq; differential expression; Jaeckel's estimator; normalization; α trimmed mean

MeSH: RNA-Seq / methods; Humans; Algorithms; Sequence Analysis, RNA / methods; Computational Biology / methods; Gene Expression Profiling / methods; ROC Curve; Software

Source: DOI:10.1093/bib/bbae241

Abstract:
The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular normalization methods are the trimmed mean of M values (TMM) and DESeq. TMM trims away extreme log fold changes in the data and normalizes the raw read counts based on the remaining non-differentially expressed genes. However, the major problem with TMM is that the value of the trimming factor M is chosen heuristically. This paper estimates an adaptive value of M in TMM based on Jaeckel's estimator, and each sample acts as a reference when finding the scale factor of each sample. The presented approach is validated on the SEQC, MAQC2, MAQC3 and PICKRELL datasets and on two simulated datasets with two-group and three-group conditions, varying the percentage of differential expression and the number of replicates. Its performance is compared with various state-of-the-art methods, and it performs better in terms of area under the receiver operating characteristic curve and differential expression.
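
To make the trimming step concrete, the following is a minimal sketch of a TMM-style scale factor with a fixed trim fraction. It is a simplified illustration, not the paper's adaptive method and not the edgeR implementation: it omits the precision weights and the trimming on average expression that full TMM uses, and the function name `tmm_like_factor` and the parameter `trim` are hypothetical. The fixed `trim` argument plays the role of the heuristic value that the abstract proposes to estimate adaptively.

```python
import numpy as np


def tmm_like_factor(sample, reference, trim=0.3):
    """Unweighted, simplified TMM-style scale factor (illustrative only).

    `trim` is the fraction of log ratios dropped from EACH tail; it is
    the fixed, heuristic quantity in this sketch.
    """
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Use only genes observed in both libraries so the log ratio is finite.
    keep = (sample > 0) & (reference > 0)
    if not keep.any():
        raise ValueError("no genes with nonzero counts in both libraries")

    # M values: log2 ratio of library-size-scaled counts.
    m = np.log2((sample[keep] / sample.sum())
                / (reference[keep] / reference.sum()))

    # Drop the k smallest and k largest M values, then average the rest.
    m_sorted = np.sort(m)
    k = int(np.floor(trim * m_sorted.size))
    trimmed = m_sorted[k:m_sorted.size - k]

    # Back-transform the trimmed mean log ratio to a multiplicative factor.
    return float(2.0 ** trimmed.mean())


# Toy data (made up): nine unchanged genes and one strongly up-regulated gene.
reference = np.full(10, 100.0)
sample = reference.copy()
sample[0] = 10_000.0
factor = tmm_like_factor(sample, reference, trim=0.2)
effective_library_size = sample.sum() * factor
```

In this toy case the single extreme gene falls in the trimmed tail, so the factor is driven only by the unchanged genes. Choosing `trim` too small would let such genes leak into the mean, and choosing it too large discards informative genes, which is the sensitivity that motivates a data-driven choice.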