MeSH: Algorithms; Gene Expression Profiling / methods, standards; High-Throughput Nucleotide Sequencing / methods, standards; Humans; Models, Genetic; Normal Distribution; Sensitivity and Specificity; Sequence Analysis, RNA / methods, standards

Source: DOI:10.1155/2015/621690

Abstract:
High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by the sequencers create a need for statistical and computational methods that can handle the analysis and management of these data. Data normalization is one of the most crucial steps of data processing, and it must be considered carefully because it has a profound effect on the results of the analysis. In this work, we present a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and we assess their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied to select the optimal normalization procedure for any particular data set. The workflow includes calculating the bias and variance values for the control genes, the sensitivity and specificity of the methods, and the classification errors, as well as generating diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.
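As a rough illustration of the first workflow step (bias and variance for control genes), the sketch below may help. It rests on several assumptions that are not taken from the abstract: total-count scaling stands in for a depth-related normalization (the abstract does not name the five methods), "bias" is taken to be the mean log2 difference of control genes between two sample groups, "variance" is the mean per-gene variance of log2 expression, and the function names and toy counts are hypothetical.

```python
import math
from statistics import mean, pvariance


def total_count_normalize(counts):
    """Scale every sample so its library size equals the mean library size.

    counts: dict mapping gene name -> list of raw read counts, one per sample.
    Total-count scaling is an illustrative choice, not the paper's method.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    target = mean(lib_sizes)
    return {
        gene: [row[j] * target / lib_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def control_bias_variance(expr, control_genes, groups):
    """Return (bias, variance) over control genes on the log2(x + 1) scale.

    Assumed definitions: bias is the mean between-group difference of log2
    expression (ideally 0 for genes that do not change); variance is the
    mean per-gene population variance across all samples.
    """
    biases, variances = [], []
    for gene in control_genes:
        logs = [math.log2(x + 1) for x in expr[gene]]
        g0 = [v for v, g in zip(logs, groups) if g == 0]
        g1 = [v for v, g in zip(logs, groups) if g == 1]
        biases.append(mean(g1) - mean(g0))
        variances.append(pvariance(logs))
    return mean(biases), mean(variances)


# Hypothetical toy data: sample 2 was sequenced at twice the depth of sample 1.
raw = {"A": [10, 20], "B": [30, 60], "C": [60, 120]}
groups = [0, 1]
controls = ["A", "B"]

raw_bias, raw_var = control_bias_variance(raw, controls, groups)
norm_bias, norm_var = control_bias_variance(
    total_count_normalize(raw), controls, groups
)
print("raw bias:", raw_bias, "normalized bias:", norm_bias)
```

In this toy case the control genes differ between samples only because of sequencing depth, so a depth-related normalization should drive both quantities toward zero. Comparing these values across candidate methods is one plausible way to carry out the kind of selection the workflow describes.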