Keywords: Confounding; Curse of dimensionality; EM algorithm; RNAseq

MeSH: Humans; Computer Simulation; Genome; Cluster Analysis; Gene Expression Profiling / methods

Source: DOI:10.1186/s12859-023-05556-x

Abstract:
The causes of many complex human diseases are still largely unknown. Genetics plays an important role in uncovering the molecular mechanisms of these diseases. A key step in characterizing the genetics of a complex human disease is to identify disease-associated gene transcripts on a whole-genome scale without bias. Confounding factors can cause false positives. A paired design, such as measuring gene expression before and after treatment in the same subject, can reduce the effect of known confounding factors. However, not all known confounding factors can be controlled in a paired/matched design. Model-based clustering, such as mixtures of hierarchical models, has been proposed to detect gene transcripts differentially expressed between paired samples. To the best of our knowledge, no model-based gene clustering method can yet adjust for the effects of covariates. In this article, we propose a novel mixture of hierarchical models with covariate adjustment for identifying differentially expressed transcripts using high-throughput whole-genome data from a paired design. Both a simulation study and a real data analysis show that the proposed method performs well.