The human reference genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor to the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensus genomes yielded little to no further improvement over the pan-human consensus, suggesting a limit to the utility of incorporating more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expression of isoforms, genes, and splice junctions.
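
To make the idea of a consensus genome concrete, the following minimal Python sketch replaces the reference allele with the alternate allele wherever the alternate is the major allele in a chosen group (pan-human, superpopulation, or population). It rests on assumptions not stated in the abstract: that the consensus is defined by a simple allele-frequency threshold of 0.5, and that only single-nucleotide variants are applied. The names `Snv` and `build_consensus` are hypothetical and chosen for illustration only.

```python
from typing import Iterable, NamedTuple


class Snv(NamedTuple):
    """A single-nucleotide variant (hypothetical record, for illustration)."""
    pos: int         # 0-based position in the reference sequence
    ref: str         # reference allele
    alt: str         # alternate allele
    alt_freq: float  # alternate allele frequency in the chosen group


def build_consensus(reference: str,
                    variants: Iterable[Snv],
                    threshold: float = 0.5) -> str:
    """Return a copy of `reference` carrying the major allele at each SNV.

    Assumption: the alternate allele replaces the reference allele
    whenever its frequency exceeds `threshold`.
    """
    seq = list(reference)
    for v in variants:
        # Guard against variants that do not match the reference sequence.
        if reference[v.pos] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        if v.alt_freq > threshold:
            seq[v.pos] = v.alt
    return "".join(seq)


# Toy example: only the first variant is the major allele.
toy_variants = [Snv(1, "C", "T", 0.8), Snv(4, "A", "G", 0.2)]
print(build_consensus("ACGTACGT", toy_variants))  # -> ATGTACGT
```

Under these assumptions, the same function would produce pan-human, superpopulation, or population-level consensus sequences simply by supplying allele frequencies computed over the corresponding group of samples.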