Keywords: Bioinformatics; Combo-genome; Host-pathogen interactions; Parallel alignment; Plant-microbe interaction system; RNA-seq; Sequential alignment

MeSH: RNA-Seq; Phylogeny; Genome; Computer Simulation; Software

Source: DOI:10.1007/978-1-0716-3159-1_10

Abstract:
In RNA-seq data processing, short reads from one species are usually aligned against that species' own genome sequence. In plant-pathogen interaction systems, however, reads from the host and the pathogen are blended together, so, unlike in single-genome analyses, both reference genomes are involved in the alignment. In such cases, the order of alignment (host first, pathogen first, or both genomes simultaneously) influences the read counts of certain genes, which is particularly problematic at advanced infection stages. An appropriate strategy for aligning reads to their respective genomes is therefore crucial, yet the existing sequential and parallel alignment strategies both become problematic when mapping mixed reads to their corresponding reference genomes. The challenge lies in determining which reads belong to which species, especially when homology exists between the host and pathogen genomes. This chapter proposes a combo-genome alignment strategy and compares it with the existing alignment scenarios. Simulation results demonstrated that the degree of discrepancy among the results is correlated with the phylogenetic distance between the two species in the mixture, which is attributable to the extent of homology between the two genomes. This correlation was also found in the analysis of two real RNA-seq datasets from Fusarium-challenged wheat plants. Comparison of the three RNA-seq processing strategies on three simulated datasets and two real Fusarium-infected wheat datasets showed that alignment to a combo-genome, consisting of both the host and pathogen genomes, improves mapping quality compared with sequential alignment procedures.
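The combo-genome idea described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names (`tag_fasta`, `build_combo_genome`, `count_reads_by_species`) and the `host|` / `pathogen|` name prefixes are assumptions made for illustration only. The sketch merges the two reference FASTA files into one reference whose sequence names carry a species tag, so that after a single alignment run each read can be assigned to a species from the name of the reference sequence it mapped to. The alignment step itself is left to whichever aligner is in use and is not shown.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def tag_fasta(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: insert the species tag right after '>'.
            yield ">" + prefix + line[1:]
        elif line:
            # Sequence line: pass through unchanged (blank lines are dropped).
            yield line


def build_combo_genome(
    host: Iterable[str],
    pathogen: Iterable[str],
    host_prefix: str = "host|",          # hypothetical tag, chosen for this sketch
    pathogen_prefix: str = "pathogen|",  # hypothetical tag, chosen for this sketch
) -> List[str]:
    """Concatenate host and pathogen FASTA records into one tagged reference."""
    return list(tag_fasta(host, host_prefix)) + list(tag_fasta(pathogen, pathogen_prefix))


def count_reads_by_species(
    ref_names: Iterable[str],
    host_prefix: str = "host|",
    pathogen_prefix: str = "pathogen|",
) -> Counter:
    """Tally aligned reads per species from the reference name each read hit.

    `ref_names` holds one reference sequence name per read, e.g. the RNAME
    column of a SAM file; names without a known tag (such as '*' for
    unmapped reads) are counted as 'unassigned'.
    """
    counts: Counter = Counter()
    for name in ref_names:
        if name.startswith(host_prefix):
            counts["host"] += 1
        elif name.startswith(pathogen_prefix):
            counts["pathogen"] += 1
        else:
            counts["unassigned"] += 1
    return counts


if __name__ == "__main__":
    combo = build_combo_genome([">chr1A", "ACGTACGT"], [">contig_1", "GGCCGGCC"])
    print("\n".join(combo))
    print(count_reads_by_species(["host|chr1A", "pathogen|contig_1", "*"]))
```

Tagging sequence names up front is one simple way to keep species assignment unambiguous after aligning to the merged reference. How reads that map equally well to both genomes should be handled depends on the aligner's multi-mapping settings, which this sketch does not address.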