Keywords: Complex genome; Domestication; Durum wheat; FAIR principles; Genotyping; Target enrichment capture; Workflows

Source: DOI:10.1186/s13007-024-01210-6

Abstract:
BACKGROUND: Genotyping of individuals plays a pivotal role in various biological analyses, with technology choice influenced by multiple factors including genomic constraints, number of targeted loci and individuals, cost considerations, and the ease of sample preparation and data processing. Target enrichment capture of specific polymorphic regions has emerged as a flexible and cost-effective genomic reduction method for genotyping, particularly well suited to very large genomes. However, this approach requires complex bioinformatic processing to extract genotyping data from raw reads. Existing workflows predominantly cater to phylogenetic inference, leaving a gap in user-friendly tools for genotyping analysis based on capture methods. In response to these challenges, we have developed GeCKO (Genotyping Complexity Knocked-Out). To assess the effectiveness of combining target enrichment capture with GeCKO, we conducted a case study on durum wheat domestication history, involving sequencing, processing, and analyzing variants in four relevant durum wheat groups.
RESULTS: GeCKO encompasses four distinct workflows, each designed for a specific step of genomic data processing: (i) read demultiplexing and trimming for data cleaning, (ii) read mapping to align sequences to a reference genome, (iii) variant calling to identify genetic variants, and (iv) variant filtering. Each workflow in GeCKO can be easily configured and is executable across diverse computational environments. The workflows generate comprehensive HTML reports including key summary statistics and illustrative graphs, ensuring traceable, reproducible results and facilitating straightforward quality assessment. A notable innovation in GeCKO is its 'targeted remapping' feature, designed for efficient processing of target enrichment capture data. This process consists of extracting reads mapped to the targeted regions, constructing a smaller sub-reference genome, and remapping the reads to this sub-reference, thereby enhancing the efficiency of subsequent steps.
CONCLUSIONS: The case study results showed the expected intra-group diversity and inter-group differentiation levels, confirming the method's effectiveness for genotyping and analyzing genetic diversity in species with complex genomes. GeCKO streamlined the data processing, significantly improving computational performance and efficiency. The targeted remapping enabled straightforward SNP calling in durum wheat, a task otherwise complicated by the species' large genome size. This illustrates the approach's potential applications in various biological research contexts.
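The sub-reference idea behind targeted remapping can be illustrated with a small sketch. This is not GeCKO's implementation: the function names, the contig naming scheme (`chrom:start-end`), and the toy data are all assumptions made for illustration, and the actual read extraction and remapping steps (which rely on an aligner) are left out. The sketch only shows how covered target regions might be merged into a smaller reference, and how a position on that sub-reference could be translated back to the original genome.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end), 0-based and half-open; a hypothetical convention
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals on each chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def build_sub_reference(reference: Dict[str, str],
                        covered: List[Interval]) -> Dict[str, str]:
    """Cut the covered regions out of the reference, one contig per region."""
    sub: Dict[str, str] = {}
    for chrom, start, end in merge_intervals(covered):
        # The contig name records where the region came from (assumed scheme).
        sub[f"{chrom}:{start}-{end}"] = reference[chrom][start:end]
    return sub


def to_original_position(contig_name: str, pos: int) -> Tuple[str, int]:
    """Translate a 0-based position on a sub-reference contig back to the genome."""
    chrom, span = contig_name.rsplit(":", 1)
    region_start = int(span.split("-")[0])
    return chrom, region_start + pos


# Toy data: one 20 bp "chromosome" and three regions where reads mapped.
reference = {"chr1A": "ACGTACGTACGTACGTACGT"}
covered = [("chr1A", 2, 6), ("chr1A", 5, 9), ("chr1A", 14, 18)]

sub_reference = build_sub_reference(reference, covered)
for name, seq in sub_reference.items():
    print(name, seq)
```

Because each contig name keeps its original coordinates, variants called on the sub-reference can be reported against the full genome with `to_original_position`. In a real pipeline the sub-reference would be far smaller than a genome the size of durum wheat's, which is where the efficiency gain described above would come from.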