genomic structural variation

  • Article type: Journal Article
    BACKGROUND: Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs.
    RESULTS: Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes.
    METHODS: Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping.
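
The genotyping step described in this abstract (pick the genotype whose probability distribution best explains the observed read pairs) can be illustrated with a minimal maximum-likelihood sketch. This is not the authors' implementation: the read-pair categories, the probability values, and the function names below are all hypothetical and chosen only to show the idea.

```python
import math

# Hypothetical per-genotype probabilities of observing each read-pair category.
# These values are illustrative assumptions, not numbers from the paper.
DISTRIBUTIONS = {
    "0/0": {"concordant": 0.98, "spanning_junction": 0.01, "other": 0.01},
    "0/1": {"concordant": 0.60, "spanning_junction": 0.39, "other": 0.01},
    "1/1": {"concordant": 0.20, "spanning_junction": 0.79, "other": 0.01},
}


def log_likelihood(counts, dist):
    """Log-likelihood of the observed category counts under one genotype."""
    return sum(n * math.log(dist[cat]) for cat, n in counts.items())


def call_genotype(counts):
    """Return the genotype with the highest log-likelihood, plus all scores."""
    scores = {gt: log_likelihood(counts, d) for gt, d in DISTRIBUTIONS.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    observed = {"concordant": 12, "spanning_junction": 10}
    genotype, _ = call_genotype(observed)
    print(genotype)
```

Working in log space keeps the product of many small probabilities numerically stable, which matters once a variant is covered by hundreds of read pairs.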

  • Article type: Journal Article
    Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.

  • Article type: Journal Article
    African swine fever (ASF) is a rapidly fatal viral haemorrhagic fever in Chinese domestic pigs. Although very high mortality is observed in pig farms after an ASF outbreak, clinically healthy and antibody-positive pigs are found in those farms, and virus is rarely detected in these pigs. The ability of pigs to resist ASF viral infection may be modulated by host genetic variations. However, the genetic basis of the resistance of domestic pigs against ASF remains unclear. We generated a comprehensive set of structural variations (SVs) in the Chinese indigenous Xiang pig with ASF-resistant (Xiang-R) and ASF-susceptible (Xiang-S) phenotypes using a whole-genome resequencing method. A total of 53,589 nonredundant SVs were identified, with an average of 25,656 SVs per individual in the Xiang pig genome, including insertion, deletion, inversion and duplication variations. The Xiang-R group harboured more SVs than the Xiang-S group. F-statistics (FST) were calculated at each SV locus from the resequencing data to reveal genetic differences between the two populations. We identified 2,414 population-stratified SVs and annotated 1,152 Ensembl genes (including 986 protein-coding genes), in which 1,326 SVs might disturb the structure and expression of the Ensembl genes. Those protein-coding genes were mainly enriched in the Wnt, Hippo, and calcium signalling pathways. Other important pathways associated with ASF viral infection were also identified, such as the endocytosis, apoptosis, focal adhesion, Fc gamma R-mediated phagocytosis, junction, NOD-like receptor, PI3K-Akt, and C-type lectin receptor signalling pathways. Finally, we identified 135 candidate adaptive genes overlapping 166 SVs that were involved in virus entry and virus-host cell interactions. The fact that some population-stratified SV regions were detected as selective sweep signals provides further support for genetic variation affecting pig resistance to ASF. The research indicates that SVs play an important role in the evolutionary processes of Xiang pig adaptation to ASF infection.
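
The abstract does not say which FST estimator was used, so the following is only a sketch of the general idea of a per-locus FST between two groups. It assumes a biallelic SV locus, equal weighting of the two populations, and the simple heterozygosity-based definition FST = (HT - HS) / HT; the function names and the example frequencies are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Heterozygosity-based FST for one locus, weighting both populations equally.

    p1 and p2 are the SV allele frequencies in the two populations.
    Returns 0.0 when the locus is monomorphic overall (FST is undefined there;
    reporting 0.0 is a convention chosen for this sketch).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)
    if h_total == 0.0:
        return 0.0
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical frequencies of one SV in a resistant and a susceptible group.
    print(fst_two_populations(0.9, 0.1))
```

Values near 0 indicate similar allele frequencies in both groups, and values near 1 indicate that the SV is close to fixed in one group and absent in the other.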

  • Article type: Journal Article
    The identification of structural variants (SVs) in genomic data represents an ongoing challenge because of difficulties in reliable SV calling leading to reduced sensitivity and specificity. We prepared high-quality DNA from 9 parent-child trios, who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; 8 probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individual visualisation using the Integrative Genomics Viewer with either or both of the Illumina and ONT raw sequence. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied according to variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor using the original Sniffles variant caller (48% overall) but improved substantially with use of Sniffles2 (36/40; 90% and 17/23; 74% for deletions and insertions, respectively). In summary, we show that the precision of OGM is very high. In addition, when applying the Sniffles2 caller, the sensitivity of SV calling using ONT long-read sequence data outperforms Illumina sequencing for most SV types.
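
The percentages quoted in this abstract follow directly from the stated counts. The short sketch below reproduces that arithmetic; the helper name `percent` is an assumption made for illustration, while the counts are taken from the abstract itself.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


# Positive predictive value of the Bionano OGM calls: verified calls / all calls.
ppv_ogm = percent(222, 234)

# Sensitivity = true SVs recovered by a caller / true SVs of that type.
sens_illumina_del = percent(115, 134)
sens_illumina_ins = percent(13, 58)
sens_sniffles2_del = percent(36, 40)
sens_sniffles2_ins = percent(17, 23)

if __name__ == "__main__":
    for label, value in [
        ("OGM PPV", ppv_ogm),
        ("Illumina deletions", sens_illumina_del),
        ("Illumina insertions", sens_illumina_ins),
        ("ONT Sniffles2 deletions", sens_sniffles2_del),
        ("ONT Sniffles2 insertions", sens_sniffles2_ins),
    ]:
        print(f"{label}: {value:.1f}%")
```

Note that the Sniffles2 denominators (40 deletions, 23 insertions) are smaller than the Illumina ones (134 and 58), so the two sets of sensitivities were measured on different subsets of the truth set.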

  • Article type: Journal Article
    Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.

  • Article type: Journal Article
    BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection.
    RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking.
    CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.

  • Article type: Journal Article
    Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.

  • Article type: Journal Article
    Polyploidy, the result of whole-genome duplication (WGD), is a major driver of eukaryote evolution. Yet WGDs are hugely disruptive mutations, and we still lack a clear understanding of their fitness consequences. Here, we study whether WGDs result in greater diversity of genomic structural variants (SVs) and how they influence evolutionary dynamics in a plant genus, Cochlearia (Brassicaceae). By using long-read sequencing and a graph-based pangenome, we find both negative and positive interactions between WGDs and SVs. Masking of recessive mutations due to WGDs leads to a progressive accumulation of deleterious SVs across four ploidal levels (from diploids to octoploids), likely reducing the adaptive potential of polyploid populations. However, we also discover putative benefits arising from SV accumulation, as more ploidy-specific SVs harbor signals of local adaptation in polyploids than in diploids. Together, our results suggest that SVs play diverse and contrasting roles in the evolutionary trajectories of young polyploids.

  • Article type: Journal Article
    Tibetan sheep were introduced to the Qinghai-Tibet Plateau roughly 3,000 B.P., making this species a good model for investigating genetic mechanisms of high-altitude adaptation over a relatively short timescale. Here, we characterize genomic structural variants (SVs) that distinguish Tibetan sheep from closely related, low-altitude Hu sheep, and we examine associated changes in tissue-specific gene expression. We document differentiation between the two sheep breeds in frequencies of SVs associated with genes involved in cardiac function and circulation. In Tibetan sheep, we identified high-frequency SVs in a total of 462 genes, including EPAS1, PAPSS2, and PTPRD. Single-cell RNA-Seq data and luciferase reporter assays revealed that the SVs had cis-acting effects on the expression levels of these three genes in specific tissues and cell types. In Tibetan sheep, we identified a high-frequency chromosomal inversion that exhibited modified chromatin architectures relative to the noninverted allele that predominates in Hu sheep. The inversion harbors several genes with altered expression patterns related to heart protection, brown adipocyte proliferation, angiogenesis, and DNA repair. These findings indicate that SVs represent an important source of genetic variation in gene expression and may have contributed to high-altitude adaptation in Tibetan sheep.

  • Article type: Journal Article
    The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (strand-seq), the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ∼2.2-5.5 kb of 100% nucleotide similarity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as recombinant substrates. This type of CGR can result in multiple conformers generating diverse SV haplotypes in susceptible dosage-sensitive loci.