genomic structural variation

  • Article type: Journal Article
    BACKGROUND: Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs.
    RESULTS: Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes.
    METHODS: Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping.
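The genotyping step described in RESULTS (genotype-specific probability distributions over read-pair properties, followed by picking the most likely genotype) can be illustrated with a minimal sketch. This is not the authors' implementation: the two read-pair categories, the probability values, and the function names are hypothetical and chosen only for illustration, and read pairs are assumed to be independent.

```python
import math

# Hypothetical per-genotype probabilities of observing each read-pair category.
# In the paper these distributions are derived from the variant description;
# the numbers here are invented for illustration only.
DISTRIBUTIONS = {
    "0/0": {"reference": 0.98, "variant": 0.02},
    "0/1": {"reference": 0.50, "variant": 0.50},
    "1/1": {"reference": 0.02, "variant": 0.98},
}


def genotype_log_likelihoods(observed_counts):
    """Return log P(observations | genotype) for every genotype.

    Assumes read pairs are independent, so the log-likelihood is the sum of
    count * log(probability) over the observed categories.
    """
    return {
        genotype: sum(
            count * math.log(dist[category])
            for category, count in observed_counts.items()
        )
        for genotype, dist in DISTRIBUTIONS.items()
    }


def most_likely_genotype(observed_counts):
    """Pick the genotype with the highest log-likelihood."""
    log_likelihoods = genotype_log_likelihoods(observed_counts)
    return max(log_likelihoods, key=log_likelihoods.get)


if __name__ == "__main__":
    # 14 read pairs supporting the reference allele, 12 supporting the variant.
    print(most_likely_genotype({"reference": 14, "variant": 12}))
```

The further apart the per-genotype distributions are, the fewer read pairs are needed to separate the genotypes, which is the intuition behind the genotyping difficulty mentioned in the abstract.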

  • Article type: Journal Article
    Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.

  • Article type: Journal Article
    Advances in sequencing technologies have enabled the comparison of high-quality genomes of diverse primate species, revealing vast amounts of divergence due to structural variation. Given their large size, structural variants (SVs) can simultaneously alter the function and regulation of multiple genes. Studies estimate that collectively more than 3.5% of the genome is divergent in humans versus other great apes, impacting thousands of genes. Functional genomics and gene-editing tools in various model systems recently emerged as an exciting frontier - investigating the wide-ranging impacts of SVs on molecular, cellular, and systems-level phenotypes. This review examines existing research and identifies future directions to broaden our understanding of the functional roles of SVs on phenotypic innovations and diversity impacting uniquely human features, ranging from cognition to metabolic adaptations.

  • Article type: Case Reports
    Pompe disease (PD) is a rare autosomal recessive genetic disorder (1 in 14,000) which affects the synthesis of acid alpha-glucosidase (AGA), leading to intralysosomal glycogen accumulation in muscle tissue. The clinical presentation is heterogeneous, with variable degrees of involvement and progression, classifiable based on the age of onset into infantile (classic or non-classic) and late-onset forms (juvenile or adult). The diagnostic test of choice is the enzymatic analysis of AGA, and the only pharmacological treatment is enzyme replacement therapy (ERT). This document aims to report a clinical case of late-onset PD.
    A 14-year-old male whose symptoms began at the age of 5 with postural alterations, gait changes, and decreased physical performance compared to his peers. A diagnostic evaluation was initiated in 2022 due to worsening neuromuscular symptoms, accompanied by dyspnea, tachycardia, and chest pain. A lysosomal storage myopathy was suspected, and the diagnosis of PD was confirmed through enzymatic determination of AGA. Analysis of the GAA gene revealed the association of 2 previously unreported genomic variants. ERT was initiated, resulting in clinical improvement.
    The age of symptom onset, severity of clinical presentation, and prognosis of the disease depend on the specific mutations involved. In this case, the identified genetic alterations are associated with different phenotypes. However, based on the clinical presentation, it is categorized as juvenile PD with an indeterminate prognosis.

  • Article type: Journal Article
    African swine fever (ASF) is a rapidly fatal viral haemorrhagic fever in Chinese domestic pigs. Although very high mortality is observed in pig farms after an ASF outbreak, clinically healthy and antibody-positive pigs are found in those farms, and virus is rarely detected in these pigs. The ability of pigs to resist ASF viral infection may be modulated by host genetic variations. However, the genetic basis of the resistance of domestic pigs against ASF remains unclear. We generated a comprehensive set of structural variations (SVs) in Chinese indigenous Xiang pigs with ASF-resistant (Xiang-R) and ASF-susceptible (Xiang-S) phenotypes using a whole-genome resequencing method. A total of 53,589 nonredundant SVs were identified, with an average of 25,656 SVs per individual in the Xiang pig genome, including insertion, deletion, inversion and duplication variations. The Xiang-R group harboured more SVs than the Xiang-S group. F-statistics (FST) were computed from the resequencing data at each SV locus to reveal genetic differences between the two populations. We identified 2,414 population-stratified SVs and annotated 1,152 Ensembl genes (including 986 protein-coding genes), in which 1,326 SVs might disturb the structure and expression of the Ensembl genes. Those protein-coding genes were mainly enriched in the Wnt, Hippo, and calcium signalling pathways. Other important pathways associated with ASF viral infection were also identified, such as the endocytosis, apoptosis, focal adhesion, Fc gamma R-mediated phagocytosis, junction, NOD-like receptor, PI3K-Akt, and C-type lectin receptor signalling pathways. Finally, we identified 135 candidate adaptive genes overlapping 166 SVs that were involved in virus entry and virus-host cell interactions. Some population-stratified SV regions were also detected as selective sweep signals, which further supports a role for genetic variation in pig resistance to ASF. The research indicates that SVs play an important role in the evolutionary processes of Xiang pig adaptation to ASF infection.
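The per-locus FST comparison between the Xiang-R and Xiang-S groups can be sketched as follows. The abstract does not say which FST estimator was used, so this example assumes Wright's simple definition, FST = (HT - HS) / HT, for a biallelic SV with equally weighted populations; the function names and genotype counts are hypothetical.

```python
def allele_frequency(hom_ref, het, hom_alt):
    """Frequency of the SV allele from diploid genotype counts."""
    n_individuals = hom_ref + het + hom_alt
    return (het + 2 * hom_alt) / (2 * n_individuals)


def wright_fst(p1, p2):
    """Wright's FST for one biallelic locus and two equally weighted populations.

    p1 and p2 are the SV allele frequencies in each population.
    HS is the mean within-population expected heterozygosity and
    HT is the expected heterozygosity of the pooled population.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        # The locus is monomorphic in both populations: no differentiation.
        return 0.0
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical genotype counts (hom_ref, het, hom_alt) for one SV locus.
    p_resistant = allele_frequency(1, 2, 7)    # 0.8
    p_susceptible = allele_frequency(7, 2, 1)  # 0.2
    print(round(wright_fst(p_resistant, p_susceptible), 2))
```

Loci with FST close to 1 have very different SV allele frequencies in the two groups; a threshold on such per-locus values is one way to arrive at a set of population-stratified SVs.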

  • Article type: Journal Article
    The identification of structural variants (SVs) in genomic data represents an ongoing challenge because of difficulties in reliable SV calling leading to reduced sensitivity and specificity. We prepared high-quality DNA from 9 parent-child trios, who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; 8 probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individual visualisation using the Integrative Genomics Viewer with either or both of the Illumina and ONT raw sequence. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied according to variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor using the original Sniffles variant caller (48% overall) but improved substantially with use of Sniffles2 (36/40; 90% and 17/23; 74% for deletions and insertions, respectively). In summary, we show that the precision of OGM is very high. In addition, when applying the Sniffles2 caller, the sensitivity of SV calling using ONT long-read sequence data outperforms Illumina sequencing for most SV types.
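The percentages quoted above follow directly from the reported counts. The short check below recomputes them; only the counts come from the abstract, while the dictionary labels and the `percent` helper are illustrative.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage, rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Counts taken from the abstract: (verified or detected, total).
REPORTED = {
    "OGM precision (verified / rare proband calls)": (222, 234),
    "Illumina sensitivity, deletions": (115, 134),
    "Illumina sensitivity, insertions": (13, 58),
    "ONT + Sniffles2 sensitivity, deletions": (36, 40),
    "ONT + Sniffles2 sensitivity, insertions": (17, 23),
}

if __name__ == "__main__":
    for label, (numerator, denominator) in REPORTED.items():
        print(f"{label}: {numerator}/{denominator} = {percent(numerator, denominator)}%")
```

Precision here is the positive predictive value (verified calls divided by all calls made), while sensitivity is the share of the verified Bionano SVs that another caller also detected.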

  • Article type: Journal Article
    Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.

  • Article type: Journal Article
    Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.

  • Article type: Journal Article
    BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection.
    RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. A dynamic table at http://pmglab.top/SVPipelinesRanking lists the detailed ranking and performance metrics of the individual pipelines.
    CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.

  • Article type: Journal Article
    Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.