Sequence analysis

  • Article type: Journal Article
    BACKGROUND: Despite the 2015 guideline from the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP), the classification of many FBN1 variants remains inconclusive. Following the publication of the FBN1-specific variant interpretation guideline by ClinGen in 2022, we reassessed variants of uncertain significance (VUS) in the FBN1 gene found at our institution.
    METHODS: VUS found during FBN1 sequencing between December 2015 and April 2022 were reassessed based on the FBN1-specific variant interpretation guideline, a review of updated literature, and additional genetic tests, including family studies and/or RNA studies where available.
    RESULTS: Among 695 patients who underwent FBN1 sequencing, 61 VUS were found in 69 patients. Of these, 38 VUS in 43 patients (62.3%) were reclassified as pathogenic or likely pathogenic variants ((L)PV), including 20 novel (L)PV. The major causes of reclassification were (1) gene-specific modification of the ACMG/AMP criteria, (2) updated literature, and (3) additional genetic tests. The most important evidence for reclassification was clarification of critical amino acid residues.
    CONCLUSIONS: After reassessment of FBN1 variants according to the FBN1-specific guideline and up-to-date databases, a significant number of VUS were reclassified. Clinical laboratories are encouraged to perform variant reassessment at regular intervals or when there is a major change in the principles of variant interpretation.
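The reported 62.3% can be checked against the counts given in the RESULTS. The short sketch below is only an arithmetic check; the helper name `percent` is made up for illustration. It shows that the per-patient rate (43 of 69) and the per-variant rate (38 of 61) both round to the same figure.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts taken from the abstract.
patient_rate = percent(43, 69)   # patients whose VUS was reclassified
variant_rate = percent(38, 61)   # distinct VUS that were reclassified

print(patient_rate, variant_rate)
```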

  • Article type: Journal Article
    BACKGROUND: SARS-CoV-2-related research has increased in importance worldwide since December 2019. Several new variants of SARS-CoV-2 have emerged globally, of which the most notable and concerning currently are the UK variant B.1.1.7, the South African variant B.1.351 and the Brazilian variant P.1. Detecting and monitoring novel variants is essential in SARS-CoV-2 surveillance. While there are several tools for assembling virus genomes and performing lineage analyses to investigate SARS-CoV-2, each is limited to performing one or a few functions separately.
    RESULTS: Because no publicly available pipeline could both perform fast reference-based assembly of raw SARS-CoV-2 sequences and identify lineages to detect variants of concern, we developed an open-source bioinformatic pipeline called HAVoC (Helsinki university Analyzer for Variants of Concern). HAVoC can reference-assemble raw sequence reads and assign the corresponding lineages to SARS-CoV-2 sequences.
    CONCLUSIONS: HAVoC is a pipeline that uses several bioinformatic tools to perform the analyses needed to investigate genetic variance among SARS-CoV-2 samples. The pipeline is particularly useful for those who need a more accessible and fast tool to detect and monitor the spread of SARS-CoV-2 variants of concern during local outbreaks. HAVoC is currently being used in Finland for monitoring the spread of SARS-CoV-2 variants. The HAVoC user manual and source code are available at https://www.helsinki.fi/en/projects/havoc and https://bitbucket.org/auto_cov_pipeline/havoc, respectively.

  • Article type: Journal Article
    Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. A computationally intensive consensus step is needed to resolve the discrepancies in the reads. Efficient consensus tools have emerged in the recent past, based on partial-order alignment. In this study, we discovered that the spatial relationship of alignment pileup is crucial to high-quality consensus and developed a deep learning-based consensus tool, CONNET, which outperforms the fastest tools in terms of both accuracy and speed. We tested CONNET using a 90× dataset of E. coli and a 37× human dataset. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. Diploid consensus on the above-mentioned human assembly further reduced 12% of the consensus errors made in the haploid results.
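To make the notion of a consensus over an alignment pileup concrete, here is a minimal sketch of the simplest baseline: a column-wise majority vote over reads that are already aligned and padded with gaps. This is not CONNET's deep learning method and not partial-order alignment; the function name `pileup_consensus` and the toy reads are assumptions made for illustration.

```python
from collections import Counter

def pileup_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over equal-length, gap-padded reads.

    Columns whose most common symbol is the gap are dropped.
    Ties are resolved in favor of the symbol seen first in the column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)

# Toy pileup (made-up data): one insertion, one substitution, one deletion.
reads = [
    "ACGT-ACGT",
    "ACGTTACGT",
    "ACCT-ACGT",
    "ACGT-AC-T",
]
consensus = pileup_consensus(reads)
print(consensus)
```

A per-column vote ignores neighboring columns entirely, which is exactly the spatial information the abstract says matters for high-quality consensus.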

  • Article type: Journal Article
    Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees.
    We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T.
    https://github.com/elkebir-group/MCT.
    Supplementary data are available at Bioinformatics online.
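The idea of clustering a set T of trees and summarizing each cluster by one tree can be sketched on a toy scale. The code below is a simplified illustration, not the authors' MILP or heuristic. It assumes that trees are compared by the size of the symmetric difference of their parent-child edge sets, restricts each cluster's representative to one of the input trees (a medoid) rather than inferring a true consensus tree, and enumerates all assignments exhaustively, which is feasible only for a handful of trees.

```python
from itertools import product

def pc_distance(t1, t2):
    """Assumed distance: number of parent-child edges not shared by both trees."""
    return len(t1 ^ t2)

def best_medoid(cluster):
    """Input tree with the smallest total distance to the cluster members."""
    return min(cluster, key=lambda c: sum(pc_distance(c, t) for t in cluster))

def cluster_trees(trees, k):
    """Exhaustively find the k-clustering with the lowest total medoid distance."""
    best = None
    for labels in product(range(k), repeat=len(trees)):
        if len(set(labels)) != k:
            continue  # every cluster must be non-empty
        cost, medoids = 0, []
        for c in range(k):
            members = [t for t, lab in zip(trees, labels) if lab == c]
            medoid = best_medoid(members)
            medoids.append(medoid)
            cost += sum(pc_distance(medoid, t) for t in members)
        if best is None or cost < best[0]:
            best = (cost, labels, medoids)
    return best

# Toy trees over mutations A-D (made-up data), each a set of (parent, child) edges.
trees = [
    frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    frozenset({("A", "B"), ("B", "C"), ("B", "D")}),
    frozenset({("A", "D"), ("D", "C"), ("C", "B")}),
    frozenset({("A", "D"), ("D", "C"), ("D", "B")}),
]
cost, labels, medoids = cluster_trees(trees, k=2)
print(cost, labels)
```

With two clusters, the first two trees and the last two trees are grouped separately, which a single summary tree for all four could not represent.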

  • Article type: Evaluation Study
    BACKGROUND: Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely on either high-quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or complex genomes, which often do not have good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it doesn't perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.
    RESULTS: We compare the performance of the new method with REPdenovo and RepARK on human, Arabidopsis thaliana and Drosophila melanogaster short sequencing data. The new method fully constructs more repeats in Repbase than the original REPdenovo and RepARK, especially repeats with higher divergence rates and lower copy numbers. We also apply our new method to hummingbird data, for which no repeat library is known, and it constructs many repeat elements that can be validated using PacBio long reads.
    CONCLUSIONS: We propose an improved method for reconstructing repeat elements directly from short sequence reads. The results show that our new method can assemble more complete repeats than REPdenovo (and also RepARK). Our new approach has been implemented as part of the REPdenovo software package, which is available for download at https://github.com/Reedwarbler/REPdenovo.
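The starting point for building repeats from short reads is finding k-mers that occur unusually often. The sketch below illustrates only that general idea and is not REPdenovo's implementation: the function names, the value of k, and the `min_count` cutoff are assumptions, and reverse complements are ignored for brevity.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def frequent_kmers(reads, k, min_count):
    """Keep k-mers seen at least min_count times as repeat candidates."""
    counts = count_kmers(reads, k)
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

# Toy reads (made-up data) that share the motif ACGT.
reads = ["ACGTACGT", "TTACGTAA", "GGGACGTC"]
repeat_kmers = frequent_kmers(reads, k=4, min_count=3)
print(repeat_kmers)
```

A fixed count cutoff like this misses low-copy or highly diverged repeats, the weakness the abstract says the improved method addresses by using more repeat-related k-mers.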

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    High-throughput sequencing of B-cell immunoglobulin repertoires is increasingly being applied to gain insights into the adaptive immune response in healthy individuals and in those with a wide range of diseases. Recent applications include the study of autoimmunity, infection, allergy, cancer and aging. As sequencing technologies continue to improve, these repertoire sequencing experiments are producing ever larger datasets, with tens- to hundreds-of-millions of sequences. These data require specialized bioinformatics pipelines to be analyzed effectively. Numerous methods and tools have been developed to handle different steps of the analysis, and integrated software suites have recently been made available. However, the field has yet to converge on a standard pipeline for data processing and analysis. Common file formats for data sharing are also lacking. Here we provide a set of practical guidelines for B-cell receptor repertoire sequencing analysis, starting from raw sequencing reads and proceeding through pre-processing, determination of population structure, and analysis of repertoire properties. These include methods for unique molecular identifiers and sequencing error correction, V(D)J assignment and detection of novel alleles, clonal assignment, lineage tree construction, somatic hypermutation modeling, selection analysis, and analysis of stereotyped or convergent responses. The guidelines presented here highlight the major steps involved in the analysis of B-cell repertoire sequencing data, along with recommendations on how to avoid common pitfalls.
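Of the steps listed above, clonal assignment lends itself to a compact sketch. One commonly used approach, assumed here rather than taken from the guidelines, groups sequences that share V gene, J gene, and junction length, then links sequences whose junctions differ by less than a normalized Hamming distance threshold (single linkage). The function names, the 0.15 threshold, and the junction sequences are hypothetical.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_clones(records, max_norm_dist=0.15):
    """records: list of (v_gene, j_gene, junction). Returns one clone id per record."""
    groups = defaultdict(list)
    for idx, (v, j, junction) in enumerate(records):
        groups[(v, j, len(junction))].append(idx)

    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in groups.values():
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                ja, jb = records[a][2], records[b][2]
                if hamming(ja, jb) / max(len(ja), 1) <= max_norm_dist:
                    parent[find(b)] = find(a)  # single linkage

    clone_ids = {}
    return [clone_ids.setdefault(find(i), len(clone_ids)) for i in range(len(records))]

# Toy records with made-up junction sequences.
records = [
    ("IGHV1-2", "IGHJ4", "TGTGCGAGAGATTACTGG"),
    ("IGHV1-2", "IGHJ4", "TGTGCGAGAGATTATTGG"),   # 1 mismatch to the first
    ("IGHV1-2", "IGHJ4", "TGTGCCCCCCCCTACTGG"),   # many mismatches
    ("IGHV3-23", "IGHJ6", "TGTGCGAGAGATTACTGG"),  # different V and J genes
]
clones = assign_clones(records)
print(clones)
```

The threshold strongly affects the result and would normally be chosen from the data rather than fixed in advance.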

  • Article type: Journal Article
    Currently, several different Y-chromosomal phylogenies and haplogroup nomenclatures are presented in the scientific literature and at conferences, demonstrating the present diversity in Y-chromosomal phylogenetic trees and Y-SNP sets used within forensic and anthropological research. This situation can be ascribed to the exponential growth in the number of Y-SNPs discovered, mostly through next-generation sequencing (NGS) studies. As Y-SNPs and their respective phylogenetic positions are important in forensics, for example for male lineage characterization and paternal bio-geographic ancestry inference, forensic geneticists need to know how to deal with these newly identified Y-SNPs and phylogenies, especially since these phylogenies are often created for purposes other than forensic genetic research. Therefore, we give here an overview of four categories of currently used Y-chromosomal phylogenies and the associated Y-SNP sets in scientific research in the current NGS era. We compare these categories based on their construction method, their advantages and disadvantages, the disciplines in which the phylogenetic tree can be used, and their specific relevance for forensic geneticists. Based on this overview, it is clear that an up-to-date reduced tree with a consensus Y-SNP set and a stable nomenclature would be the most appropriate reference resource for forensic research. Initiatives to reach such an international consensus are therefore highly recommended.

  • Article type: Journal Article
    The publication of the ACMG recommendations has reignited the debate over predictive testing for adult-onset disorders in minors. Response has been polarized. With this in mind, we review and critically analyze this debate. First, we identify long-standing inconsistencies between consensus guidelines and clinical practice regarding risk assessment for adult-onset genetic disorders in children using family history and molecular analysis. Second, we discuss the disparate assumptions regarding the nature of whole genome and exome sequencing underlying arguments of both supporters and critics, and the role these assumptions play in the arguments for and against reporting. Third, we suggest that implicit differences regarding the definition of best interests of the child underlie disparate conclusions as to the best interests of children in this context. We conclude by calling for clarity and consensus concerning the central foci of this debate.

  • Article type: Journal Article
    The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
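The core idea of tracking coding regions that two pipelines annotate identically can be sketched as a set intersection. The code below is a toy illustration, not the CCDS project's actual procedure or data model: the record layout (chromosome, strand, list of exon intervals), the function names, and the coordinates are assumptions.

```python
def cds_key(chrom, strand, exons):
    """Normalize one annotation so that exon order does not affect comparison."""
    return (chrom, strand, tuple(sorted(exons)))

def identical_annotations(set_a, set_b):
    """Return the coding regions annotated identically in both sets."""
    keys_a = {cds_key(*record) for record in set_a}
    keys_b = {cds_key(*record) for record in set_b}
    return keys_a & keys_b

# Made-up annotations from two hypothetical pipelines.
pipeline_a = [
    ("chr1", "+", [(100, 200), (300, 400)]),
    ("chr2", "-", [(50, 90)]),
]
pipeline_b = [
    ("chr1", "+", [(300, 400), (100, 200)]),  # same exons, different order
    ("chr2", "-", [(50, 95)]),                # end coordinate differs
]
shared = identical_annotations(pipeline_a, pipeline_b)
print(shared)
```

Only exact coordinate matches count here. As the abstract notes, identical annotations must also pass quality assurance tests before they receive a stable identifier.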