phylogeny

  • Article type: Journal Article
    Recently, high-throughput sequencing of influenza A viruses has become routine. However, the extremely high diversity of influenza A virus complicates determining the sequences of all eight genome segments: fast and accurate analysis requires selecting the most suitable reference for each segment. At the same time, the field lacks a standardized method for processing sequencing results that lets users update the sequence databases against which the viral reads are compared. IAVCP (influenza A virus consensus and phylogeny) was developed to automate the analysis of high-throughput sequencing data from influenza A viruses. Among other tasks, it extracts a consensus genome directly from raw paired reads. In addition, by analyzing the topology of automatically reconstructed phylogenetic trees, the pipeline can identify potential reassortment events in the evolutionary history of the virus of interest.
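    The per-segment reference-selection step can be illustrated with a simple k-mer heuristic. This is a minimal sketch, not IAVCP's actual method: it assumes a hypothetical dictionary of candidate references for one segment, scores each candidate by the fraction of read k-mers it contains, and returns the best-scoring name. The sequences below are made-up toy fragments.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(read_kmers: Counter, ref_kmers: set[str]) -> float:
    """Fraction of read k-mer counts whose k-mer also occurs in the reference."""
    total = sum(read_kmers.values())
    if total == 0:
        return 0.0
    shared = sum(n for km, n in read_kmers.items() if km in ref_kmers)
    return shared / total


def pick_reference(reads: list[str], candidates: dict[str, str], k: int = 5) -> str:
    """Name of the candidate whose k-mers cover the largest share of read k-mers."""
    read_kmers: Counter = Counter()
    for read in reads:
        read_kmers.update(kmers(read, k))  # each read contributes its distinct k-mers once
    scores = {name: containment(read_kmers, kmers(seq, k)) for name, seq in candidates.items()}
    # Iterate names in sorted order so ties resolve deterministically (first name wins).
    return max(sorted(scores), key=scores.__getitem__)


# Toy usage with hypothetical references (k=5 is far smaller than realistic values).
candidates = {"ref_A": "ATGAAGGCAATACTAGTAGTTCTG", "ref_B": "ATGAAGACCATCATTGCTTTGAGC"}
reads = ["ATGAAGGCAATACT", "GCAATACTAGTAGT"]
print(pick_reference(reads, candidates))  # ref_A
```

    Containment is used here instead of symmetric Jaccard similarity so that a reference is not penalized for regions the reads do not cover; a real pipeline would also have to cope with sequencing errors and uneven coverage.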

  • Article type: Journal Article
    This study proposes a novel approach to studying mutations of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by comparing sequencing data. Traditional consensus-based methods, which keep only the most common nucleotide at each position, can overlook or obscure low-frequency variants. Our method, in contrast, retains all sequenced nucleotides at each position, forming a genomic matrix. Using simulated short reads from genomes with specified mutations, we compared our genomic matrix approach with the consensus sequence method. Across multiple simulated datasets, the matrix method accurately reflected the known mutations, with an average accuracy improvement of 20% over the consensus method. In real-world tests using data from GISAID and NCBI-SRA, our approach proved more reliable, reducing the error margin by approximately 15%. The genomic matrix approach provides a more accurate representation of viral genomic diversity and thereby better insight into virus evolution and epidemiology.
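    The difference between a consensus call and a per-position "genomic matrix" can be sketched in a few lines. This is an illustration under simplifying assumptions (reads already placed on reference coordinates as (start, sequence) pairs, no indels), not the authors' implementation; the 5% reporting threshold is an arbitrary choice.

```python
from collections import Counter


def genomic_matrix(reads: list[tuple[int, str]], length: int) -> list[Counter]:
    """One Counter of observed bases per reference position (reads given as (start, seq))."""
    matrix = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length:  # ignore bases falling outside the reference
                matrix[pos][base] += 1
    return matrix


def consensus(matrix: list[Counter], missing: str = "N") -> str:
    """Majority base per position; positions with no coverage become `missing`."""
    return "".join(col.most_common(1)[0][0] if col else missing for col in matrix)


def minor_variants(matrix: list[Counter], min_freq: float = 0.05) -> list[tuple[int, str, float]]:
    """Non-majority bases at or above min_freq: the information a consensus discards."""
    found = []
    for pos, col in enumerate(matrix):
        depth = sum(col.values())
        if depth == 0:
            continue
        major = col.most_common(1)[0][0]
        for base, n in sorted(col.items()):
            if base != major and n / depth >= min_freq:
                found.append((pos, base, n / depth))
    return found


# Toy data: 9 reads carry G at position 2, 1 read carries A (a 10% variant).
reads = [(0, "ACGT")] * 9 + [(0, "ACAT")]
matrix = genomic_matrix(reads, length=5)
print(consensus(matrix))       # ACGTN
print(minor_variants(matrix))  # [(2, 'A', 0.1)]
```

    The consensus string alone reports only "ACGTN"; the matrix still holds the 10% A at position 2.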

  • Article type: Journal Article
    A protein sequence encodes its energy landscape: all the accessible conformations, energetics, and dynamics. The evolutionary relationship between sequence and landscape can be probed phylogenetically by compiling a multiple sequence alignment of homologous sequences and generating either common ancestors via Ancestral Sequence Reconstruction or a consensus protein containing the most common amino acid at each position. Both ancestral and consensus proteins are often more stable than their extant homologs, which raises the question of how the two differ and suggests that both approaches could serve as general methods to engineer thermostability. We used the Ribonuclease H family to compare these approaches and to evaluate how the evolutionary relationship of the input sequences affects the properties of the resulting consensus protein. While the consensus protein derived from our full Ribonuclease H sequence alignment is structured and active, it neither shows the properties of a well-folded protein nor has enhanced stability. In contrast, the consensus protein derived from a phylogenetically restricted set of sequences is significantly more stable and folds cooperatively, suggesting that cooperativity may be encoded by different mechanisms in separate clades and lost when too many diverse clades are combined to generate a consensus protein. To explore this, we compared pairwise covariance scores using a Potts formalism as well as higher-order sequence correlations using singular value decomposition (SVD). We find that the SVD coordinates of a stable consensus sequence are close to those of the analogous ancestor sequence and its descendants, whereas the unstable consensus sequences are outliers in SVD space.
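    A consensus protein, as defined above, takes the most common amino acid in each column of a multiple sequence alignment. The toy sketch below (made-up sequences and hypothetical clade labels, no gaps) shows how a consensus pooled over several clades can combine residues that never co-occur in any input sequence, whereas a clade-restricted consensus stays within one clade. It says nothing about stability or cooperativity; it only illustrates how the input set shapes the consensus.

```python
from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Most common residue in each column of an equal-length, gap-free alignment."""
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


# Toy alignment grouped by hypothetical clade labels.
clades = {
    "X": ["MKLE", "MKLE"],
    "Y": ["MRLD", "MRLD"],
    "Z": ["MRVE"],
}
all_seqs = [s for seqs in clades.values() for s in seqs]
full = consensus(all_seqs)          # pooled over every clade
restricted = consensus(clades["X"])  # restricted to one clade
print(full, restricted)   # MRLE MKLE
print(full in all_seqs)   # False: the pooled consensus matches no input sequence
```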

  • Article type: Journal Article
    Nearly 300 million individuals live with chronic hepatitis B virus (HBV) infection (CHB), for which no curative therapy is available. As viral diversity is associated with pathogenesis and immunological control of infection, improved methods to characterize this diversity could aid drug development efforts. Conventionally, viral sequencing data are mapped/aligned to a reference genome, and only the aligned sequences are retained for analysis. Reference selection is therefore critical, yet selecting the most representative reference a priori remains difficult. We investigate an alternative pangenome approach that combines multiple reference sequences into a graph that can be used during alignment. Using simulated short-read sequencing data generated from publicly available HBV genomes and real sequencing data from an individual living with CHB, we demonstrate that alignment to a phylogenetically representative 'genome graph' can improve alignment, avoid issues of reference ambiguity, and facilitate the construction of sample-specific consensus sequences that are more genetically similar to the individual's infection. Graph-based methods can, therefore, improve efforts to characterize the genetics of viral pathogens, including HBV, and have broader implications in host-pathogen research.
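    To make the genome-graph idea concrete, here is a deliberately simplified sketch, not the graph toolkit used in the study: nodes are (column, base) pairs from a gap-free alignment of references, and each reference adds edges between its consecutive columns. A read "fits" if some path spells it exactly, even when that path switches between references at shared nodes. Real genome-graph aligners use compacted node sequences, handle indels, and tolerate mismatches; all names here are hypothetical.

```python
Node = tuple[int, str]  # (alignment column, base)


def build_graph(references: list[str]) -> dict[Node, set[Node]]:
    """Toy genome graph: each reference adds edges between its consecutive columns."""
    if len({len(r) for r in references}) != 1:
        raise ValueError("references must be aligned to equal length in this sketch")
    graph: dict[Node, set[Node]] = {}
    for ref in references:
        for col, base in enumerate(ref):
            successors = graph.setdefault((col, base), set())
            if col + 1 < len(ref):
                successors.add((col + 1, ref[col + 1]))
    return graph


def spells(graph: dict[Node, set[Node]], read: str) -> bool:
    """True if some path through the graph spells `read` exactly (no mismatches)."""
    if not read:
        return False
    frontier = {node for node in graph if node[1] == read[0]}
    for base in read[1:]:
        frontier = {nxt for node in frontier for nxt in graph[node] if nxt[1] == base}
        if not frontier:
            return False
    return bool(frontier)


refs = ["ACGTAC", "ACCTAG"]
graph = build_graph(refs)
# "CCTAC" follows the second reference, then switches to the first after the shared "TA".
print(spells(graph, "CCTAC"), any("CCTAC" in r for r in refs))  # True False
```

    Against either linear reference alone, "CCTAC" has no exact match; the graph admits it because both references share the intervening columns.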

  • Article type: Journal Article
    Following a proposal to emend Appendix 9 of the International Code of Nomenclature of Prokaryotes with guidelines for the naming of genera after geographical locations, I here report the outcome of the ballot on this proposal by the members of the International Committee on Systematics of Prokaryotes and present the guidelines to be incorporated in Appendix 9.

  • Article type: Journal Article
    Due to uncertainty in tumor phylogeny inference from sequencing data, many methods infer multiple, equally plausible phylogenies for the same cancer. To summarize the solution space $\mathcal{T}$ of tumor phylogenies, consensus tree methods seek a single best representative tree $S$ under a specified pairwise tree distance function. One such distance function is the ancestor-descendant (AD) distance $d(T_1, T_2)$, which equals the size of the symmetric difference of the transitive closures of the edge sets $E(T_1)$ and $E(T_2)$. Here, we show that finding a consensus tree $S$ for tumor phylogenies $\mathcal{T}$ that minimizes the total AD distance $\sum_{T \in \mathcal{T}} d(S, T)$ is NP-hard.
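    The AD distance can be computed directly from its definition. The sketch below assumes rooted trees given as child-to-parent dictionaries over the same node labels (a hypothetical input format, with the root mapped to None); it collects all (ancestor, descendant) pairs, which is the transitive closure of the edge set, takes the size of the symmetric difference, and sums it over a collection of trees. It only evaluates the objective $\sum_{T \in \mathcal{T}} d(S, T)$ for a given candidate $S$; it does not search for the minimizing tree.

```python
from typing import Optional

Tree = dict[str, Optional[str]]  # child -> parent; the root maps to None


def ancestor_pairs(parent: Tree) -> set[tuple[str, str]]:
    """Transitive closure of the tree's edge set: every (ancestor, descendant) pair."""
    pairs = set()
    for node in parent:
        anc = parent[node]
        while anc is not None:  # walk up to the root
            pairs.add((anc, node))
            anc = parent[anc]
    return pairs


def ad_distance(t1: Tree, t2: Tree) -> int:
    """AD distance: size of the symmetric difference of the two closures."""
    return len(ancestor_pairs(t1) ^ ancestor_pairs(t2))


def total_ad_distance(s: Tree, trees: list[Tree]) -> int:
    """Consensus objective: sum of d(S, T) over all trees T."""
    return sum(ad_distance(s, t) for t in trees)


chain_abc = {"a": None, "b": "a", "c": "b"}  # a -> b -> c
chain_acb = {"a": None, "c": "a", "b": "c"}  # a -> c -> b
star = {"a": None, "b": "a", "c": "a"}       # a -> b, a -> c
print(ad_distance(chain_abc, chain_acb))                # 2
print(total_ad_distance(star, [chain_abc, chain_acb]))  # 2
```

    Evaluating the objective for one candidate is cheap; the hardness result concerns searching over all candidate trees $S$ for the minimum.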

  • Article type: Journal Article
    The COVID-19 pandemic has underscored the importance of studying coronaviruses (CoVs). This study investigates the evolutionary patterns of 350 CoVs using four structural proteins (S, E, M, and N) and introduces a consensus methodology to construct a comprehensive phylogenomic network. Our clustering of the CoVs into 4 genera is consistent with current CoV classification. Additionally, we calculate network centrality measures to identify CoV strains with notable average weighted degree and betweenness centrality values, focusing specifically on RaTG13 in the beta genus and NGA/A116E7/2006 in the gamma genus. We compare CoV phylogenies obtained with our distance-based approach and with a character-based model in IQ-TREE. Both methods yield largely consistent results, supporting the reliability of our consensus approach. Notably, our consensus method is approximately 5000 times faster than IQ-TREE on the data set of 350 CoVs. This gain in efficiency makes large-scale phylogenomic studies of CoVs more feasible.
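    The two centrality measures mentioned above can be illustrated on a small weighted network. This sketch is not the study's pipeline: it assumes an undirected network given as an edge list with placeholder node names and hypothetical similarity weights, computes each node's weighted degree (sum of incident edge weights), and computes betweenness centrality with Brandes' algorithm on the unweighted topology (shortest paths counted in hops), which is a simplification.

```python
from collections import defaultdict, deque


def build_network(edges: list[tuple[str, str, float]]):
    """Adjacency sets and weighted degree (sum of incident edge weights) per node."""
    adj: dict[str, set[str]] = defaultdict(set)
    weighted_degree: dict[str, float] = defaultdict(float)
    for u, v, w in edges:
        adj[u].add(v)
        adj[v].add(u)
        weighted_degree[u] += w
        weighted_degree[v] += w
    return dict(adj), dict(weighted_degree)


def betweenness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Brandes' algorithm on the unweighted, undirected topology (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)  # hop distance from s (-1 = unvisited)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted from both ends


edges = [("A", "B", 0.5), ("B", "C", 0.25), ("B", "D", 0.25)]
adj, wdeg = build_network(edges)
print(wdeg["B"], betweenness(adj)["B"])  # 1.0 3.0
```

    Averaging the weighted degrees over all nodes gives a network-level average weighted degree; a weighted-path variant of betweenness would replace the BFS with Dijkstra's algorithm.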

  • Article type: Journal Article
    The rapid pace of name changes for medically important fungi is creating challenges for clinical laboratories and clinicians involved in patient care. We describe two sources of name changes, at the species level and at the genus level, which have different drivers. We make some suggestions here to reduce the number of name changes. We urge taxonomists to provide diagnostic markers for taxonomic novelties. Given the instability of phylogenetic trees due to variable taxon sampling, we advocate maintaining genera at the largest possible size. Reports of species identified within complexes or series should, where possible, include both the name of the overarching species and that of the molecular sibling, which is often a cryptic species. Because different names for the same species will remain in use for many years to come, an open-access online database of the names of all medically important fungi, with proper nomenclatural designation and synonymy, is essential. We further recommend that, while taxonomic discovery continues, the adoption of new name changes by clinical laboratories and clinicians be reviewed routinely by a standing committee for validation and stability over time, with reference to an open-access database in which the reasons for changes are listed transparently.

  • Article type: Journal Article
    In many organisms, especially those of conservation concern, traditional lines of evidence for taxonomic delineation, such as morphological data, are often difficult to obtain. In these cases, genetic data are often the only source of information available for taxonomic studies. In particular, population surveys of mitochondrial genomes offer increased resolution and precision in support of taxonomic decisions relative to conventional use of the control region or other gene fragments of the mitochondrial genome. To improve quantitative guidelines for taxonomic decisions in cetaceans, we build on a previous effort targeting the control region and evaluate, for whole mitogenome sequences, a suite of divergence and diagnosability estimates for pairs of recognized cetacean populations, subspecies, and species. From this overview, we recommend new guidelines based on complete mitogenomes, combined with other types of evidence for isolation and divergence, which will improve resolution for taxonomic decisions, especially in the face of small sample sizes or low levels of genetic diversity. We further use simulated data to assist interpretations of divergence in the context of varying forms of historical demography, culture, and ecology.
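    Two widely used quantities of this kind are net nucleotide divergence between groups, $d_A = d_{XY} - (d_X + d_Y)/2$ (Nei's net divergence, computed here from uncorrected p-distances), and the number of fixed differences, i.e. sites at which the two groups share no nucleotide. Whether these exact estimators are among those evaluated in the study is an assumption; the sketch uses toy, gap-free aligned sequences.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (no gaps assumed)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(group: list[str]) -> float:
    """Mean pairwise p-distance within a group (0.0 for a single sequence)."""
    pairs = list(combinations(group, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def mean_between(x: list[str], y: list[str]) -> float:
    """Mean p-distance over all cross-group pairs (d_XY)."""
    return sum(p_distance(a, b) for a, b in product(x, y)) / (len(x) * len(y))


def net_divergence(x: list[str], y: list[str]) -> float:
    """Nei's net divergence d_A = d_XY - (d_X + d_Y) / 2."""
    return mean_between(x, y) - (mean_within(x) + mean_within(y)) / 2


def fixed_differences(x: list[str], y: list[str]) -> int:
    """Sites where the two groups share no nucleotide, so every sequence is assignable."""
    return sum(
        not ({s[i] for s in x} & {s[i] for s in y})
        for i in range(len(x[0]))
    )


# Toy groups: site 0 is fixed (A vs G); site 3 is polymorphic in both groups.
x = ["AAAA", "AAAC"]
y = ["GAAA", "GAAC"]
print(net_divergence(x, y), fixed_differences(x, y))  # 0.125 1
```

    With small samples, a single fixed difference can arise by chance, which is one reason such metrics are combined with other lines of evidence.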

  • Article type: Letter
    No abstract available.