genome analysis

  • Article type: Journal Article
    The development of sequencing technology has increased the number of genomes being sequenced. However, obtaining a high-quality genome sequence remains a challenge in genome assembly, which must assemble a massive number of short strings (reads) in the presence of repetitive sequences (repeats). Computer algorithms for genome assembly construct the entire genome from reads using two approaches. The de novo approach concatenates reads based on exact matches between their suffixes and prefixes (overlapping). The reference-guided approach orders reads based on their offsets in a well-known reference genome (read alignment). The presence of repeats increases technical ambiguity: the algorithm cannot distinguish between reads, which results in misassembly and reduces the accuracy of the assembly approach. In addition, the massive number of reads poses a major challenge for assembly performance.
    Repeat identification methods were introduced to address misassembly: repetitive sequences are identified beforehand to create a repeat knowledge base that reduces ambiguity during assembly, thus enhancing the accuracy of the assembled genome. In addition, hybridization between assembly approaches, with the aid of the reference genome, has resulted in a lower degree of misassembly. Assembly performance is optimized through data structure indexing and parallelization. The primary aim and contribution of this article is an extensive review that makes it easier for researchers to search for genome assembly studies. The study also highlights the most recent developments and limitations in genome assembly accuracy and performance optimization.
    Our findings show the limitations of the available repeat identification methods, which detect only repeats of specific lengths and may not perform well when various types of repeats are present in a genome. We also found that most hybrid assembly approaches, whether starting with de novo or reference-guided assembly, have limitations in handling repetitive sequences because doing so is more computationally costly and time-intensive. Although the hybrid approach was found to outperform individual assembly approaches, optimizing its performance remains a challenge. In addition, parallelization of overlapping and read alignment for genome assembly has yet to be fully implemented in the hybrid assembly approach.
    We suggest combining multiple repeat identification methods to improve the accuracy of repeat identification as an initial step of the hybrid assembly approach, and combining genome indexing with parallelization to better optimize its performance.
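    The two assembly primitives this abstract describes, suffix-prefix overlap (de novo) and ordering reads by reference offset (reference-guided), can be illustrated with a minimal Python sketch. It is not taken from the reviewed article: the function names, the `min_len` threshold, and the use of exact string matching in place of a real aligner or index are all assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical helper names, exact matching, no indexing.

def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0  # no overlap of at least min_len


def all_offsets(read: str, reference: str) -> list[int]:
    """Every offset at which `read` occurs exactly in `reference`."""
    offsets = []
    start = reference.find(read)
    while start != -1:
        offsets.append(start)
        start = reference.find(read, start + 1)
    return offsets


def order_by_reference(reads: list[str], reference: str) -> list[str]:
    """Sort reads by their first exact-match offset; reads with no match are dropped."""
    placed = []
    for read in reads:
        offsets = all_offsets(read, reference)
        if offsets:
            placed.append((offsets[0], read))
    return [read for _, read in sorted(placed)]


if __name__ == "__main__":
    reference = "ACGTACGTTTGA"
    reads = ["GTTTGA", "ACGTAC", "TACGTT"]
    print(suffix_prefix_overlap("ACGTAC", "TACGTT"))
    print(order_by_reference(reads, reference))
    # A read that falls inside a repeat matches at more than one offset,
    # which is the ambiguity the abstract attributes to repeats.
    print(all_offsets("ACGT", reference))
```

    In this toy reference the read `ACGT` occurs at two offsets, so exact matching alone cannot decide where it belongs. A repeat knowledge base built beforehand would flag such reads before assembly.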

  • Article type: Journal Article
    The newly emerged wheat blast fungus Magnaporthe oryzae Triticum (MoT) is a severe threat to global wheat production. The fungus is a distinct, exceptionally diverse lineage of M. oryzae, the species that causes rice blast disease. Genome-based approaches employing MoT-specific markers are used to detect MoT field isolates. Whole-genome sequencing indicates the presence of core chromosome and mini-chromosome sequences that harbor effector genes and follow divergent evolutionary routes. Significant genetic and pathotype diversity within the fungal population gives ample potential for evolutionary change. Identifying and refining genetic markers allows genomic regions with stable blast resistance to be tracked. Introgression of quantitative and R gene resistance into popular cultivars is crucial for controlling the disease in areas where the pathogen population is diverse and well established. Novel approaches such as CRISPR/Cas-9 genome editing could generate resistant wheat varieties within a short time. This chapter provides an extensive summary of the genetic and genomic aspects of the wheat blast fungus MoT and offers an essential resource for wheat blast research in the affected areas.

  • Article type: Journal Article
    SARS-CoV-2 is a new human coronavirus (CoV) that emerged in China in late 2019 and is responsible for the global COVID-19 pandemic, which caused more than 97 million infections and 2 million deaths in 12 months. Understanding the origin of this virus is an important issue, and it is necessary to determine the mechanisms of viral dissemination in order to contain future epidemics. Based on phylogenetic inferences, sequence analysis and structure-function relationships of coronavirus proteins, informed by the knowledge currently available on the virus, we discuss the different scenarios for the origin of the virus, natural or synthetic. The data currently available are not sufficient to firmly assert whether SARS-CoV-2 results from a zoonotic emergence or from an accidental escape of a laboratory strain. This question needs to be resolved because it has important consequences for the risk/benefit balance of our interactions with ecosystems, for intensive breeding of wild and domestic animals, for some laboratory practices, and for scientific policy and biosafety regulations. Regardless of the origin of COVID-19, studying the evolution of the molecular mechanisms involved in the emergence of pandemic viruses is essential for developing therapeutic and vaccine strategies and for preventing future zoonoses. This article is a translation and update of a French article published in Médecine/Sciences, August/September 2020 (10.1051/medsci/2020123).
    The online version of this article (10.1007/s10311-020-01151-1) contains supplementary material, which is available to authorized users.