Reference Sequence

  • Article type: Journal Article
    BACKGROUND: In 2022, a global outbreak of monkeypox occurred, accompanied by a significant shift in the disease's epidemiological characteristics. The monkeypox virus (MPXV) responsible belongs to the B.1 lineage, and this study investigated the genomic variations of that lineage linked to the outbreak. Previous studies have suggested that genomic variation plays a crucial role in viral pathogenicity and transmissibility, so understanding the genomic variation of MPXV is important for controlling future outbreaks.
    METHODS: This study used bioinformatic and phylogenetic approaches to evaluate key genomic variation in the B.1 lineage of MPXV. A total of 979 MPXV strains were screened, and 212 representative strains were analyzed to identify specific substitutions in the viral genome. A reference sequence was constructed for each of the 10 lineages from the most common nucleotide at each site. A total of 49 substitutions were identified, 23 of them non-synonymous. Among the non-synonymous substitutions, those with significant effects on protein conformation, and therefore likely to affect viral characteristics, were classified as Class I variants.
    RESULTS: The phylogenetic analysis revealed 10 relatively monophyletic branches. The study identified 49 substitutions specific to the B.1 lineage, including 23 non-synonymous substitutions that were classified into Class I, II, and III variants. The Class I variants were likely responsible for the changes observed in the characteristics of MPXV circulating in 2022. These key mutations, particularly the Class I variants, played a crucial role in the pathogenicity and transmissibility of MPXV.
    CONCLUSIONS: This study characterizes the genomic variation of B.1-lineage MPXV linked to the recent monkeypox outbreak. The identification of key mutations, particularly Class I variants, sheds light on the molecular mechanisms underlying the observed changes in the characteristics of circulating MPXV. Further studies can focus on the functional domains affected by these mutations, which may enable the development of effective control strategies against future monkeypox outbreaks.
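    The abstract above describes building each lineage's reference sequence from the most common nucleotide at each site. A minimal Python sketch of that idea follows; it assumes the strains of one lineage are already aligned to equal length, and the function name `consensus_sequence` and the toy sequences are hypothetical, not taken from the study.

```python
from collections import Counter


def consensus_sequence(aligned_seqs):
    """Return the majority nucleotide at every column of an alignment.

    Assumption: all sequences are pre-aligned and have the same length.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column)
        # most_common(1) yields the top base; ties go to the base seen first.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment standing in for the strains of one lineage (made-up data).
toy_lineage = ["ACGTA", "ACGTT", "ATGTA", "ACGAA"]
print(consensus_sequence(toy_lineage))
```

    A real pipeline would also need a policy for gaps, ambiguous bases and ties, none of which the abstract specifies.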

  • Article type: Journal Article
    Coronavirus disease 2019 (COVID-19) is a severe respiratory disease caused by the highly infectious severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). As the COVID-19 pandemic continues, SARS-CoV-2 mutations accumulate. These mutations may not only make the virus spread faster but also render current vaccines less effective. In this study, we established a reference sequence for each clade defined by the GISAID typing method. Homology analysis of the reference sequences confirmed a low mutation rate for SARS-CoV-2: the latest clade, GRY, had the lowest homology with the other clades (99.89%-99.93%), while homology among the other clades was greater than or equal to 99.95%. Variation analyses showed that the earliest genotypes S, V, and G had 2, 3, and 3 characterizing mutations in the genome, respectively. The G-derived clades GR, GH, and GV had 5, 6, and 13 characterizing mutations, respectively. The genome of the latest clade, GRY, carried a total of 28 characterizing mutations. In addition, we found differences in the geographic distribution of the clades: G, GH, and GR are prevalent in the USA, while GV and GRY are common in the UK. Our work may facilitate the custom design of antiviral strategies based on the molecular characteristics of SARS-CoV-2.
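    The homology figures quoted above are pairwise comparisons between clade reference sequences. The abstract does not say how homology was computed, so the following Python sketch simply assumes percent identity over aligned columns; `percent_identity`, `identity_table` and the toy sequences are hypothetical illustrations.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent of aligned columns where two sequences carry the same base.

    Assumptions: both sequences are pre-aligned to equal length, and
    columns that are gaps ("-") in both sequences are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_table(references):
    """Map each unordered pair of clade names to its percent identity."""
    return {
        (first, second): percent_identity(references[first], references[second])
        for first, second in combinations(sorted(references), 2)
    }


# Made-up 10-base stand-ins for clade reference sequences.
toy_refs = {"S": "ACGTACGTAC", "G": "ACGTACGTAT", "GRY": "ACGAACGTTT"}
for pair, identity in identity_table(toy_refs).items():
    print(pair, f"{identity:.2f}%")
```

    On genome-length references the same calculation yields values in the 99.9% range reported in the abstract, where a difference of a few dozen sites moves only the second decimal place.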

  • Article type: Journal Article
    BACKGROUND: Reference sequences play a vital role in next-generation sequencing (NGS), affecting mapping quality during genome analyses. However, reference genomes usually do not represent the full range of a species' genetic diversity, because populations diverge geographically and undergo independent demographic events. For the mitochondrial genome (mitogenome), which occurs in high copy numbers in cells and is strictly maternally inherited, an optimal reference sequence has the potential to make mitogenome alignment both more accurate and more efficient. In this study, we used three types of reference sequence for mitogenome mapping, namely the commonly used reference sequence (CU-ref), the breed-specific reference sequence (BS-ref) and the sample-specific reference sequence (SS-ref), and compared the accuracy of mitogenome alignment and SNP calling among them in order to propose the optimal reference sequence for mitochondrial DNA (mtDNA) analyses of specific populations.
    RESULTS: Four pigs, representing three different breeds, were sequenced by high-throughput sequencing, and the reads were then mapped to the reference sequences mentioned above. Aligning reads to a BS-ref gave the largest mapping ratio and the deepest coverage without increasing running time. Next, single nucleotide polymorphism (SNP) calling was carried out with 18 detection strategies, using the three tools SAMtools, VarScan and GATK with different parameters, on the BAM results mapped to the BS-ref. All eighteen strategies achieved the same high specificity and sensitivity, which suggests that mitogenome alignment to the BS-ref is highly accurate, because the outcome depended little on the choice of SNP calling tool or parameters.
    CONCLUSIONS: This study showed that reference sequences with different genetic relationships to the sample reads influenced mitogenome alignment, and that breed-specific reference sequences were optimal for mitogenome analyses. This provides a refined perspective on processing NGS data.
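    The results above compare SNP-calling strategies by sensitivity and specificity. As a rough illustration of how those two figures are defined, the Python sketch below scores one set of called SNP positions against a truth set. The truth set, the site universe and the name `snp_call_metrics` are assumptions made for illustration; the abstract does not describe how the study derived its truth data.

```python
def snp_call_metrics(called, truth, all_sites):
    """Return (sensitivity, specificity) for a set of called SNP positions.

    called    : positions reported as SNPs by one strategy
    truth     : positions assumed to be real SNPs (hypothetical truth set)
    all_sites : every position that was evaluated
    """
    called, truth, all_sites = set(called), set(truth), set(all_sites)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    true_neg = len(all_sites - called - truth)

    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    negatives = true_neg + false_pos
    specificity = true_neg / negatives if negatives else float("nan")
    return sensitivity, specificity


# Toy example: ten evaluated sites, three real SNPs, one miss, one false call.
sens, spec = snp_call_metrics(called={2, 5, 9}, truth={2, 5, 8}, all_sites=range(1, 11))
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

    Running the same function over every strategy's call set is one way to check whether the strategies agree, as the abstract reports they did.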

  • Article type: Journal Article
    Potato (Solanum tuberosum L.) is one of the most important crops, with a worldwide production of 370 million metric tons. The objectives of this study were (1) to create a high-quality consensus sequence across the two haplotypes of a diploid clone derived from a tetraploid elite variety and to assess its sequence divergence from the available potato genome assemblies, as well as between the two haplotypes; (2) to evaluate the new assembly's usefulness for various genomic methods; and (3) to assess the performance of phasing in diploid and tetraploid clones using linked-read sequencing technology. We used PacBio long reads coupled with 10x Genomics reads and proximity ligation scaffolding to create the dAg1_v1.0 reference genome sequence. With a final assembly size of 812 Mb, of which 750 Mb are anchored to 12 chromosomes, our assembly is larger than other available potato reference sequences. High proportions of properly paired reads were observed for clones unrelated by pedigree to dAg1. Comparisons of the new dAg1_v1.0 sequence with other potato genome sequences highlight the high divergence between potato varieties and illustrate the potential of the dAg1_v1.0 sequence in breeding applications.

  • Article type: Journal Article
    The International Human Leukocyte Antigen (HLA) and Immunogenetics Workshops (IHIWs) have fostered international collaborations of researchers and experts in the fields of HLA, histocompatibility and immunology. These IHIW collaborations have comprised many projects focused on achieving a variety of specific goals. The international and collaborative nature of these projects necessitates the collection and analysis of complex data generated in multiple laboratories, often using multiple methods of acquisition. Collecting and storing these data in a consistent way adds value to IHIW projects, and that value can extend to future work. DNA-based genotyping data, especially HLA genotyping data, can be transmitted in the form of a Histoimmunogenetics Markup Language (HML) document. HML facilitates clear communication of a genotype and its supporting metadata, such as sequencing platform, laboratory assays, consensus sequence, and interpretation. Sequence information can be reported relative to known reference sequences, which adds meaning and context to genotypes. Selecting the correct reference sequence for a given allele sequence is nuanced, and guidelines have emerged through collaborative community efforts such as Data Standards Hackathons. Here, we describe the guidelines established for selecting the reference sequences to be used when transmitting HLA (and MICA/MICB) genotyping data for the 18th IHIW.

  • Article type: Journal Article
    BACKGROUND: Chromosomal variants play important roles in crop breeding and genetic research. The development of single-stranded oligonucleotide (oligo) probes simplifies the process of fluorescence in situ hybridization (FISH) and facilitates chromosomal identification in many species. Genome sequencing provides rich resources for the development of oligo probes. However, little progress has been made in peanut due to the lack of efficient chromosomal markers. Until now, the identification of chromosomal variants in peanut has remained a challenge.
    RESULTS: A total of 114 new oligo probes were developed based on the genome-wide tandem repeats (TRs) identified from the reference sequences of the peanut variety Tifrunner (AABB, 2n = 4x = 40) and the diploid species Arachis ipaensis (BB, 2n = 2x = 20). These oligo probes were classified into 28 types based on their positions and overlapping signals on chromosomes. For each type, a representative oligo was selected and modified with the green fluorophore 6-carboxyfluorescein (FAM) or the red fluorophore 6-carboxytetramethylrhodamine (TAMRA). Two cocktails, Multiplex #3 and Multiplex #4, were developed by pooling the fluorophore-conjugated probes. Multiplex #3 included FAM-modified oligo TIF-439, oligo TIF-185-1, oligo TIF-134-3 and oligo TIF-165. Multiplex #4 included TAMRA-modified oligo Ipa-1162, oligo Ipa-1137, oligo DP-1 and oligo DP-5. Each cocktail enabled the establishment of a genome map-based karyotype after sequential FISH/genomic in situ hybridization (GISH) and in silico mapping. Furthermore, we identified 14 chromosomal variants of the peanut induced by radiation exposure. A total of 28 representative probes were further chromosomally mapped onto the new karyotype. Among the probes, eight were mapped in the secondary constrictions, intercalary and terminal regions; four were B genome-specific; one was chromosome-specific; and the remaining 15 were extensively mapped in the pericentric regions of the chromosomes.
    CONCLUSIONS: The development of new oligo probes provides an effective set of tools which can be used to distinguish the various chromosomes of the peanut. Physical mapping by FISH reveals the genomic organization of repetitive oligos in peanut chromosomes. A genome map-based karyotype was established and used for the identification of chromosome variations in peanut following comparisons with their reference sequence positions.

  • Article type: Journal Article
    Analysis of RNA by deep-sequencing approaches has found widespread application in modern biology. In addition to measuring RNA abundance under various physiological conditions, such techniques are now widely used for mapping and quantifying RNA modifications. Transfer RNA (tRNA) molecules are frequent targets of such investigations, since they contain multiple modified residues. However, the major challenge in tRNA analysis is the large number of duplicated and point-mutated genes encoding these RNA molecules. Moreover, the existence of multiple isoacceptors/isodecoders complicates both the analysis and read mapping. Existing databases for tRNA sequencing provide near-exhaustive listings of tRNA genes, but using such highly redundant reference sequences in RNA-seq analyses leads to a large number of ambiguously mapped sequencing reads. Here we describe a relatively simple computational strategy for semi-automatic collapsing of highly redundant tRNA datasets into a non-redundant collection of reference tRNA sequences. The relevance of the approach was validated by analyzing experimentally obtained tRNA-sequencing datasets for different prokaryotic and eukaryotic model organisms. The data demonstrate that non-redundant tRNA reference sequences improve the unambiguous mapping of deep-sequencing data.
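    The abstract above does not spell out the collapsing procedure, so the Python sketch below covers only the simplest part of the idea: merging tRNA gene entries whose sequences are exactly identical into a single reference entry. The handling of point-mutated genes that the strategy also addresses is not attempted here, and the function name and gene names are hypothetical.

```python
def collapse_identical(records):
    """Group (name, sequence) records by identical sequence.

    Returns a list of (representative_name, sequence, member_names),
    one entry per distinct sequence, in order of first appearance.
    Assumption: U and T are treated as the same base, case is ignored.
    """
    groups = {}
    for name, sequence in records:
        key = sequence.upper().replace("U", "T")
        groups.setdefault(key, []).append(name)
    # Python dicts preserve insertion order, so the first name seen
    # for each sequence becomes the representative.
    return [(names[0], seq, names) for seq, names in groups.items()]


# Made-up entries: two gene copies share one sequence, one differs.
toy_trna_genes = [
    ("tRNA-Ala-AGC-1", "GGGGAUGUAGCUCA"),
    ("tRNA-Ala-AGC-2", "GGGGATGTAGCTCA"),
    ("tRNA-Ala-AGC-3", "GGGGAUGUAGCUCG"),
]
for representative, seq, members in collapse_identical(toy_trna_genes):
    print(representative, seq, members)
```

    Reads that would have mapped equally well to both identical copies now have a single target, which is the ambiguity the abstract refers to.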

  • Article type: Journal Article
    Obtaining information about the functional details of proteins from extinct species is of critical importance for a better understanding of the real-life appearance, behavior and ecology of these lost entries in the book of life. In this chapter, we discuss how the necessary DNA sequence information can be retrieved from paleogenomic data obtained from fossil specimens and then used to express and subsequently analyze the protein of interest. We discuss the problems specific to ancient DNA, including miscoding lesions, short read length and incomplete paleogenome assemblies. Finally, we discuss an alternative but currently rarely used approach, direct PCR amplification, which is especially useful for comparatively short proteins.

  • Article type: Journal Article
    Starting around December 2019, an epidemic of pneumonia, named COVID-19 by the World Health Organization, broke out in Wuhan, China, and is spreading throughout the world. A new coronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the Coronavirus Study Group of the International Committee on Taxonomy of Viruses, was soon found to be the cause. At present, the sensitivity of clinical nucleic acid detection is limited, and it is still unclear whether this is related to genetic variation. In this study, we retrieved 95 full-length genomic sequences of SARS-CoV-2 strains from the National Center for Biotechnology Information and GISAID databases, established the reference sequence by conducting multiple sequence alignment and phylogenetic analyses, and analyzed sequence variations along the SARS-CoV-2 genome. The homology among all viral strains was generally high: 99.99% (99.91%-100%) at the nucleotide level and 99.99% (99.79%-100%) at the amino acid level. Although overall variation in open reading frame (ORF) regions is low, 13 variation sites were identified in the 1a, 1b, S, 3a, M, 8, and N regions, among which positions nt28144 in ORF 8 and nt8782 in ORF 1a showed mutation rates of 30.53% (29/95) and 29.47% (28/95), respectively. These findings suggest that there may be selective mutations in SARS-CoV-2 and that certain regions should be avoided when designing primers and probes. Establishing the reference sequence for SARS-CoV-2 could benefit not only biological study of this virus but also future diagnosis, clinical monitoring and intervention of SARS-CoV-2 infection.
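    The mutation rates quoted above are per-site fractions of strains that differ from the reference, for example 29 of 95 strains at nt28144. The Python sketch below shows that arithmetic and a per-site scan over a toy alignment. It assumes the strains are already aligned to the reference; `site_mutation_rates` and the toy sequences are hypothetical.

```python
def site_mutation_rates(reference, strains):
    """Fraction of strains differing from the reference at each variable site.

    Returns {1-based position: fraction}; invariant sites are omitted.
    Assumption: every strain is aligned to the reference, same length.
    """
    if not strains:
        raise ValueError("need at least one strain")
    if any(len(strain) != len(reference) for strain in strains):
        raise ValueError("strains must be aligned to the reference")
    rates = {}
    for index, ref_base in enumerate(reference):
        differing = sum(1 for strain in strains if strain[index] != ref_base)
        if differing:
            rates[index + 1] = differing / len(strains)
    return rates


# The two percentages reported in the abstract, recomputed from the counts.
print(round(100 * 29 / 95, 2))  # 30.53
print(round(100 * 28 / 95, 2))  # 29.47

# Toy alignment: positions 2 and 4 vary among four made-up strains.
print(site_mutation_rates("ACGT", ["ACGT", "ACGA", "ATGA", "ACGT"]))
```

    Sites with high rates in such a scan are the ones a primer or probe design would want to avoid, in line with the abstract's recommendation.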

  • Article type: Historical Article
    The Bermuda Principles for DNA sequence data sharing are an enduring legacy of the Human Genome Project (HGP). They were adopted by the HGP at a strategy meeting in Bermuda in February of 1996 and implemented in formal policies by early 1998, mandating daily release of HGP-funded DNA sequences into the public domain. The idea of daily sharing, we argue, emanated directly from strategies for large, goal-directed molecular biology projects first tested within the "community" of C. elegans researchers, and was introduced and defended for the HGP by the nematode biologists John Sulston and Robert Waterston. In the C. elegans community, and subsequently in the HGP, daily sharing served the pragmatic goals of quality control and project coordination. Yet in the HGP's human genome effort, we also argue, the Bermuda Principles addressed concerns about gene patents impeding scientific advancement, and were aspirational and flexible in implementation and justification. They endured as an archetype for how rapid data sharing could be realized and rationalized, and permitted adaptation to the needs of various scientific communities. Yet in addition to the support of Sulston and Waterston, their adoption also depended on the clout of administrators at the US National Institutes of Health (NIH) and the UK nonprofit charity the Wellcome Trust, which together funded 90% of the HGP human sequencing effort. The other nations wishing to remain in the HGP consortium had to accommodate the Bermuda Principles, which required exceptions from incompatible existing or pending data access policies for publicly funded research in Germany, Japan, and France. We begin this story in 1963, with the biologist Sydney Brenner's proposal for a nematode research program at the Laboratory of Molecular Biology (LMB) at the University of Cambridge. We continue through 2003, with the completion of the HGP human reference genome, and conclude with observations about policy and the historiography of molecular biology.