Reference Sequence

  • Article type: Journal Article
    In 2022, a global outbreak of monkeypox occurred with a significant shift in its epidemiological characteristics. The monkeypox virus (MPXV) responsible for the outbreak belongs to the B.1 lineage, and this study investigated the genomic variations linked to the outbreak. Previous studies have suggested that viral genomic variation plays a crucial role in the pathogenicity and transmissibility of viruses. Therefore, understanding the genomic variation of MPXV is crucial for controlling future outbreaks.
    This study employed bioinformatics and phylogenetic approaches to evaluate the key genomic variation in the B.1 lineage of MPXV. A total of 979 MPXV strains were screened, and 212 representative strains were analyzed to identify specific substitutions in the viral genome. Reference sequences were constructed for each of the 10 lineages based on the most common nucleotide at each site. A total of 49 substitutions were identified, 23 of which were non-synonymous. Among the non-synonymous substitutions, those with significant effects on protein conformation, and therefore likely to affect viral characteristics, were classified as Class I variants.
    The phylogenetic analysis revealed 10 relatively monophyletic branches. The study identified 49 substitutions specific to the B.1 lineage, with 23 non-synonymous substitutions that were classified into Class I, II, and III variants. The Class I variants were likely responsible for the observed changes in the characteristics of circulating MPXV in 2022. These key mutations, particularly Class I variants, played a crucial role in the pathogenicity and transmissibility of MPXV.
    This study provides an understanding of the genomic variation of MPXV in the B.1 lineage linked to the recent outbreak of monkeypox. The identification of key mutations, particularly Class I variants, sheds light on the molecular mechanisms underlying the observed changes in the characteristics of circulating MPXV. Further studies can focus on functional domains affected by these mutations, enabling the development of effective control strategies against future monkeypox outbreaks.
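
    The abstract above states that each lineage's reference sequence was built from the most common nucleotide at each site. A minimal sketch of that idea follows, assuming the input sequences are already aligned to equal length. The function name and the alphabetical tie-breaking rule are illustrative assumptions, not details taken from the study.

```python
from collections import Counter


def consensus_sequence(aligned_seqs):
    """Return the most common character at each column of an alignment.

    Assumes all sequences are pre-aligned to the same length.
    Ties are broken alphabetically (an arbitrary, illustrative choice).
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        # max() keeps the first maximal item, so sorting the keys first
        # makes the alphabetically smallest base win a tie.
        best = max(sorted(counts), key=counts.__getitem__)
        consensus.append(best)
    return "".join(consensus)


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "TCGA"]  # toy data, not real MPXV sequences
    print(consensus_sequence(toy_alignment))
```

    A real pipeline would also need a policy for gaps and ambiguous bases, which this sketch treats as ordinary characters.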






  • Article type: Journal Article
    Coronavirus disease 2019 (COVID-19) is a severe respiratory disease caused by the highly infectious severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). As the COVID-19 pandemic continues, mutations of SARS-CoV-2 accumulate. These mutations may not only make the virus spread faster, but also render current vaccines less effective. In this study, we established a reference sequence for each clade defined using the GISAID typing method. Homology analysis of the reference sequences confirmed a low mutation rate for SARS-CoV-2: the latest clade, GRY, had the lowest homology with the other clades (99.89%-99.93%), while the homology among the other clades was greater than or equal to 99.95%. Variation analyses showed that the earliest genotypes S, V, and G had 2, 3, and 3 characterizing mutations in the genome, respectively. The G-derived clades GR, GH, and GV had 5, 6, and 13 characterizing mutations, respectively. A total of 28 characterizing mutations existed in the genome of the latest clade, GRY. In addition, we found differences in the geographic distribution of the clades: G, GH, and GR are prevalent in the USA, while GV and GRY are common in the UK. Our work may facilitate the custom design of antiviral strategies depending on the molecular characteristics of SARS-CoV-2.
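
    The homology percentages quoted above compare pairs of clade reference sequences. The abstract does not say how homology was computed, so the sketch below uses one common definition as an assumption: percent identity over the aligned columns of two pre-aligned sequences, skipping columns where both carry a gap. The function name and gap convention are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned columns where two pre-aligned sequences agree.

    Columns in which both sequences have a gap are ignored; a gap against
    a base counts as a mismatch. This is one of several possible
    definitions and is assumed here for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not real SARS-CoV-2 clade references.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

    For scale, on a genome of roughly 30,000 nucleotides, 99.95% identity under this definition corresponds to about 15 differing sites.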






  • Article type: Journal Article
    BACKGROUND: Reference sequences play a vital role in next-generation sequencing (NGS), impacting mapping quality during genome analyses. However, reference genomes usually do not represent the full range of genetic diversity of a species as a result of geographical divergence and independent demographic events of different populations. For the mitochondrial genome (mitogenome), which occurs in high copy numbers in cells and is strictly maternally inherited, an optimal reference sequence has the potential to make mitogenome alignment both more accurate and more efficient. In this study, we used three different types of reference sequences for mitogenome mapping, i.e., the commonly used reference sequence (CU-ref), the breed-specific reference sequence (BS-ref) and the sample-specific reference sequence (SS-ref), and compared the accuracy of mitogenome alignment and SNP calling among them, with the aim of proposing the optimal reference sequence for mitochondrial DNA (mtDNA) analyses of specific populations.
    RESULTS: Four pigs representing three different breeds were sequenced with high-throughput methods, and the reads were mapped to the reference sequences mentioned above. Aligning reads to a BS-ref gave the largest mapping ratio and the deepest coverage without increasing running time. Next, single nucleotide polymorphism (SNP) calling was carried out with 18 detection strategies, using the three tools SAMtools, VarScan and GATK with different parameters, on the BAM files from mapping to the BS-ref. All eighteen strategies achieved the same high specificity and sensitivity, which suggests that alignment to the BS-ref is accurate enough that results depend little on the choice of SNP calling tool and parameters.
    CONCLUSIONS: This study showed that different reference sequences representing different genetic relationships to sample reads influenced mitogenome alignment, with the breed-specific reference sequences being optimal for mitogenome analyses, which provides a refined processing perspective for NGS data.
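
    The sensitivity and specificity reported for the SNP calling strategies can be illustrated with a short sketch. It assumes a set of known true SNP positions is available for comparison; the abstract does not describe how the truth set was obtained, and the function name, positions, and genome length below are hypothetical.

```python
def sensitivity_specificity(true_snps, called_snps, n_sites):
    """Compute (sensitivity, specificity) for SNP calls over n_sites positions.

    true_snps and called_snps are sets of 1-based positions.
    """
    true_snps = set(true_snps)
    called_snps = set(called_snps)

    tp = len(true_snps & called_snps)          # true SNPs that were called
    fn = len(true_snps - called_snps)          # true SNPs that were missed
    fp = len(called_snps - true_snps)          # calls at non-variant sites
    tn = n_sites - len(true_snps | called_snps)  # non-variant sites left uncalled

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy positions on a hypothetical 100-site region.
    sens, spec = sensitivity_specificity({10, 20, 30}, {10, 20, 40}, n_sites=100)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```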






  • Article type: Journal Article
    Potato (Solanum tuberosum L.) is one of the most important crops, with a worldwide production of 370 million metric tons. The objectives of this study were (1) to create a high-quality consensus sequence across the two haplotypes of a diploid clone derived from a tetraploid elite variety and assess the sequence divergence from the available potato genome assemblies, as well as between the two haplotypes; (2) to evaluate the new assembly's usefulness for various genomic methods; and (3) to assess the performance of phasing in diploid and tetraploid clones, using linked-read sequencing technology. We used PacBio long reads coupled with 10x Genomics reads and proximity ligation scaffolding to create the dAg1_v1.0 reference genome sequence. With a final assembly size of 812 Mb, of which 750 Mb are anchored to 12 chromosomes, our assembly is larger than other available potato reference sequences, and high proportions of properly paired reads were observed for clones unrelated by pedigree to dAg1. Comparisons of the new dAg1_v1.0 sequence to other potato genome sequences highlight the high divergence between the different potato varieties and illustrate the potential of using the dAg1_v1.0 sequence in breeding applications.






  • Article type: Journal Article
    The International human leukocyte antigen (HLA) and Immunogenetics Workshops (IHIWs) have fostered international collaborations of researchers and experts in the fields of HLA, histocompatibility and immunology. These IHIW collaborations have comprised many projects focused on achieving a variety of specific goals. The international and collaborative nature of these projects necessitates the collection and analysis of complex data generated in multiple laboratories, often using multiple methods of acquisition. Collecting and storing these data in a consistent way adds value to IHIW projects, and that value can be extended to future work. DNA-based genotyping data, especially HLA genotyping data, can be transmitted in the form of a Histoimmunogenetics Markup Language (HML) document. HML facilitates clear communication of a genotype and supporting metadata, such as sequencing platform, laboratory assays, consensus sequence, and interpretation. Sequence information can be reported relative to known reference sequences, which add meaning and context to genotypes. Selecting the correct reference sequence for a given allele sequence is nuanced, and guidelines have emerged through collaborative community efforts such as Data Standards Hackathons. Here, we describe the guidelines established for the selection of reference sequences to be used in transmission of HLA (and MICA/MICB) genotyping data for the 18th IHIW.







  • Article type: Journal Article
    BACKGROUND: Chromosomal variants play important roles in crop breeding and genetic research. The development of single-stranded oligonucleotide (oligo) probes simplifies the process of fluorescence in situ hybridization (FISH) and facilitates chromosomal identification in many species. Genome sequencing provides rich resources for the development of oligo probes. However, little progress has been made in peanut due to the lack of efficient chromosomal markers. Until now, the identification of chromosomal variants in peanut has remained a challenge.
    RESULTS: A total of 114 new oligo probes were developed based on the genome-wide tandem repeats (TRs) identified from the reference sequences of the peanut variety Tifrunner (AABB, 2n = 4x = 40) and the diploid species Arachis ipaensis (BB, 2n = 2x = 20). These oligo probes were classified into 28 types based on their positions and overlapping signals in chromosomes. For each type, a representative oligo was selected and modified with green fluorescein 6-carboxyfluorescein (FAM) or red fluorescein 6-carboxytetramethylrhodamine (TAMRA). Two cocktails, Multiplex #3 and Multiplex #4, were developed by pooling the fluorophore conjugated probes. Multiplex #3 included FAM-modified oligo TIF-439, oligo TIF-185-1, oligo TIF-134-3 and oligo TIF-165. Multiplex #4 included TAMRA-modified oligo Ipa-1162, oligo Ipa-1137, oligo DP-1 and oligo DP-5. Each cocktail enabled the establishment of a genome map-based karyotype after sequential FISH/genomic in situ hybridization (GISH) and in silico mapping. Furthermore, we identified 14 chromosomal variants of the peanut induced by radiation exposure. A total of 28 representative probes were further chromosomally mapped onto the new karyotype. Among the probes, eight were mapped in the secondary constrictions, intercalary and terminal regions; four were B genome-specific; one was chromosome-specific; and the remaining 15 were extensively mapped in the pericentric regions of the chromosomes.
    CONCLUSIONS: The development of new oligo probes provides an effective set of tools which can be used to distinguish the various chromosomes of the peanut. Physical mapping by FISH reveals the genomic organization of repetitive oligos in peanut chromosomes. A genome map-based karyotype was established and used for the identification of chromosome variations in peanut following comparisons with their reference sequence positions.







  • Article type: Journal Article
    Analysis of RNA by deep-sequencing approaches has found widespread application in modern biology. In addition to measurements of RNA abundance under various physiological conditions, such techniques are now widely used for mapping and quantification of RNA modifications. Transfer RNA (tRNA) molecules are frequent targets of such investigation, since they contain multiple modified residues. However, the major challenge in tRNA examination is the large number of duplicated and point-mutated genes encoding those RNA molecules. Moreover, the existence of multiple isoacceptors/isodecoders complicates both the analysis and read mapping. Existing databases for tRNA sequencing provide near-exhaustive listings of tRNA genes, but the use of such highly redundant reference sequences in RNA-seq analyses leads to a large number of ambiguously mapped sequencing reads. Here we describe a relatively simple computational strategy for semi-automatic collapsing of highly redundant tRNA datasets into a non-redundant collection of reference tRNA sequences. The relevance of the approach was validated by analysis of experimentally obtained tRNA-sequencing datasets for different prokaryotic and eukaryotic model organisms. The data demonstrate that non-redundant tRNA reference sequences improve the unambiguous mapping of deep sequencing data.
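
    The abstract above does not give the details of its semi-automatic collapsing strategy. As a loose illustration of the underlying idea only, the sketch below handles the simplest case: merging entries whose sequences are exactly identical after case and U/T normalization. Collapsing point-mutated near-duplicates, which the abstract also mentions, would need additional rules that are not attempted here. The function name and gene names are hypothetical.

```python
def collapse_identical(records):
    """Group sequence names by normalized sequence.

    records: dict mapping gene name -> sequence (DNA or RNA, any case).
    Returns a dict mapping each distinct normalized sequence to the list
    of names that share it, in input order.
    """
    groups = {}
    for name, seq in records.items():
        # Normalize so RNA/DNA spelling and letter case do not split groups.
        key = seq.upper().replace("U", "T")
        groups.setdefault(key, []).append(name)
    return groups


if __name__ == "__main__":
    # Toy entries, not real tRNA gene sequences.
    toy = {
        "tRNA-Ala-1": "GGGCGUA",
        "tRNA-Ala-2": "gggcgta",
        "tRNA-Gly-1": "GCATTG",
    }
    for seq, names in collapse_identical(toy).items():
        print(seq, names)
```

    Each key of the result can then serve as a single reference entry, so a read matching it maps to one place instead of to every duplicated gene copy.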







  • Article type: Journal Article
    Obtaining information about functional details of proteins of extinct species is of critical importance for a better understanding of the real-life appearance, behavior and ecology of these lost entries in the book of life. In this chapter, we discuss the possibilities to retrieve the necessary DNA sequence information from paleogenomic data obtained from fossil specimens, which can then be used to express and subsequently analyze the protein of interest. We discuss the problems specific to ancient DNA, including miscoding lesions, short read length and incomplete paleogenome assemblies. Finally, we discuss an alternative, but currently rarely used approach, direct PCR amplification, which is especially useful for comparatively short proteins.






  • Article type: Journal Article
    Starting around December 2019, an epidemic of pneumonia, which was named COVID-19 by the World Health Organization, broke out in Wuhan, China, and is spreading throughout the world. A new coronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the Coronavirus Study Group of the International Committee on Taxonomy of Viruses, was soon found to be the cause. At present, the sensitivity of clinical nucleic acid detection is limited, and it is still unclear whether this is related to genetic variation. In this study, we retrieved 95 full-length genomic sequences of SARS-CoV-2 strains from the National Center for Biotechnology Information and GISAID databases, established the reference sequence by conducting multiple sequence alignment and phylogenetic analyses, and analyzed sequence variations along the SARS-CoV-2 genome. The homology among all viral strains was generally high: 99.99% (99.91%-100%) at the nucleotide level and 99.99% (99.79%-100%) at the amino acid level. Although overall variation in open-reading frame (ORF) regions is low, 13 variation sites in the 1a, 1b, S, 3a, M, 8, and N regions were identified, among which positions nt28144 in ORF 8 and nt8782 in ORF 1a showed mutation rates of 30.53% (29/95) and 29.47% (28/95), respectively. These findings suggest that there may be selective mutations in SARS-CoV-2, and that it is necessary to avoid certain regions when designing primers and probes. Establishment of the reference sequence for SARS-CoV-2 could benefit not only biological study of this virus but also diagnosis, clinical monitoring and intervention of SARS-CoV-2 infection in the future.
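
    The per-site mutation rates above are the fraction of the 95 genomes that differ from the reference at a given position (for example, 29/95 is about 30.53%). A minimal sketch of that calculation follows, assuming the genomes are already aligned to the reference with no length differences. The function name is hypothetical and the toy data are not real SARS-CoV-2 sequences.

```python
def site_mutation_rates(reference, genomes):
    """Return {1-based position: fraction of genomes differing from reference}.

    Only positions with at least one difference are reported. All genomes
    are assumed to be aligned to the reference and of equal length.
    """
    if not genomes:
        raise ValueError("need at least one genome")
    if any(len(g) != len(reference) for g in genomes):
        raise ValueError("genomes must match the reference length")

    n = len(genomes)
    rates = {}
    for pos, ref_base in enumerate(reference, start=1):
        diffs = sum(1 for g in genomes if g[pos - 1] != ref_base)
        if diffs:
            rates[pos] = diffs / n
    return rates


if __name__ == "__main__":
    toy_reference = "ACGT"
    toy_genomes = ["ACGT", "ACTT", "ACTT", "TCGT"]
    print(site_mutation_rates(toy_reference, toy_genomes))
    # The two rates quoted in the abstract, recomputed from their counts.
    print(round(100 * 29 / 95, 2), round(100 * 28 / 95, 2))
```

    Positions with high rates are the ones the abstract recommends avoiding when placing primers and probes.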







  • Article type: Historical Article
    The Bermuda Principles for DNA sequence data sharing are an enduring legacy of the Human Genome Project (HGP). They were adopted by the HGP at a strategy meeting in Bermuda in February of 1996 and implemented in formal policies by early 1998, mandating daily release of HGP-funded DNA sequences into the public domain. The idea of daily sharing, we argue, emanated directly from strategies for large, goal-directed molecular biology projects first tested within the "community" of C. elegans researchers, and was introduced and defended for the HGP by the nematode biologists John Sulston and Robert Waterston. In the C. elegans community, and subsequently in the HGP, daily sharing served the pragmatic goals of quality control and project coordination. Yet in the HGP human genome, we also argue, the Bermuda Principles addressed concerns about gene patents impeding scientific advancement, and were aspirational and flexible in implementation and justification. They endured as an archetype for how rapid data sharing could be realized and rationalized, and permitted adaptation to the needs of various scientific communities. Yet in addition to the support of Sulston and Waterston, their adoption also depended on the clout of administrators at the US National Institutes of Health (NIH) and the UK nonprofit charity the Wellcome Trust, which together funded 90% of the HGP human sequencing effort. The other nations wishing to remain in the HGP consortium had to accommodate the Bermuda Principles, requiring exceptions from incompatible existing or pending data access policies for publicly funded research in Germany, Japan, and France. We begin this story in 1963, with the biologist Sydney Brenner's proposal for a nematode research program at the Laboratory of Molecular Biology (LMB) at the University of Cambridge. We continue through 2003, with the completion of the HGP human reference genome, and conclude with observations about policy and the historiography of molecular biology.






