short read

  • Article type: Journal Article
    Paired-end short reads of Illumina HiSeq, MiSeq, and NovaSeq of simulated bacterial communities from fresh spinach and surface water were generated in silico at various sequencing depths. Multidrug-resistant Salmonella enterica serotype Indiana was included in the spinach community, while the water community contained multidrug-resistant Pseudomonas aeruginosa.

  • Article type: Journal Article
    Naturally occurring isolates of baculoviruses, such as the Bombyx mori nucleopolyhedrovirus (BmNPV), usually consist of numerous genetically different haplotypes. Deciphering the different haplotypes of such isolates is hampered by the large size of the dsDNA genome, as well as the short read length of next generation sequencing (NGS) techniques that are widely applied for baculovirus isolate characterization. In this study, we addressed this challenge by combining the accuracy of NGS to determine single nucleotide variants (SNVs) as genetic markers with the long read length of Nanopore sequencing technique. This hybrid approach allowed the comprehensive analysis of genetically homogeneous and heterogeneous isolates of BmNPV. Specifically, this allowed the identification of two putative major haplotypes in the heterogeneous isolate BmNPV-Ja by SNV position linkage. SNV positions, which were determined based on NGS data, were linked by the long Nanopore reads in a Position Weight Matrix. Using a modified Expectation-Maximization algorithm, the Nanopore reads were assigned according to the occurrence of variable SNV positions by machine learning. The cohorts of reads were de novo assembled, which led to the identification of BmNPV haplotypes. The method demonstrated the strength of the combined approach of short- and long-read sequencing techniques to decipher the genetic diversity of baculovirus isolates.

  • Article type: Journal Article
    Digestion is driven by digestive enzymes and digestive enzyme gene copy number can provide insights on the genomic underpinnings of dietary specialization. The "Adaptive Modulation Hypothesis" (AMH) proposes that digestive enzyme activity, which increases with increased gene copy number, should correlate with substrate quantity in the diet. To test the AMH and reveal some of the genetics of herbivory vs carnivory, we sequenced, assembled, and annotated the genome of Anoplarchus purpurescens, a carnivorous prickleback fish in the family Stichaeidae, and compared the gene copy number for key digestive enzymes to that of Cebidichthys violaceus, a herbivorous fish from the same family. A highly contiguous genome assembly of high quality (N50 = 10.6 Mb) was produced for A. purpurescens, using combined long-read and short-read technology, with an estimated 33,842 protein-coding genes. The digestive enzymes that we examined include pancreatic α-amylase, carboxyl ester lipase, alanyl aminopeptidase, trypsin, and chymotrypsin. Anoplarchus purpurescens had fewer copies of pancreatic α-amylase (carbohydrate digestion) than C. violaceus (1 vs. 3 copies). Moreover, A. purpurescens had one fewer copy of carboxyl ester lipase (plant lipid digestion) than C. violaceus (4 vs. 5). We observed an expansion in copy number for several protein digestion genes in A. purpurescens compared to C. violaceus, including trypsin (5 vs. 3) and total aminopeptidases (6 vs. 5). Collectively, these genomic differences coincide with measured digestive enzyme activities (phenotypes) in the two species and they support the AMH. Moreover, this genomic resource is now available to better understand fish biology and dietary specialization.
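The abstract above reports assembly contiguity as an N50 value. As a minimal sketch of how that statistic is conventionally computed (the function name `n50` and the toy contig lengths are illustrative assumptions, not taken from the paper), N50 is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly size:

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp)."""
    # Walk contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy example (hypothetical lengths): total = 20, half = 10,
# and 6 + 5 = 11 is the first running sum to reach 10.
print(n50([2, 3, 4, 5, 6]))
```

A larger N50 means that half of the assembled bases sit in contigs at least that long, which is why it is commonly quoted alongside gene counts as a contiguity summary.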

  • Article type: Journal Article
    Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that includes missing call recovery combined with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, which is over ∼1.7-3.3-fold higher than previous large-scale projects while exhibiting a comparable level of statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.

  • Article type: Video-Audio Media
    Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many available tools to process this data require both bioinformatics skills and high computational power to process big datasets. Furthermore, there are only few tools that allow for long read amplicon data analysis. To bridge this gap, we developed the LotuS2 (less OTU scripts 2) pipeline, enabling user-friendly, resource friendly, and versatile analysis of raw amplicon sequences.
    In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, where parameters can be fully adjusted, and novices, where defaults are provided for different scenarios. We benchmarked three independent gut and soil datasets, where LotuS2 was on average 29 times faster compared to other pipelines, yet could better reproduce the alpha- and beta-diversity of technical replicate samples. Further benchmarking a mock community with known taxon composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified taxa and a higher fraction of reads assigned to true taxa (48% and 57% at species; 83% and 98% at genus level, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reported 16S sequences.
    LotuS2 is a lightweight and user-friendly pipeline that is fast, precise, and streamlined, using extensive pre- and post-ASV/OTU clustering steps to further increase data quality. High data usage rates and reliability enable high-throughput microbiome analysis in minutes.
    LotuS2 is available from GitHub, conda, or via a Galaxy web interface, documented at http://lotus2.earlham.ac.uk/ . Video Abstract.

  • Article type: Journal Article
    As a common type of structural variation, an insertion refers to the addition of a DNA sequence into an individual genome and is usually associated with some inherited diseases. In recent years, many methods have been proposed for detecting insertions. However, the accurate calling of insertions is also a challenging task. In this study, we propose a novel insertion detection approach based on soft-clipped reads, which is called SIns. First, based on the alignments between paired reads and the reference genome, SIns extracts breakpoints from soft-clipped reads and determines insertion locations. The insert size information about paired reads is then further clustered to determine the genotype, and SIns subsequently adopts Minia to assemble the insertion sequences. Experimental results show that SIns can achieve better performance than other methods in terms of the F-score value for simulated and true datasets.
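The first step described above, extracting breakpoints from soft-clipped reads, can be illustrated with a small sketch. This is not the SIns implementation: the function names, the `min_support` threshold, and the toy reads are assumptions made for illustration. It relies only on the SAM conventions that POS is the 1-based leftmost aligned reference position and that `S` in a CIGAR string marks soft-clipped bases.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def soft_clip_breakpoints(pos, cigar):
    """Return reference positions adjacent to soft clips for one alignment.

    pos is the 1-based leftmost aligned position (SAM POS field).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    core = [o for o in ops if o[1] != "H"]  # hard clips carry no read bases
    breakpoints = []
    if core and core[0][1] == "S":
        # Left clip: the breakpoint is the first aligned base.
        breakpoints.append(pos)
    if len(core) > 1 and core[-1][1] == "S":
        # Right clip: the breakpoint is the last aligned base.
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


def candidate_sites(alignments, min_support=2):
    """Keep positions supported by at least min_support clipped reads."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(soft_clip_breakpoints(pos, cigar))
    return sorted(p for p, c in counts.items() if c >= min_support)


# Hypothetical alignments given as (POS, CIGAR) pairs.
reads = [(100, "60M40S"), (130, "30M70S"), (160, "30S70M"), (500, "100M")]
print(candidate_sites(reads))
```

In practice, left- and right-clipped reads around a true insertion pile up at adjacent positions, so a real caller would merge nearby breakpoints rather than require an exact match; the exact-position count here is kept deliberately simple.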

  • Article type: Journal Article
    Rhesus macaque is one of the most widely used primate model animals for immunological research of infectious diseases including human immunodeficiency virus (HIV) infection. It is well known that major histocompatibility complex (MHC) class I genotypes affect the susceptibility and disease progression to simian immunodeficiency virus (SIV) in rhesus macaques, which is resembling to HIV in humans. It is required to convincingly determine the MHC genotypes in the immunological investigations, that is why several next-generation sequencing (NGS)-based methods have been established. In general, NGS-based genotyping methods using short amplicons are not often applied to MHC because of increasing number of alleles and inevitable ambiguity in allele detection, although there is an advantage of short read sequencing systems that are commonly used today. In this study, we developed a new high-throughput NGS-based genotyping method for MHC class I alleles in rhesus macaques and cynomolgus macaques. By using our method, 95% and 100% of alleles identified by PCR cloning-based method were detected in rhesus macaques and cynomolgus macaques, respectively, which were highly correlated with their expression levels. It was noted that the simulation of new-allele detection step using artificial alleles differing by a few nucleotide sequences from a known allele could be identified with high accuracy and that we could detect a real novel allele from a rhesus macaque sample. These findings supported that our method could be adapted for primate animal models such as macaques to reduce the cost and labor of previous NGS-based MHC genotyping.

  • Article type: Journal Article
    Next-generation sequencing technology is of great importance for many biological disciplines; however, due to technical and biological limitations, the short DNA sequences produced by modern sequencers require numerous quality control (QC) measures to reduce errors, remove technical contaminants, or merge paired-end reads together into longer or higher-quality contigs. Many tools for each step exist, but choosing the appropriate methods and usage parameters can be challenging because the parameterization of each step depends on the particularities of the sequencing technology used, the type of samples being analyzed, and the stochasticity of the instrumentation and sample preparation. Furthermore, end users may not know all of the relevant information about how their data were generated, such as the expected overlap for paired-end sequences or type of adaptors used to make informed choices. This increasing complexity and nuance demand a pipeline that combines existing steps together in a user-friendly way and, when possible, learns reasonable quality parameters from the data automatically. We propose a user-friendly quality control pipeline called SHI7 (canonically pronounced "shizen"), which aims to simplify quality control of short-read data for the end user by predicting presence and/or type of common sequencing adaptors, what quality scores to trim, whether the data set is shotgun or amplicon sequencing, whether reads are paired end or single end, and whether pairs are stitchable, including the expected amount of pair overlap. We hope that SHI7 will make it easier for all researchers, expert and novice alike, to follow reasonable practices for short-read data quality control.
    IMPORTANCE Quality control of high-throughput DNA sequencing data is an important but sometimes laborious task requiring background knowledge of the sequencing protocol used (such as adaptor type, sequencing technology, insert size/stitchability, paired-endedness, etc.). Quality control protocols typically require applying this background knowledge to selecting and executing numerous quality control steps with the appropriate parameters, which is especially difficult when working with public data or data from collaborators who use different protocols. We have created a streamlined quality control pipeline intended to substantially simplify the process of DNA quality control from raw machine output files to actionable sequence data. In contrast to other methods, our proposed pipeline is easy to install and use and attempts to learn the necessary parameters from the data automatically with a single command.

  • Article type: Journal Article
    BACKGROUND: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation is underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites can be challenging due to the repetitive nature of these sequences, which hamper both the sensitivity and specificity of analysis tools.
    METHODS: We have developed the tool RelocaTE2 for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects target site duplication (TSD) of TE insertions allowing it to report TE polymorphism loci with single base pair precision.
    CONCLUSIONS: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with less than 0.1% false positive rate using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing.

  • Article type: Journal Article
    The name Alview is a contraction of the term Alignment Viewer. Alview is a compiled to native architecture software tool for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native, GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.