Long reads

  • Article type: Journal Article
    Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve assembled genomic sequences. However, it remains methodologically challenging to reconstruct complete haplotypes because of the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating diploid genomes that are both contiguous and accurate. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.

  • Article type: Journal Article
    BACKGROUND: Despite rapid advances in genome-resolved metagenomics and the remarkable explosion of metagenome-assembled genomes (MAGs), the function of uncultivated anaerobic lineages and their interactions in carbon mineralization remain largely uncertain, which has profound implications for biotechnology and biogeochemistry.
    RESULTS: In this study, we combined long-read sequencing and metatranscriptomics-guided metabolic reconstruction to provide a genome-wide perspective of carbon mineralization flow from polymers to methane in an anaerobic bioreactor. Our results showed that incorporating long reads resulted in a substantial improvement in the quality of metagenomic assemblies, enabling the effective recovery of 132 high-quality genomes meeting the stringent criteria of minimum information about a metagenome-assembled genome (MIMAG). In addition, hybrid assembly obtained 51% more prokaryotic genes than the short-read-only assembly. Metatranscriptomics-guided metabolic reconstruction unveiled the remarkable metabolic flexibility of several novel Bacteroidales-affiliated bacteria and populations from Mesotoga sp. in scavenging amino acids and sugars. In addition to recovering two circular genomes of previously known but fragmented syntrophic bacteria, two newly identified bacteria within Syntrophales were found to be highly engaged in fatty acid oxidation through syntrophic relationships with the dominant methanogens Methanoregulaceae bin.74 and Methanothrix sp. bin.206. The activity of bin.206, which prefers acetate as substrate, exceeded that of bin.74 with increasing loading, reinforcing the determining role of the substrate.
    CONCLUSIONS: Overall, our study uncovered some key active anaerobic lineages and their metabolic functions in this complex anaerobic ecosystem, offering a framework for understanding carbon transformations in anaerobic digestion. These findings advance the understanding of metabolic activities and trophic interactions between anaerobic guilds, providing foundational insights into carbon flux within both engineered and natural ecosystems.
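    The MIMAG standard named in this abstract is usually applied as a per-bin threshold test. The sketch below is a minimal illustration, assuming the commonly cited high-quality draft criteria (completeness above 90%, contamination below 5%, 5S, 16S and 23S rRNA genes present, and at least 18 tRNAs). The abstract does not state the exact cut-offs the authors used, and the `Bin` record and the example bins are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical summary record for one metagenomic bin."""
    name: str
    completeness: float   # percent, e.g. from a marker-gene estimate
    contamination: float  # percent
    has_5s: bool
    has_16s: bool
    has_23s: bool
    trna_count: int


def is_high_quality(b: Bin) -> bool:
    """Assumed MIMAG high-quality draft test (thresholds are an assumption here)."""
    return (
        b.completeness > 90.0
        and b.contamination < 5.0
        and b.has_5s and b.has_16s and b.has_23s
        and b.trna_count >= 18
    )


# Illustrative bins only; these are not values from the study.
bins = [
    Bin("example.A", 97.2, 1.1, True, True, True, 20),
    Bin("example.B", 93.0, 6.4, True, True, True, 19),   # too contaminated
    Bin("example.C", 95.5, 0.8, True, False, True, 21),  # 16S rRNA gene missing
]
high_quality = [b.name for b in bins if is_high_quality(b)]
```

    Short-read-only bins often fail on the rRNA requirement because rRNA operons are repetitive and tend to collapse during assembly, which is one plausible reason long reads raise the number of bins passing such a filter.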

  • Article type: Journal Article
    Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .
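    To make "variation in repeat length" concrete, the sketch below counts the longest uninterrupted run of a motif in each read and tallies the counts across reads. It is a deliberately naive illustration and does not represent LongTR's genotyping model; the function name, the motif and the reads are hypothetical.

```python
from collections import Counter


def longest_repeat_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    k = len(motif)
    for start in range(len(seq)):
        run = 0
        pos = start
        # Extend the run while the motif keeps matching at the current position.
        while seq.startswith(motif, pos):
            run += 1
            pos += k
        best = max(best, run)
    return best


# Hypothetical reads spanning one CAG repeat locus.
reads = [
    "TTCAGCAGCAGAA",        # 3 copies
    "TTCAGCAGCAGAA",        # 3 copies
    "TTCAGCAGCAGCAGCAGAA",  # 5 copies
]
copy_counts = Counter(longest_repeat_run(r, "CAG") for r in reads)
```

    Exact matching like this breaks down as soon as a read carries a sequencing error or an interrupted repeat, which is why dedicated genotypers are needed for real long-read data.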

  • Article type: Journal Article
    Carcinogenesis often involves significant alterations in the cancer genome architecture, marked by large structural and copy number variations (SVs and CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but they are limited in resolution and do not cover features smaller than several hundred kilobases. Optical genome mapping and nanopore sequencing are attractive technologies that bridge this resolution gap and offer enhanced performance for cytogenetic applications. These methods profile native, individual DNA molecules, thus capturing epigenetic information. We applied both techniques to characterize a clear cell renal cell carcinoma (ccRCC) tumor's structural and copy number landscape, highlighting the relative strengths of each method in the context of variant size and average read length. Additionally, we assessed their utility for methylome and hydroxymethylome profiling, emphasizing differences in epigenetic analysis applicability.

  • Article type: Journal Article
    Telomeres are regions of repetitive DNA at the ends of linear chromosomes which protect chromosome ends from degradation. Telomere lengths have been extensively studied in the context of aging and disease, though most studies use average telomere lengths which are of limited utility. We present a method for identifying all 92 telomere alleles from long read sequencing data. Individual telomeres are identified using variant repeats proximal to telomere regions, which are unique across alleles. This high-throughput and high-resolution characterization of telomeres could be foundational to future studies investigating the roles of specific telomeres in aging and disease.
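    The figure of 92 follows from simple counting: a diploid human cell has 46 chromosomes (22 autosome pairs plus two sex chromosomes), and each linear chromosome has two ends. The short sketch below enumerates the alleles; the label format is a hypothetical convention for illustration, not the naming used by the authors.

```python
def telomere_labels(sex_chromosomes=("X", "Y")):
    """Enumerate one label per telomere allele in a diploid human genome."""
    labels = []
    # 22 autosomes, two haplotypes each, two arms (p and q) per chromosome.
    for number in range(1, 23):
        for hap in (1, 2):
            for arm in ("p", "q"):
                labels.append(f"chr{number}{arm}-hap{hap}")
    # Two sex chromosomes (XX or XY), each with two arms.
    for hap, name in enumerate(sex_chromosomes, start=1):
        for arm in ("p", "q"):
            labels.append(f"chr{name}{arm}-hap{hap}")
    return labels


labels = telomere_labels()
n_alleles = len(labels)  # 22 * 2 * 2 + 2 * 2
```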

  • Article type: Journal Article
    Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

  • Article type: Journal Article
    Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.

  • Article type: Journal Article
    De novo assembly plays a pivotal role in metagenomic analysis, and the incorporation of third-generation sequencing technology can significantly improve the integrity and accuracy of assembly results. Recently, with advancements in sequencing technology (Hi-Fi, ultra-long), several long-read-based bioinformatic tools have been developed. However, validating the performance and reliability of these tools remains a crucial concern. To address this gap, we present MCSS (microbial community simulator based on structure), which can generate simulated microbial communities and sequencing datasets based on the structural attributes of real microbiome communities. The evaluation results indicate that it can generate simulated communities that exhibit both diversity and similarity to actual community structures. Additionally, MCSS generates synthetic PacBio Hi-Fi and Oxford Nanopore Technologies (ONT) long reads for the species within the simulated community. This innovative tool provides a valuable resource for benchmarking and refining metagenomic analysis methods. Code is available at: https://github.com/panlab-bio/mcss.

  • Article type: Journal Article
    As a phoretic parasite and virus vector, the mite Varroa destructor and the associated Deformed wing virus (DWV) form a lethal combination to the honey bee, Apis mellifera. Routine acaricide treatment has been reported to reduce the diversity of mites and select for tolerance against these treatments. Further, different DWV strains face selective pressures when transmitted via mites. In this study, the haplotypes of Varroa mites and associated DWV variants were quantified using long reads. A single haplotype dominated the mite mitochondrial gene cytochrome oxidase subunit I, reflecting an ancient bottleneck. However, highly polymorphic genes were present across the mite genome, suggesting the diversity of mites could be actively maintained at a regional level. DWV detected in both mites and honey bees showed a dominant variant with only a few low-frequency alternate haplotypes. The relative abundances of DWV haplotypes isolated from honey bees and mites were highly consistent, suggesting that some variants are favored by ongoing selection.

  • Article type: Preprint
    Long read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify the identification of novel transcripts and transcript quantification. However, despite this promise, the focus of much long read method development to date has been on transcript identification, with comparatively little attention paid to quantification. Yet, due to differences in the underlying protocols and technologies, lower throughput (i.e. fewer reads sequenced per sample compared to short read technologies), as well as technical artifacts, long read quantification remains a challenge, motivating the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data.
    We introduce a new method and software tool for long read transcript quantification called oarfish. Our model incorporates a novel and innovative coverage score, which affects the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate that by accounting for this coverage information, oarfish is able to produce more accurate quantification estimates than existing long read quantification methods, particularly when one considers the primary isoforms present in a particular cell line or tissue type.
    Oarfish is implemented in the Rust programming language, and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https://www.github.com/COMBINE-lab/oarfish.
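    The idea that a per-read weight changes "the conditional probability of fragment assignment" can be illustrated with a generic expectation-maximization loop for transcript abundance. The sketch below is not oarfish's model or code: the function name, the weight values and the way weights are supplied are all assumptions, and the weights merely stand in for whatever a coverage-based score would contribute.

```python
def em_quantify(read_weights, n_transcripts, n_iter=200):
    """Generic EM for transcript abundances.

    read_weights: one dict per read mapping transcript index to a
    conditional-probability weight for that read given that transcript
    (hypothetical values standing in for a coverage-aware score).
    Returns estimated relative abundances that sum to 1.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read across its compatible transcripts.
        for weights in read_weights:
            denom = sum(theta[t] * w for t, w in weights.items())
            if denom == 0.0:
                continue  # read is incompatible with every expressed transcript
            for t, w in weights.items():
                counts[t] += theta[t] * w / denom
        total = sum(counts)
        if total == 0.0:
            break
        # M-step: abundances are the normalized expected read counts.
        theta = [c / total for c in counts]
    return theta


# Two reads unique to transcript 0, one unique to transcript 1, and two
# ambiguous reads whose assumed weights favor transcript 0.
reads = [{0: 1.0}, {0: 1.0}, {1: 1.0}, {0: 0.8, 1: 0.2}, {0: 0.8, 1: 0.2}]
abundances = em_quantify(reads, n_transcripts=2)
```

    Setting the ambiguous weights to equal values recovers the plain compatibility-only model, so comparing the two runs shows how much the weighting alone shifts the estimate.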