Long reads

  • Article type: Journal Article
    Accurate haplotyping helps distinguish allele-specific expression, identify cis-regulatory elements, and characterize genomic variation, enabling more precise investigation of the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing have harnessed long-range information to simplify the assembly graph and improve assembled genomic sequences. However, reconstructing complete haplotypes remains methodologically challenging because of the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating diploid genomes that are both contiguous and accurate. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole-genome dataset (HG002) and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
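
    The abstract reports precision and recall for haplotyping but does not say how they are computed. As a minimal sketch, assuming each phased variant is represented as a hypothetical (position, haplotype-1 allele, haplotype-2 allele) tuple and scored by exact match against a truth set, the two rates could be computed as follows. The function name and the example data are illustrative and are not taken from AsmMix.

```python
def precision_recall(called, truth):
    """Return (precision, recall) of a set of calls against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical phased variants: (position, allele on haplotype 1, allele on haplotype 2).
truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "A"), (900, "T", "C")}
# The call at position 400 is phased the wrong way round, so it does not match.
called = {(100, "A", "G"), (250, "C", "T"), (400, "A", "G")}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

    In this toy case two of the three calls are correct (precision 0.67) and two of the four true variants are recovered (recall 0.50). Real haplotype benchmarks typically also account for switch errors between adjacent variants, which this sketch ignores.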

  • Article type: Journal Article
    BACKGROUND: HIV-1 resistance testing is recommended in clinical management, and next-generation sequencing (NGS) methods are now available in many virology laboratories.
    OBJECTIVE: To evaluate the diagnostic performance of long-read single-molecule real-time (SMRT) sequencing (Sequel, PacBio) for HIV-1 polymerase genotyping.
    METHODS: A total of 111 prospective clinical samples (83 plasma and 28 leukocyte-enriched blood fractions) were analyzed for routine HIV-1 resistance genotyping using Sanger sequencing, Vela NGS, and SMRT sequencing. We developed a SMRT sequencing protocol and a bioinformatics pipeline to infer antiretroviral resistance using both haplotype and variant-calling approaches.
    RESULTS: The polymerase was successfully sequenced by the three platforms in 98% of plasma RNA samples with viral loads above 4 log copies/mL. For viral loads of 3 to 4 log copies/mL, the success rate decreased to 83% using Sanger or Vela sequencing and to 67% using SMRT sequencing. Sensitivities of 50%, 54%, and 61% were obtained using SMRT, Vela, and Sanger sequencing, respectively, in cellular DNA from patients with prolonged undetectable plasma HIV-1 RNA. Ninety-eight percent of resistance-associated mutations (RAMs) identified with Sanger sequencing were detected using SMRT sequencing. Furthermore, 91% of RAMs (> 5% threshold) identified with Vela NGS were detected using SMRT sequencing. RAM quantification using Vela and SMRT sequencing was well correlated (Spearman correlation ρ = 0.82; P < 0.0001).
    CONCLUSIONS: SMRT sequencing of the full-length HIV-1 polymerase performed well for characterizing HIV-1 genotypic resistance in both RNA and DNA clinical samples. Long-read sequencing is a new tool for mutation haplotyping and resistance analysis.

  • Article type: Journal Article
    Nanopore direct RNA sequencing (DRS) provides direct access to native RNA strands with full-length information, shedding light on rich qualitative and quantitative properties of gene expression profiles. Here, with NanoTrans, we present an integrated computational framework that comprehensively covers all major DRS-based applications, including isoform clustering and quantification, poly(A) tail length estimation, RNA modification profiling, and fusion gene detection. Beyond providing such a streamlined one-stop solution, NanoTrans offers a workflow-oriented modular design, batch processing capability, all-in-one tabular and graphical report output, and automatic installation and configuration support. Finally, by applying NanoTrans to real DRS datasets from yeast, Arabidopsis, and human embryonic kidney and cancer cell lines, we further demonstrate its utility, effectiveness, and efficacy across a wide range of DRS-based application settings.

  • Article type: Journal Article
    BACKGROUND: Despite rapid advances in genome-resolved metagenomics and a remarkable explosion of metagenome-assembled genomes (MAGs), the functions of uncultivated anaerobic lineages and their interactions in carbon mineralization remain largely uncertain, which has profound implications for biotechnology and biogeochemistry.
    RESULTS: In this study, we combined long-read sequencing and metatranscriptomics-guided metabolic reconstruction to provide a genome-wide perspective on the flow of carbon mineralization from polymers to methane in an anaerobic bioreactor. Our results showed that incorporating long reads substantially improved the quality of metagenomic assemblies, enabling the effective recovery of 132 high-quality genomes meeting the stringent criteria of the minimum information about a metagenome-assembled genome (MIMAG) standard. In addition, hybrid assembly obtained 51% more prokaryotic genes than the short-read-only assembly. Metatranscriptomics-guided metabolic reconstruction unveiled the remarkable metabolic flexibility of several novel Bacteroidales-affiliated bacteria and populations of Mesotoga sp. in scavenging amino acids and sugars. In addition to recovering two circular genomes of previously known but fragmented syntrophic bacteria, two newly identified bacteria within Syntrophales were found to be highly engaged in fatty acid oxidation through syntrophic relationships with the dominant methanogens Methanoregulaceae bin.74 and Methanothrix sp. bin.206. The activity of bin.206, which prefers acetate as substrate, exceeded that of bin.74 with increasing loading, reinforcing the determining role of substrate.
    CONCLUSIONS: Overall, our study uncovered some key active anaerobic lineages and their metabolic functions in this complex anaerobic ecosystem, offering a framework for understanding carbon transformations in anaerobic digestion. These findings advance the understanding of metabolic activities and trophic interactions between anaerobic guilds, providing foundational insights into carbon flux within both engineered and natural ecosystems.

  • Article type: Journal Article
    Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long-read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979.
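
    To illustrate what "repeat length" means for a single read, the following is a minimal sketch that counts the longest uninterrupted run of a motif in a sequence. It is not LongTR's algorithm, which the abstract does not describe; the function name, the CAG motif, and the toy read are all assumptions made for illustration.

```python
import re


def longest_repeat_run(sequence, motif):
    """Return the copy number of the longest uninterrupted run of `motif`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Toy read: flanking sequence, 12 CAG copies, an interruption, then 3 more copies.
read = "ACGT" + "CAG" * 12 + "TTGA" + "CAG" * 3 + "GGC"
print(longest_repeat_run(read, "CAG"))  # 12
```

    A real genotyper would additionally have to tolerate sequencing errors inside the repeat and combine evidence from many reads into per-allele calls, which exact string matching cannot do.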

  • Article type: Journal Article
    Carcinogenesis often involves significant alterations in cancer genome architecture, marked by large structural and copy number variations (SVs and CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but they are limited in resolution and do not cover features smaller than several hundred kilobases. Optical genome mapping and nanopore sequencing are attractive technologies that bridge this resolution gap and offer enhanced performance for cytogenetic applications. These methods profile native, individual DNA molecules, thus capturing epigenetic information. We applied both techniques to characterize the structural and copy number landscape of a clear cell renal cell carcinoma (ccRCC) tumor, highlighting the relative strengths of each method in the context of variant size and average read length. Additionally, we assessed their utility for methylome and hydroxymethylome profiling, emphasizing differences in their applicability to epigenetic analysis.

  • Article type: Journal Article
    Telomeres are regions of repetitive DNA at the ends of linear chromosomes that protect chromosome ends from degradation. Telomere lengths have been extensively studied in the context of aging and disease, though most studies use average telomere lengths, which are of limited utility. We present a method for identifying all 92 telomere alleles from long-read sequencing data. Individual telomeres are identified using variant repeats proximal to telomere regions, which are unique across alleles. This high-throughput and high-resolution characterization of telomeres could be foundational to future studies investigating the roles of specific telomeres in aging and disease.
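
    The abstract does not spell out where the figure of 92 comes from. Assuming it refers to a diploid human genome, in which each of the 46 linear chromosomes has two ends, the arithmetic can be checked as follows (the constant names are illustrative):

```python
AUTOSOME_PAIRS = 22          # human autosomes, counted as homologous pairs
SEX_CHROMOSOME_PAIRS = 1     # XX or XY
ENDS_PER_CHROMOSOME = 2      # one telomere on each chromosome arm

chromosomes = (AUTOSOME_PAIRS + SEX_CHROMOSOME_PAIRS) * 2
telomere_alleles = chromosomes * ENDS_PER_CHROMOSOME
print(telomere_alleles)  # 92
```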

  • Article type: Journal Article
    Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

  • Article type: Journal Article
    Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.

  • Article type: Journal Article
    The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly remains a formidable challenge, with intricate considerations surrounding computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape requires a reasoned choice of sequencing platform, depth, and assembly tools to achieve high-quality genome reconstructions. This comprehensive review examines the interplay between cutting-edge long-read sequencing technologies, assembly methodologies, and the evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.
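
    The abstract speaks of assembly "result quality" without naming a metric. One widely used contiguity measure is N50, the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below is a plain-Python illustration with made-up contig lengths; it is not drawn from the review itself.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Two hypothetical assemblies of the same total size (200 units).
fragmented = [50, 40, 30, 20, 10, 10, 10, 10, 10, 10]
contiguous = [120, 50, 30]

print(n50(fragmented))  # 30
print(n50(contiguous))  # 120
```

    N50 captures contiguity only. An assembly can score well while still containing misassemblies or base errors, so it is usually reported alongside accuracy and completeness measures.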