Long reads

  • Article type: Journal Article
    Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve assembled genomic sequences. However, it remains methodologically challenging to reconstruct complete haplotypes because of the high sequencing error rates of long reads and the limited capturing efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating diploid genomes that are both contiguous and accurate. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.

  • Article type: Journal Article
    Nanopore direct RNA sequencing (DRS) provides direct access to native RNA strands with full-length information, shedding light on rich qualitative and quantitative properties of gene expression profiles. Here we present NanoTrans, an integrated computational framework that comprehensively covers all major DRS-based application scopes, including isoform clustering and quantification, poly(A) tail length estimation, RNA modification profiling, and fusion gene detection. In addition to providing such a streamlined one-stop solution, NanoTrans also stands out for its workflow-oriented modular design, batch processing capability, all-in-one tabular and graphic report output, and automatic installation and configuration support. Finally, by applying NanoTrans to real DRS datasets of yeast, Arabidopsis, and human embryonic kidney and cancer cell lines, we further demonstrated its utility, effectiveness, and efficacy across a wide range of DRS-based application settings.

  • Article type: Journal Article
    BACKGROUND: Despite rapid advances in genome-resolved metagenomics and the remarkable explosion of metagenome-assembled genomes (MAGs), the function of uncultivated anaerobic lineages and their interactions in carbon mineralization remain largely uncertain, which has profound implications for biotechnology and biogeochemistry.
    RESULTS: In this study, we combined long-read sequencing and metatranscriptomics-guided metabolic reconstruction to provide a genome-wide perspective of carbon mineralization flow from polymers to methane in an anaerobic bioreactor. Our results showed that incorporating long reads resulted in a substantial improvement in the quality of metagenomic assemblies, enabling the effective recovery of 132 high-quality genomes meeting the stringent criteria of the minimum information about a metagenome-assembled genome (MIMAG) standard. In addition, hybrid assembly obtained 51% more prokaryotic genes than the short-read-only assembly. Metatranscriptomics-guided metabolic reconstruction unveiled the remarkable metabolic flexibility of several novel Bacteroidales-affiliated bacteria and populations from Mesotoga sp. in scavenging amino acids and sugars. In addition to recovering two circular genomes of previously known but fragmented syntrophic bacteria, two newly identified bacteria within Syntrophales were found to be highly engaged in fatty acid oxidation through syntrophic relationships with the dominant methanogens Methanoregulaceae bin.74 and Methanothrix sp. bin.206. The activity of bin.206, which prefers acetate as substrate, exceeded that of bin.74 with increasing loading, reinforcing the determining role of the substrate.
    CONCLUSIONS: Overall, our study uncovered some key active anaerobic lineages and their metabolic functions in this complex anaerobic ecosystem, offering a framework for understanding carbon transformations in anaerobic digestion. These findings advance the understanding of metabolic activities and trophic interactions between anaerobic guilds, providing foundational insights into carbon flux within both engineered and natural ecosystems. Video Abstract.

  • Article type: Journal Article
    Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

  • Article type: Journal Article
    De novo assembly plays a pivotal role in metagenomic analysis, and the incorporation of third-generation sequencing technology can significantly improve the integrity and accuracy of assembly results. Recently, with advancements in sequencing technology (Hi-Fi, ultra-long), several long-read-based bioinformatic tools have been developed. However, validating the performance and reliability of these tools is a crucial concern. To address this gap, we present MCSS (microbial community simulator based on structure), which can generate simulated microbial communities and sequencing datasets based on the structural attributes of real microbiome communities. The evaluation results indicate that it can generate simulated communities that exhibit both diversity and similarity to actual community structures. Additionally, MCSS generates synthetic PacBio Hi-Fi and Oxford Nanopore Technologies (ONT) long reads for the species within the simulated community. This tool provides a valuable resource for benchmarking and refining metagenomic analysis methods. The code is available at https://github.com/panlab-bio/mcss.

  • Article type: Journal Article
    The Xishuangbanna (XIS) cucumber (Cucumis sativus var. xishuangbannanesis) is a semiwild variety that has many distinct agronomic traits. Here, long reads generated by Nanopore sequencing technology helped assemble a high-quality genome (contig N50 = 8.7 Mb) of landrace XIS49. A total of 10,036 structural/sequence variations (SVs) were identified in comparison with Chinese Long (CL), and known SVs controlling spines, tubercles, and carpel number were confirmed in the XIS49 genome. Two QTLs for hypocotyl elongation under low light, SH3.1 and SH6.1, were fine-mapped using introgression lines (donor parent, XIS49; recurrent parent, CL). SH3.1 encodes the red-light receptor Phytochrome B (PhyB, CsaV3_3G015190). A ∼4 kb region containing a large deletion and highly divergent regions (HDRs) was identified in the promoter of the PhyB gene in XIS49. Loss of function of this PhyB caused a super-long hypocotyl phenotype. SH6.1 encodes a CCCH-type zinc finger protein, FRIGIDA-ESSENTIAL LIKE (FEL, CsaV3_6G050300). FEL negatively regulated hypocotyl elongation, but it was transcriptionally suppressed by a long terminal repeat (LTR) retrotransposon insertion in CL cucumber. Mechanistically, FEL physically binds to the promoter of CONSTITUTIVE PHOTOMORPHOGENIC 1a (COP1a), regulating the expression of COP1a and the downstream hypocotyl elongation. These results demonstrate the genetic mechanism of cucumber hypocotyl elongation under low light.
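    The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name `n50` and the toy contig lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one contig with non-zero length")


# Toy example: total length is 20, and the two longest contigs (6 + 5 = 11)
# are the first to cover at least half of it, so the N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

    In practice the lengths would come from the assembly FASTA index or a similar summary, so an N50 of 8.7 Mb means that contigs of at least that length make up half of the assembled bases.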

  • Article type: Journal Article
    As a phoretic parasite and virus vector, the mite Varroa destructor and the associated Deformed wing virus (DWV) form a lethal combination for the honey bee, Apis mellifera. Routine acaricide treatment has been reported to reduce the diversity of mites and select for tolerance against these treatments. Further, different DWV strains face selective pressures when transmitted via mites. In this study, the haplotypes of Varroa mites and associated DWV variants were quantified using long reads. A single haplotype dominated the mite mitochondrial gene cytochrome oxidase subunit I, reflecting an ancient bottleneck. However, highly polymorphic genes were present across the mite genome, suggesting that the diversity of mites could be actively maintained at a regional level. DWV detected in both mites and honey bees showed a dominant variant with only a few low-frequency alternate haplotypes. The relative abundances of DWV haplotypes isolated from honey bees and mites were highly consistent, suggesting that some variants are favored by ongoing selection.

  • Article type: Journal Article
    The availability of the complete genome of an organism plays a crucial role in the comprehensive analysis of the entire biological entity. Despite the rapid advancements in sequencing technologies, the inherent complexities of genomes inevitably lead to gaps during genome assembly. To address this, numerous genome gap-filling tools utilizing long reads have emerged. However, a comprehensive evaluation of these tools is currently lacking. In this study, we evaluated seven software tools under various ploidy levels and different data generation methods, and assessed them using QUAST and two additional criteria, accuracy and completeness. Our findings revealed that the performance of the different tools varied across ploidy levels. Based on accuracy and completeness, FGAP emerged as the top-performing tool, excelling in both haploid and tetraploid scenarios. This evaluation of commonly used genome gap-filling tools aims to provide users with valuable insights for tool selection, assisting them in choosing the most suitable genome gap-filling tool for their specific needs.
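    Assembly gaps are conventionally written as runs of `N` in the sequence, so one simple way to see what a gap-filling tool changed is to count those runs before and after. The sketch below is illustrative only: the helper names `gap_stats` and `filled_fraction` are hypothetical, and the fraction of gap bases closed is an assumed stand-in, not the completeness criterion defined in the study.

```python
import re


def gap_stats(sequence):
    """Return (number of gaps, total gap bases), where a gap is a run of N/n."""
    runs = [m.end() - m.start() for m in re.finditer(r"[Nn]+", sequence)]
    return len(runs), sum(runs)


def filled_fraction(before, after):
    """Fraction of gap bases present in `before` that are no longer N in `after`.

    Assumed, simplified measure: it ignores whether the filled sequence is
    actually correct, which is what an accuracy criterion would check.
    """
    _, gap_bases_before = gap_stats(before)
    _, gap_bases_after = gap_stats(after)
    if gap_bases_before == 0:
        return 1.0
    return 1.0 - gap_bases_after / gap_bases_before


draft = "ACGTNNNNACGTnnAC"      # two gaps, six gap bases
patched = "ACGTTTGAACGTnnAC"    # first gap closed, second still open
print(gap_stats(draft), filled_fraction(draft, patched))
```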

  • Article type: Published Erratum
    [This corrects the article DOI: 10.3389/fgene.2022.816825.].

  • Article type: Journal Article
    BACKGROUND: Genomic structural variant detection is a significant and challenging issue in genome analysis. The existing long-read-based structural variant detection methods still have room for improvement in detecting multi-type structural variants.
    RESULTS: In this paper, we propose a method called cnnLSV to obtain higher-quality detection results by eliminating false positives from the detection results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants that represents long-read alignment information around structural variants as images, input the images into a constructed convolutional neural network to train a filter model, and load the trained model to remove the false positives and improve detection performance. We also eliminate mislabeled training samples in the model training phase by using principal component analysis and the unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that our proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The program of cnnLSV is available at https://github.com/mhuidong/cnnLSV.
    CONCLUSIONS: The proposed cnnLSV can detect structural variants by using long-read alignment information and a convolutional neural network to achieve overall higher performance, and effectively eliminates incorrectly labeled samples by using the principal component analysis and k-means algorithms in the model training stage.
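    The abstract does not say how k-means is used to flag mislabeled training samples, so the following is only a sketch of one plausible reading: cluster the samples, take the majority label in each cluster, and drop samples whose label disagrees with it. The PCA step is omitted (the toy features are already two-dimensional), and the function names, feature vectors, and labels are hypothetical rather than taken from cnnLSV.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def keep_consistent_samples(points, labels, k=2):
    """Return indices of samples whose label matches their cluster's majority label."""
    assign = kmeans(points, k)
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    return [i for i, (l, a) in enumerate(zip(labels, assign)) if l == majority[a]]


# Toy data: sample 4 sits in the "true" cluster but carries a "false" label.
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels = ["true", "false", "true", "false", "false", "false"]
print(keep_consistent_samples(features, labels))
```

    With image-derived features, a dimensionality reduction such as PCA would normally come before the clustering so that distances are computed in a compact space.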