Long-read sequencing

  • Article type: Journal Article
    The complete genome assembly of Candida auris strains B11103, B11221, and B11244 is reported in this manuscript. These strains represent the three geographical clades, namely, South Asian (Clade I), South African (Clade III), and South American (Clade IV).

  • Article type: Journal Article
    BACKGROUND: Oxford Nanopore provides high-throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain whether they achieve the accuracy needed for high-resolution source tracking of foodborne illness outbreaks.
    RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy in reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999%, or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low-confidence regions) was obtained only by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best-performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered; i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct.
    CONCLUSIONS: Short reads are still needed to correct errors in nanopore sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
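    As a back-of-the-envelope check on the accuracy figures above (our illustration, not part of the study), the expected number of base errors is genome length * (1 - accuracy). The corresponding Phred-style quality is Q = -10 * log10(error rate). The 4.8 Mbp genome size comes from the abstract. The function names are hypothetical helpers.

```python
import math


def expected_errors(genome_length_bp: int, accuracy: float) -> float:
    """Expected number of erroneous bases at a given per-base accuracy."""
    return genome_length_bp * (1.0 - accuracy)


def phred_q(accuracy: float) -> float:
    """Phred-scaled quality of a per-base accuracy: Q = -10 * log10(error rate)."""
    return -10.0 * math.log10(1.0 - accuracy)


GENOME_BP = 4_800_000  # ~4.8 Mbp S. enterica Newport genome, as stated in the abstract

for label, acc in [("nanopore baseline", 0.9995), ("long + short polished", 0.999999)]:
    print(f"{label}: ~{expected_errors(GENOME_BP, acc):,.0f} errors (Q{phred_q(acc):.0f})")
```

    Run as written, the script prints about 2,400 errors (Q33) at 99.95% and about 5 errors (Q60) at 99.9999%. The second figure matches the "~5 nucleotide errors" quoted for the 4.8 Mbp genome.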

  • Article type: Journal Article
    BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations.
    RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution.
    CONCLUSIONS: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.

  • Article type: Journal Article
    Antimicrobial resistance remains a significant global and One Health threat, owing to the diminishing effectiveness of antibiotics against rapidly evolving multidrug-resistant bacteria and the limited innovative research toward the development of new antibiotic therapeutics. In this article, we present whole-genome sequence data of Proteus mirabilis-MN029 obtained with highly accurate long-read PacBio® HiFi technology. The antibacterial activities of selected native African plant species were also evaluated using the disk diffusion method. Acquired antibiotic resistance genes and chromosomal mutations corresponding to antibiotics of clinical importance were identified from the genomic data. With ethyl acetate as the solvent, Pterocarpus angolensis leaf extracts showed the most promising antibacterial effects against Proteus mirabilis-MN029. These datasets will be useful for future experimental research aimed at designing new antibacterial drugs from plant extracts that are effective alone or in combination with existing antibiotics to overcome multidrug-resistance mechanisms.

  • Article type: Journal Article
    Overgrowth disorders comprise a group of entities with a variable phenotypic spectrum ranging from tall stature to isolated or lateralized overgrowth of body parts and/or organs. Depending on the underlying physiological pathway affected by pathogenic genetic alterations, overgrowth syndromes are associated with a broad spectrum of neoplasia predisposition, (cardio)vascular and neurodevelopmental anomalies, and dysmorphisms. Pathologic overgrowth may be of prenatal or postnatal onset. It results from an increased number of cells (intrinsic cellular hyperplasia), hypertrophy of a normal number of cells, an increase in interstitial spaces, or a combination of these. The underlying molecular causes comprise a growing number of genetic alterations affecting skeletal growth and growth-relevant signaling cascades as major effectors, and these alterations can affect the whole body or parts of it (mosaicism). Furthermore, epigenetic modifications play a critical role in the manifestation of some overgrowth diseases. Diagnosing overgrowth syndromes, the prerequisite for personalized clinical management, can be challenging because of their clinical and molecular heterogeneity. Physicians should consider molecular genetic testing as a first diagnostic step in overgrowth syndromes. In particular, the urgent need for a precise diagnosis in tumor predisposition syndromes has to be taken into account as the basis for early monitoring and therapy. With the (future) implementation of next-generation sequencing approaches and further omic technologies, clinical diagnoses can be verified. These methods also help confirm the clinical and molecular spectrum of overgrowth disorders, including unexpected findings and the identification of atypical cases. However, the limitations of the applied assays have to be considered. For each disorder of interest, the spectrum of possible types of genomic variants must be taken into account, as these might require different methodological strategies. Additionally, the integration of artificial intelligence (AI) into diagnostic workflows contributes significantly to the phenotype-driven selection and interpretation of molecular and physiological data.

  • Article type: Journal Article
    Sequence comparison of 16S rRNA PCR amplicons is an established approach to taxonomically identify bacterial isolates and profile complex microbial communities. One potential application of recent advances in long-read sequencing technologies is to sequence entire rRNA operons and capture significantly more phylogenetic information compared to sequencing of the 16S rRNA (or regions thereof) alone, with the potential to increase the proportion of amplicons that can be reliably classified to lower taxonomic ranks. Here we describe GROND (Genome-derived Ribosomal Operon Database), a publicly available database of quality-checked 16S-ITS-23S rRNA operons, accompanied by multiple taxonomic classifications. GROND will aid researchers in analysis of their data and act as a standardised database to allow comparison of results between studies.

  • Article type: Preprint
    Advancements in long-read transcriptome sequencing (long-RNA-seq) technology have revolutionized the study of isoform diversity. Full-length transcripts enhance the detection of various transcriptome structural variations, including novel isoforms, alternative splicing events, and fusion transcripts. By shifting the open reading frame or altering gene expression, these transcript alterations can serve as crucial biomarkers for disease diagnosis and as therapeutic targets, as previous studies have shown. In this project, we propose IFDlong, a bioinformatics and biostatistics tool to detect isoform and fusion transcripts using bulk or single-cell long-RNA-seq data. Specifically, the software performs gene and isoform annotation for each long read, defines novel isoforms, quantifies isoform expression with a novel expectation-maximization algorithm, and profiles fusion transcripts. In large-scale simulation studies, the IFDlong pipeline achieved the best overall performance compared with several existing tools. In both isoform and fusion transcript quantification, IFDlong reached a Spearman's correlation with the truth of more than 0.8. When distinguishing multiple alternative splicing events, it reached a cosine similarity of more than 0.9. In novel isoform simulations, IFDlong successfully balanced sensitivity (higher than 90%) and specificity (higher than 90%). Furthermore, IFDlong demonstrated accuracy and robustness across diverse in-house and public datasets from healthy tissues, cell lines, and multiple disease types. Beyond bulk long-RNA-seq, the IFDlong pipeline is also compatible with single-cell long-RNA-seq data. This new software may have a significant impact on long-read transcriptome analysis. The IFDlong software is available at https://github.com/wenjiaking/IFDlong.
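    The abstract does not describe IFDlong's expectation-maximization formulation. The sketch below is a generic, textbook-style EM for apportioning reads that are compatible with more than one isoform. It is an assumption for illustration, not IFDlong's code. It ignores isoform-length normalization, a simplification that is loosely reasonable for full-length long reads. The function name and input format are hypothetical.

```python
from collections import defaultdict


def em_isoform_abundance(read_compat, n_iter=200, tol=1e-10):
    """Estimate isoform proportions from read-to-isoform compatibility sets.

    read_compat: iterable of sets, each holding the IDs of the isoforms a read
    is compatible with. Reads with an empty set are ignored.
    Returns {isoform_id: estimated fraction of reads}.
    """
    reads = [compat for compat in read_compat if compat]
    if not reads:
        return {}
    isoforms = sorted({iso for compat in reads for iso in compat})
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}  # uniform starting point
    for _ in range(n_iter):
        # E-step: split each read across its compatible isoforms in proportion to theta
        expected = defaultdict(float)
        for compat in reads:
            total = sum(theta[iso] for iso in compat)
            for iso in compat:
                expected[iso] += theta[iso] / total
        # M-step: new proportions are expected read counts over the number of reads
        new_theta = {iso: expected[iso] / len(reads) for iso in isoforms}
        converged = max(abs(new_theta[i] - theta[i]) for i in isoforms) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Toy example: 6 reads unique to A, 2 unique to B, 4 compatible with both
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
print(em_isoform_abundance(reads))
```

    In the toy example, the 4 ambiguous reads end up shared 3:1 in favour of A. The estimates converge to A ≈ 0.75 and B ≈ 0.25, the fixed point of theta_A = (6 + 4 * theta_A) / 12. In effect, uniquely assigned reads steer how shared reads are split.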

  • Article type: Journal Article
    Describing the microbial community within the tumour has been key to understanding the pathophysiology of the tumour microenvironment. In head and neck cancer (HNC), most studies on tissue samples have performed only 16S rRNA short-read sequencing (SRS) of the V3-V5 region, and SRS is mostly limited to genus-level identification. In this study, we compared full-length 16S rRNA long-read sequencing from Oxford Nanopore Technologies (FL-ONT) with V3-V4 Illumina SRS (V3V4-Illumina) in 26 HNC tumour tissues. Further validation was performed with culture-based methods: 16 bacterial isolates obtained from 4 patients were identified using MALDI-TOF MS. We observed similar alpha diversity indexes between FL-ONT and V3V4-Illumina. However, beta diversity differed significantly between techniques (PERMANOVA R² = 0.131, p < 0.0001). At higher taxonomic levels (phylum to family), all metrics were more similar between sequencing techniques, while lower taxonomic levels displayed more discrepancies. Likewise, the correlation in relative abundance between FL-ONT and V3V4-Illumina was higher at higher taxonomic levels and decreased at lower levels. Finally, FL-ONT identified a larger share of the MALDI-TOF MS-identified isolates at the species level (75% vs. 18.8%). Overall, FL-ONT resolved lower taxonomic levels better than V3V4-Illumina 16S rRNA sequencing.
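    The abstract reports alpha diversity, beta diversity, and rank-dependent agreement, but it does not name the specific indices. The sketch below assumes two common choices. Shannon's index measures alpha diversity. Bray-Curtis dissimilarity measures beta diversity and is a typical distance for a PERMANOVA. The sketch uses hypothetical toy genus counts, not study data, to show how two methods can disagree at genus level yet agree once counts are collapsed to phylum.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on relative abundances: 0.5 * sum(|p_a - p_b|)."""
    ta, tb = sum(a.values()), sum(b.values())
    taxa = set(a) | set(b)
    return 0.5 * sum(abs(a.get(t, 0) / ta - b.get(t, 0) / tb) for t in taxa)


def collapse(counts, parent):
    """Sum counts up to a higher rank using a taxon -> parent-taxon mapping."""
    out = Counter()
    for taxon, c in counts.items():
        out[parent[taxon]] += c
    return dict(out)


# Hypothetical genus-level read counts for one tumour sample (toy data, not from the study)
ont = {"Streptococcus": 40, "Prevotella": 30, "Fusobacterium": 30}
illumina = {"Streptococcus": 20, "Gemella": 20, "Prevotella": 10,
            "Alloprevotella": 20, "Fusobacterium": 30}
PHYLUM = {"Streptococcus": "Bacillota", "Gemella": "Bacillota",
          "Prevotella": "Bacteroidota", "Alloprevotella": "Bacteroidota",
          "Fusobacterium": "Fusobacteriota"}

print(f"genus-level Bray-Curtis:  {bray_curtis(ont, illumina):.2f}")
print(f"phylum-level Bray-Curtis: {bray_curtis(collapse(ont, PHYLUM), collapse(illumina, PHYLUM)):.2f}")
print(f"Shannon (genus): ONT {shannon(ont):.2f}, Illumina {shannon(illumina):.2f}")
```

    With these toy counts, the genus-level dissimilarity is 0.40, while the phylum-level dissimilarity is 0.00. Splitting reads among related genera penalizes agreement only at the lower rank.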

  • Article type: Preprint
    Variant detection from long-read genome sequencing (lrGS) has proven to be considerably more accurate and comprehensive than variant detection from short-read genome sequencing (srGS). However, the rate at which lrGS can increase molecular diagnostic yield for rare disease has not yet been precisely characterized. We performed lrGS using Pacific Biosciences "HiFi" technology on 96 short-read-negative probands with rare disease suspected to be genetic. We generated hg38-aligned variants and de novo phased genome assemblies, and subsequently annotated, filtered, and curated variants using clinical standards. New disease-relevant or potentially relevant genetic findings were identified in 16/96 (16.7%) probands, eight of whom (8/96, 8.33%) harbored pathogenic or likely pathogenic variants. In nine probands (~9.4%), the newly identified variants were visible in both srGS and lrGS; these findings resulted from changes in interpretation, mostly driven by recent gene-disease association discoveries. Seven cases included variants that were interpretable only in lrGS, including copy-number variants, an inversion, a mobile element insertion, two low-complexity repeat expansions, and a 1 bp deletion. Although evidence for each of these variants is, in retrospect, visible in srGS, these variants were not called within the srGS data, were represented by calls with incorrect sizes or structures, or failed quality control and filtering. Thus, while reanalysis of older data clearly increases diagnostic yield, we find that lrGS allows substantial additional yield (7/96, 7.3%) beyond srGS. We anticipate that the additional lrGS-only rare disease yield will grow over time as lrGS analysis improves and as larger lrGS datasets allow better variant frequency annotation.
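    The yield figures above are simple proportions of the 96-proband cohort. The short check below (our illustration) recomputes them from the counts given in the abstract. At one decimal, 8/96 rounds to 8.3%, which is 8.33% at two decimals. The category labels are paraphrases, and the helper name is hypothetical.

```python
COHORT = 96  # probands sequenced with lrGS

# Counts reported in the abstract
counts = {
    "new relevant or potentially relevant finding": 16,
    "pathogenic / likely pathogenic": 8,
    "visible in srGS and lrGS (reinterpretation)": 9,
    "interpretable only in lrGS": 7,
}


def yield_pct(n: int, cohort: int = COHORT) -> float:
    """Diagnostic yield as a percentage of the cohort, rounded to one decimal."""
    return round(100 * n / cohort, 1)


for label, n in counts.items():
    print(f"{label}: {n}/{COHORT} = {yield_pct(n)}%")

# The two subgroups sum to the 16 new findings, consistent with their partitioning them.
print(counts["visible in srGS and lrGS (reinterpretation)"] + counts["interpretable only in lrGS"])
```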

  • Article type: Journal Article
    In the absence of a DNA template, ab initio production of long double-stranded DNA molecules of predefined sequence is particularly challenging. DNA synthesis remains a bottleneck for many applications, such as functional assessment of ancestral genes, analysis of alternative splicing, or DNA-based data storage. In this report, we propose a fully in vitro protocol that uses Golden Gate assembly to generate very long double-stranded DNA molecules from commercially available short DNA blocks in less than 3 days. This innovative application allowed us to streamline production of a 24 kb DNA molecule storing part of the 1789 Declaration of the Rights of Man and of the Citizen. The resulting DNA molecule can be readily cloned into a suitable host/vector system for amplification and selection.
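    The abstract does not detail the block or overhang design. In Golden Gate assembly, a Type IIS enzyme cuts outside its recognition site and leaves short overhangs that dictate the order in which parts ligate. The enzyme is assumed here to be BsaI, which recognizes GGTCTC and leaves 4-nt overhangs. The hedged sketch below checks a hypothetical set of parts for common design problems, then assembles them in silico. Those problems are reused overhangs, palindromic overhangs, reverse-complement overhang pairs, and internal enzyme sites. Part sequences and function names are hypothetical, and this is not the authors' protocol.

```python
BSAI_SITE = "GGTCTC"  # assumed Type IIS enzyme (BsaI); the abstract does not name one
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_design(fragments):
    """List design problems for parts given as (left_overhang, body, right_overhang)."""
    problems = []
    lefts = [left for left, _, _ in fragments]
    rights = [right for _, _, right in fragments]
    if len(set(lefts)) < len(lefts) or len(set(rights)) < len(rights):
        problems.append("an overhang is reused at more than one junction")
    junctions = sorted(set(lefts) | set(rights))
    for oh in junctions:
        rc = revcomp(oh)
        if oh == rc:
            problems.append(f"overhang {oh} is palindromic (self-compatible)")
        elif rc in junctions and oh < rc:  # report each reverse-complement pair once
            problems.append(f"overhangs {oh} and {rc} can cross-ligate")
    for left, body, right in fragments:
        if BSAI_SITE in body or revcomp(BSAI_SITE) in body:
            problems.append(f"part {left}...{right} contains an internal BsaI site")
    return problems


def assemble(fragments):
    """Chain parts by matching each right overhang to the next part's left overhang."""
    by_left = {left: (left, body, right) for left, body, right in fragments}
    rights = {right for _, _, right in fragments}
    starts = [f for f in fragments if f[0] not in rights]
    if len(starts) != 1:
        raise ValueError("expected exactly one start part for a linear product")
    left, body, right = starts[0]
    seq, used = left + body, 1
    while right in by_left:
        if used == len(fragments):
            raise ValueError("overhangs form a cycle; cannot build a linear product")
        _, body, next_right = by_left[right]
        seq += right + body  # the shared junction overhang appears once in the product
        right, used = next_right, used + 1
    if used != len(fragments):
        raise ValueError("not all parts were incorporated")
    return seq + right


# Hypothetical parts, deliberately listed out of order
parts = [
    ("ACTA", "GAATTC", "GCTT"),
    ("AATG", "GCTAGC", "TTCG"),
    ("TTCG", "GGATCC", "ACTA"),
]
print(check_design(parts))  # []
print(assemble(parts))      # AATGGCTAGCTTCGGGATCCACTAGAATTCGCTT
```

    Because each junction overhang is unique and not self-complementary, the ligation order is determined by the overhangs rather than by the order in which the parts are listed. This is what lets many short blocks be joined in a single reaction.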