Background: Long-read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify both the identification of novel transcripts and transcript quantification. Despite this promise, much long-read method development to date has focused on transcript identification, with comparatively little attention paid to quantification. Yet long-read quantification remains a challenge because of differences in the underlying protocols and technologies, lower throughput (i.e., fewer reads sequenced per sample than with short-read technologies), and technical artifacts. This motivates the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data.
Results: We introduce oarfish, a new method and software tool for long-read transcript quantification. Our model incorporates a novel coverage score, which affects the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate that, by accounting for this coverage information, oarfish produces more accurate quantification estimates than existing long-read quantification methods, particularly when one considers the primary isoforms present in a particular cell line or tissue type.
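To illustrate how a per-alignment score can enter the conditional probability of fragment assignment, the following is a minimal sketch of a generic expectation-maximization (EM) quantifier. It is not oarfish's implementation: the function name `em_quantify`, the input layout, and the weights are hypothetical, and the weights merely stand in for whatever coverage-informed conditional probability a real model would compute.

```python
def em_quantify(read_alignments, n_transcripts, n_iter=200):
    """Estimate per-transcript read counts with a basic EM loop.

    read_alignments: one list per read, holding (transcript_index, weight)
    pairs. Each weight is a hypothetical stand-in for a coverage-informed
    conditional probability of the read given that transcript.
    """
    n_reads = len(read_alignments)
    # Start from uniform relative abundances.
    theta = [1.0 / n_transcripts] * n_transcripts

    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read across its candidate transcripts in
        # proportion to (current abundance) x (alignment weight).
        for alignments in read_alignments:
            denom = sum(theta[t] * w for t, w in alignments)
            if denom == 0.0:
                continue  # read cannot be assigned under current estimates
            for t, w in alignments:
                counts[t] += theta[t] * w / denom
        # M-step: renormalize expected counts into abundances.
        total = sum(counts)
        if total == 0.0:
            break
        theta = [c / total for c in counts]

    return [th * n_reads for th in theta]


# Three reads, each compatible with transcripts 0 and 1.
uniform = [[(0, 0.5), (1, 0.5)] for _ in range(3)]
weighted = [[(0, 0.9), (1, 0.1)] for _ in range(3)]

est_uniform = em_quantify(uniform, n_transcripts=2)
est_weighted = em_quantify(weighted, n_transcripts=2)
```

With equal weights the ambiguous reads cannot be resolved and the estimate stays evenly split, whereas unequal weights pull the reads toward the transcript they support more strongly. This is the general sense in which a score attached to each alignment changes the resulting abundance estimates.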
Availability and implementation: Oarfish is implemented in the Rust programming language and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https://www.github.com/COMBINE-lab/oarfish.