MeSH: Algorithms; RNA-Seq / methods; Software; Humans; High-Throughput Nucleotide Sequencing / methods; Sequence Analysis, RNA / methods

Source: DOI:10.1093/bioinformatics/btae215

Abstract:
BACKGROUND: High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations.
RESULTS: We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a "bridging" system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages "supporting" information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate Aletsch's significant outperformance over existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%-62.1% and PsiCLASS by 23.0%-175.5% on human datasets.
METHODS: Aletsch is freely available at https://github.com/Shao-Group/aletsch. Scripts that reproduce the experimental results of this manuscript are available at https://github.com/Shao-Group/aletsch-test.
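The pAUC metric mentioned in the results can be illustrated with a minimal sketch. This is not the evaluation code from the Aletsch repositories. It assumes that "constrained by precision" means only the part of the precision-recall curve with precision at or above a chosen threshold contributes to the area, and it uses a simple step-wise sum. The function name `partial_pr_auc` and the parameters `num_reference` and `min_precision` are hypothetical, introduced only for this example.

```python
from typing import Sequence


def partial_pr_auc(
    scores: Sequence[float],
    is_correct: Sequence[int],
    num_reference: int,
    min_precision: float,
) -> float:
    """Step-wise area under the precision-recall curve, counting only
    operating points whose precision is >= min_precision.

    scores        -- confidence score of each assembled transcript
    is_correct    -- 1 if the transcript matches a reference transcript, else 0
    num_reference -- total number of reference transcripts (recall denominator)
    min_precision -- assumed precision constraint (hypothetical parameter)
    """
    if num_reference <= 0:
        return 0.0

    # Rank transcripts from highest to lowest score; ties are not treated specially.
    ranked = sorted(zip(scores, is_correct), key=lambda pair: -pair[0])

    true_positives = 0
    prev_recall = 0.0
    area = 0.0
    for rank, (_, correct) in enumerate(ranked, start=1):
        true_positives += correct
        precision = true_positives / rank
        recall = true_positives / num_reference
        # Add the rectangle for this recall step only if precision is high enough.
        if precision >= min_precision:
            area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    is_correct = [1, 0, 1, 1]
    print(partial_pr_auc(scores, is_correct, num_reference=3, min_precision=0.7))
```

Under this assumed definition, raising `min_precision` can only shrink the area, so the metric rewards assemblers that stay accurate as they report more transcripts.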