RESULTS: We develop Forseti, a predictive model that probabilistically assigns a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model that assigns a probability that a given transcriptomic site is used as a priming site during fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets derived from different species and tissue types. Forseti combines these two trained models to predict the splicing status of each read's molecule of origin. It does this by scoring putative fragments, each of which links an alignment of a sequenced read to a nearby potential priming site. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of many reads and identify the true gene of origin of multi-gene-mapped reads.
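The combination step described above can be illustrated with a minimal sketch. It assumes that each putative fragment is scored as the product of the priming-site probability (standing in for the binding affinity model) and the fragment length density (standing in for the fitted length model). Scores are summed per splicing status and normalized. The class `PutativeFragment`, the functions `lognormal_length_pdf` and `splicing_posterior`, the log-normal length model, and its parameters are all hypothetical illustrations, not Forseti's actual models or API.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PutativeFragment:
    """Hypothetical pairing of one read alignment with one nearby potential priming site."""
    status: str          # "spliced" or "unspliced": molecule of origin implied by this pairing
    binding_prob: float  # assumed binding-affinity probability that this site primes
    length: int          # implied fragment length (nt) in the corresponding reference


def lognormal_length_pdf(length: int, median: float = 300.0, sigma: float = 0.5) -> float:
    """Hypothetical log-normal fragment length density (a stand-in for a fitted model)."""
    if length <= 0:
        return 0.0
    z = (math.log(length) - math.log(median)) / sigma
    return math.exp(-0.5 * z * z) / (length * sigma * math.sqrt(2.0 * math.pi))


def splicing_posterior(
    fragments: List[PutativeFragment],
    length_pdf: Callable[[int], float] = lognormal_length_pdf,
) -> Dict[str, float]:
    """Score each putative fragment, sum scores per status, and normalize to probabilities."""
    scores = {"spliced": 0.0, "unspliced": 0.0}
    for frag in fragments:
        # Assumed scoring rule: P(site primes) * density of the implied fragment length.
        scores[frag.status] += frag.binding_prob * length_pdf(frag.length)
    total = sum(scores.values())
    if total == 0.0:
        # No informative candidate: fall back to an uninformative 50/50 split.
        return {status: 0.5 for status in scores}
    return {status: s / total for status, s in scores.items()}


if __name__ == "__main__":
    # One read with two candidate priming sites: one implies a spliced molecule
    # (short fragment), the other an unspliced molecule (long fragment).
    read_candidates = [
        PutativeFragment(status="spliced", binding_prob=0.8, length=300),
        PutativeFragment(status="unspliced", binding_prob=0.8, length=2000),
    ]
    print(splicing_posterior(read_candidates))
```

With these illustrative numbers, the spliced interpretation dominates. Both sites are equally likely to prime, but a 300-nt fragment is far more plausible than a 2,000-nt one under the assumed length distribution.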
METHODS: Forseti and the code used for producing the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.