motif

  • Article type: Journal Article
    Sequence analysis frequently requires an intuitive understanding and a convenient representation of motifs. Typically, motifs are represented as position weight matrices (PWMs) and visualized using sequence logos. However, in many scenarios, such as interpreting motif information or searching for motif matches, a wildcard-style consensus sequence (such as [GC][AT]GATAAG[GAC]) is a compact and sufficient representation. Based on mutual information theory and Jensen-Shannon divergence, we propose a mathematical framework to minimize the information loss in converting PWMs to consensus sequences. We name this representation sequence Motto and have implemented an efficient algorithm, with flexible options, for converting motif PWMs over nucleotides, amino acids, or customized characters into Mottos. We show that this representation provides a simple and efficient way to identify the binding sites of 1156 common transcription factors (TFs) in the human genome. The effectiveness of the method was benchmarked by comparing sequence matches found by Motto with PWM scanning results found by FIMO. On average, our method achieves an area under the precision-recall curve of 0.81, significantly (P-value < 0.01) outperforming all existing methods, including maximal positional weight, Cavener's method, and minimal mean square error. We believe this representation provides a distilled summary of a motif, together with a statistical justification.
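To make the idea of a divergence-based PWM-to-consensus conversion concrete, here is a minimal Python sketch. It is not the published Motto algorithm: the abstract does not spell out the exact objective, so the sketch assumes that each PWM column is replaced by the set of its top-k letters whose uniform distribution has the smallest Jensen-Shannon divergence from the column. The function names (`jsd`, `column_to_symbol`, `pwm_to_consensus`), the A/C/G/T column order, and the toy matrix are all hypothetical.

```python
import math
import re

ALPHABET = "ACGT"  # assumed column order of each PWM row


def jsd(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def column_to_symbol(col):
    """Choose the top-k letters whose uniform distribution is closest to col.

    Assumption for illustration only: candidates are restricted to the k most
    probable letters, k = 1..len(col), and ties favour the smaller k.
    """
    order = sorted(range(len(col)), key=lambda i: -col[i])
    best_k, best_d = 1, float("inf")
    for k in range(1, len(col) + 1):
        q = [0.0] * len(col)
        for i in order[:k]:
            q[i] = 1.0 / k
        d = jsd(col, q)
        if d < best_d:
            best_k, best_d = k, d
    letters = "".join(ALPHABET[i] for i in order[:best_k])
    return letters if best_k == 1 else "[" + letters + "]"


def pwm_to_consensus(pwm):
    """Convert a PWM (list of per-position probability rows) to a consensus string."""
    return "".join(column_to_symbol(col) for col in pwm)


# Toy PWM, invented for this example (rows sum to 1, order A, C, G, T).
toy_pwm = [
    [0.5, 0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.25, 0.25, 0.25, 0.25],
]
consensus = pwm_to_consensus(toy_pwm)

# The bracketed consensus is already valid regular-expression syntax,
# so non-overlapping matches can be located with the standard re module.
hits = [m.start() for m in re.finditer(consensus, "GGATTAATC")]
```

Because the wildcard notation coincides with regular-expression character classes, a consensus string produced this way can be searched with ordinary text tools, which is the practical convenience the abstract points to. Scanning with `re.finditer` reports only non-overlapping matches on one strand; a genome-scale search would also need the reverse complement and overlapping hits.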