{Reference Type}: Journal Article
{Title}: Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification.
{Author}: Marsan L;Sagot MF;
{Journal}: J Comput Biol
{Volume}: 7
{Issue}: 3
{Year}: 2000
{Factor}: 1.549
{DOI}: 10.1089/106652700750050826
{Abstract}: This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p ≥ 1 "boxes" (each box corresponding to one part of the structured motif), p substitution rates (one for each box) and p - 1 intervals of distance (one for each pair of successive boxes in the collection). The contents of the boxes, that is, the motifs themselves, are unknown at the start of the algorithm. This is precisely what the algorithms are meant to find. A suffix tree is used for finding such motifs. The algorithms are efficient enough to infer site consensi, such as promoter sequences or regulatory sites, from a set of unaligned sequences corresponding to the noncoding regions upstream from all genes of a genome. In particular, the time complexity of both algorithms scales linearly with N²n, where n is the average length of the sequences and N their number. An application to the identification of promoter and regulatory consensus sequences in bacterial genomes is shown.
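
{Note}: The structured-motif model in the abstract (p boxes, one substitution allowance per box, p - 1 distance intervals) can be illustrated with a minimal sketch. This is a naive brute-force check of whether a *given* candidate motif occurs in a sequence; it is not the paper's suffix-tree extraction algorithm, which discovers the box contents itself. The names `occurs` and `count_supporting`, the use of an absolute maximum number of substitutions per box (rather than a rate), and the convention that a distance is measured from the end of one box to the start of the next are all assumptions made for illustration.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(seq: str, boxes: List[str], max_subs: List[int],
           gaps: List[Tuple[int, int]]) -> bool:
    """Return True if the structured motif has at least one occurrence in seq.

    boxes    : the p box contents (hypothetical candidate motif)
    max_subs : p maximum substitution counts, one per box (assumed convention)
    gaps     : p - 1 (d_min, d_max) pairs; distance from the end of box i
               to the start of box i + 1 (assumed convention)
    """
    def extend(i: int, pos: int) -> bool:
        # Try to match box i starting exactly at position pos.
        box = boxes[i]
        end = pos + len(box)
        if end > len(seq) or hamming(seq[pos:end], box) > max_subs[i]:
            return False
        if i == len(boxes) - 1:
            return True
        d_min, d_max = gaps[i]
        # Try every allowed distance to the next box.
        return any(extend(i + 1, end + d) for d in range(d_min, d_max + 1))

    return any(extend(0, start) for start in range(len(seq)))


def count_supporting(seqs: List[str], boxes: List[str], max_subs: List[int],
                     gaps: List[Tuple[int, int]]) -> int:
    """Count how many sequences contain at least one occurrence."""
    return sum(occurs(s, boxes, max_subs, gaps) for s in seqs)


# Toy data: a two-box motif with 16 to 18 positions between the boxes.
seqs = [
    "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC",  # exact, distance 17
    "TTGACA" + "A" * 5 + "TATAAT",                 # distance 5: out of range
    "TTGACT" + "C" * 16 + "TATAAT",                # one substitution in box 1
]
support = count_supporting(seqs, ["TTGACA", "TATAAT"], [1, 1], [(16, 18)])
```

Checking one candidate this way costs time proportional to the sequence length times the widths of the distance intervals, and enumerating all candidates naively would be exponential in the box lengths; avoiding that enumeration is what the suffix tree in the abstract is used for.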