
Suffix trees are a data structure used to search for a substring of length "m" in a string of length "n" in O(m) time. Suffix trees can also be used for efficient approximate searches. This data structure is of particular importance in bioinformatics. Does anyone have any Haskell code implementing suffix trees? I'm particularly interested in high-performance construction. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. OCaml for Scientists http://www.ffconsultancy.com/products/ocaml_for_scientists/?e

On Sun, 2007-08-12 at 23:09 +0100, Jon Harrop wrote:
Suffix trees are a data structure used to search for a substring of length "m" in a string of length "n" in O(m) time. Suffix trees can also be used for efficient approximate searches. This data structure is of particular importance in bioinformatics.
Suffix trees (and their sibling, suffix arrays) are cool. Basically a suffix tree is a trie, but where any nodes with only one child are collapsed. Easy to construct through sorting, linear time is trickier, and perhaps the biggest problem is keeping memory use reasonable. Suffix arrays is a sorted array of indices, so it's more compact, but takes O(m log n) to search. There's also the enhanced suffix array, which provides the same functionality as the suffix tree, but uses approximately the same space as well (something like 12n bytes vs 5n bytes, IIRC).
Does anyone have any Haskell code implementing suffix trees? I'm particularly interested in high-performance construction.
I was using a suffix tree to index some data, but I discovered I only needed the tree to constant depth, so - shame on me - I ended up simply sorting the suffixes in one case, and storing hashed words in a Map in another. I've also toyed with FFI'ing to comressed suffix arrays, but in the end, this wasn't as successful as I'd hoped. For straightforward suffix array construction, this is probably the fastest way to go about it, though. Linear time suffix trees are a bit complicated to implement in a functional language, as they depend on additional links in the tree during construction. I'm sure this can be done using IO/STrefs, but it would of course be cooler if it could be done tying-the-knot style. Anyway, I have a handful of shaky FFI bindings, a reasonable bibtex file, and lots of good will and encouragement. Let me know if you have use for any of them! -k

G'day all.
Quoting Jon Harrop
Does anyone have any Haskell code implementing suffix trees? I'm particularly interested in high-performance construction.
No, but I have this: http://citeseer.ist.psu.edu/giegerich95comparison.html First person to implement them gets the eternal gratitude of hackage. Cheers, Andrew Bromage
participants (3)
-
ajb@spamcop.net
-
Jon Harrop
-
Ketil Malde