[ANN] An efficient lazy suffix tree library

I just posted a library named suffixtree to Hackage. http://www.serpentine.com/software/suffixtree/ It implements Giegerich and Kurtz's lazy construction algorithm, with a few tweaks for better performance and resource usage. API docs: http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html I've tested it on multi-megabyte input strings.

Bryan O'Sullivan wrote:
I just posted a library named suffixtree to Hackage.
http://www.serpentine.com/software/suffixtree/
It implements Giegerich and Kurtz's lazy construction algorithm, with a few tweaks for better performance and resource usage.
API docs:
http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html
I've tested it on multi-megabyte input strings.
I think I found a bug: import qualified Data.SuffixTree as T
T.countRepeats "ab" (T.construct "abab") 1

Gleb Alexeyev wrote:
Bryan O'Sullivan wrote:
I just posted a library named suffixtree to Hackage.
http://www.serpentine.com/software/suffixtree/
It implements Giegerich and Kurtz's lazy construction algorithm, with a few tweaks for better performance and resource usage.
API docs:
http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html
I've tested it on multi-megabyte input strings.
I think I found a bug: import qualified Data.SuffixTree as T
T.countRepeats "ab" (T.construct "abab") 1
That is almost certainly because the algorithm expects the source string to have a unique character at its end. The library should either make that clear or add such a character on its own. Otherwise the "ab" suffix is a prefix of the "abab" suffix and the shorter one gets lost. If you end in "$" then "ab$" cannot merge with "abab$" and there are no distinct suffixes a and b for which (isPrefixOf a b) is true. Example:
*Data.SuffixTree> countRepeats "ab" (construct "abab") 1 *Data.SuffixTree> countRepeats "ab" (construct "abab$") 2
-- Chris

Bryan O'Sullivan wrote:
ChrisK wrote:
That is almost certainly because the algorithm expects the source string to have a unique character at its end.
Chris is correct. I'll ensure that the docs make this clear.
Apologies, I should have thought of this myself. Thanks.
participants (3)
-
Bryan O'Sullivan
-
ChrisK
-
Gleb Alexeyev