Request for code review - Knuth Morris Pratt for Data.Sequence

Using the code developed for ByteStrings by myself, Christ Kuklewicz and Daniel Fischer, I've implemented Knuth-Morris-Pratt substring searching on Data.Sequence "Seq" values. Attached you'll find the library in kmp.zip.safe. The algorithm is implemented in the module Data.Sequence.KMP. At the root, SpeedTest.hs can be compiled on Windows with the "prof_compile.bat" file included (you'll need to install the "regex-dfa" and "regex-base" packages from Hackage to build). SpeedTest searches for a known value in the 7 MB file "endo.dna" (which can be downloaded from http://www.icfpcontest.org/endo.zip) using several different algorithms and methods: strict and lazy bytestrings, regular expressions, and the KMP algorithm for "Seq" values. SpeedTest is pretty fast but I worry about its space usage. It may just be the nature of Seq values, but I cannot get it to run in constant space when I think it should be. I'm especially interested in help here. All comments and feedback are welcome. Since the zip file includes a darcs context file, feel free to send patches. Justin
participants (1)
-
Justin Bailey