
Ketil Malde
On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote:
Pete Kazmier
writes: I'd love to see other Haskell implementations as well if anyone has a few moments to spare. Admittedly, it took me several hours to get my version working, but I'm a Haskell newbie. Unfortunately, I think it runs as slow as it took me to write it!
Hm - nobody suggested using ByteStrings yet? String is notoriously wasteful, and replacing it with ByteString normally gives a quite significant speedup.
I actually have a ByteString version but it runs much slower. This part of the code is where all of the time is spent in the ByteString version: type WordFreq = M.Map B.ByteString Int train:: [B.ByteString] -> WordFreq train words = frequencyMap where frequencyMap = foldr incWordCount M.empty words incWordCount w m = M.insertWith (+) w 1 m
Worse - and this is true for ByteStrings, too - String comparisons are O(n), which means lookups in Sets and Maps are expensive. A trie (i.e, a search tree where each internal node corresponds to a word prefix, and has one branch per letter in the alphabet) will give you lookup that scales with word size (and possibly alphabet size).
Right. My first version was just a direct translation of Norvig's code with an emphasis on trying to keep the complexity and size of code to a minimum.
Instead of generating the (huge) list of misspelled words, you could calculate edit distance to each correctly spelled word? With a bound on allowable mistakes, this is a linear time operation using the standard dynamic programming algorithm.
Could you provide additional information on this "standard dynamic programming algorithm?" I'm not familiar with dynamic programming. Thanks! Pete