[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

22 Apr 2007

      Ketil Malde  writes:
...
On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote:
...
Pete Kazmier  writes:
...
I'd love to see other Haskell implementations as well if anyone has a
few moments to spare.  Admittedly, it took me several hours to get my
version working, but I'm a Haskell newbie.  Unfortunately, I think it
runs as slow as it took me to write it!
Hm - nobody suggested using ByteStrings yet?  String is notoriously
wasteful, and replacing it with ByteString normally gives a quite
significant speedup.
I actually have a ByteString version but it runs much slower.  This
part of the code is where all of the time is spent in the ByteString
version:

  type WordFreq   = M.Map B.ByteString Int

  train:: [B.ByteString] -> WordFreq
  train words = frequencyMap
      where
        frequencyMap     = foldr incWordCount M.empty words
        incWordCount w m = M.insertWith (+) w 1 m
...
Worse - and this is true for ByteStrings, too - String comparisons are
O(n), which means lookups in Sets and Maps are expensive.  A trie (i.e,
a search tree where each internal node corresponds to a word prefix, and
has one branch per letter in the alphabet) will give you lookup that
scales with word size (and possibly alphabet size).
Right.  My first version was just a direct translation of Norvig's
code with an emphasis on trying to keep the complexity and size of
code to a minimum.
...
Instead of generating the (huge) list of misspelled words, you could
calculate edit distance to each correctly spelled word?  With a bound on
allowable mistakes, this is a linear time operation using the standard
dynamic programming algorithm.
Could you provide additional information on this "standard dynamic
programming algorithm?"  I'm not familiar with dynamic programming.

Thanks!
Pete