Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

22 Apr 2007

I tried using WordSet = [String] (plus the corresponding changes in the
code) and got a great speedup, far more than 3x. There was also a memory
growth phenomenon with Set String; replacing it with [String] stopped
that too, and the program now runs in constant space (about 20M). Part of
the speedup may be attributable to GHC's excellent rewrite rules for
lists; however, I cannot explain the memory growth when using Set.
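A minimal sketch of the two representations, assuming a WordFreq of type Map String Int; the names knownSet and knownList are hypothetical, not taken from the attached code. The list version is lazy, so candidates can be produced and consumed one at a time, whereas Data.Set is spine-strict and materialises the whole set up front. That difference may or may not be related to the memory growth described above.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Assumed frequency table: word -> number of occurrences.
type WordFreq = Map.Map String Int

-- Set-based variant (WordSet = Set String): the whole set of
-- candidates must exist before filtering.
knownSet :: WordFreq -> Set.Set String -> Set.Set String
knownSet freq = Set.filter (`Map.member` freq)

-- List-based variant (WordSet = [String]): lazy, so elements are
-- produced on demand and can be garbage-collected once consumed.
knownList :: WordFreq -> [String] -> [String]
knownList freq = filter (`Map.member` freq)

main :: IO ()
main = do
  let freq = Map.fromList [("the", 3), ("then", 1)]
  print (knownList freq ["teh", "the", "then"])
  print (Set.toList (knownSet freq (Set.fromList ["teh", "the", "then"])))
```

Both calls keep only the candidates present in the table; the list version also keeps duplicates, which is harmless when the result is only used to pick a best candidate.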

Regarding the local WordFreq map under "train", I was amazed that ghc -O
is smart enough to notice it and share it properly, so that only one
copy is ever created. Nonetheless, I decided to split "train" into two
functions, one that builds the WordFreq and one that queries it; this
makes blame analysis easier when necessary.
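The split might look like the following sketch. buildWordFreq and bestCandidate are hypothetical names, and Data.Map.Strict comes from a later containers release than the one available in 2007. Because the map is bound once and passed explicitly to every query, the sharing no longer depends on the optimiser noticing it (likely via GHC's full-laziness transformation).

```haskell
import qualified Data.Map.Strict as Map
import Data.List (maximumBy)
import Data.Ord (comparing)

type WordFreq = Map.Map String Int

-- First half (hypothetical): build the frequency table once.
buildWordFreq :: [String] -> WordFreq
buildWordFreq ws = Map.fromListWith (+) [(w, 1) | w <- ws]

-- Second half (hypothetical): pick the most frequent known candidate,
-- falling back to the original word when no candidate is known.
bestCandidate :: WordFreq -> String -> [String] -> String
bestCandidate freq w cands =
  case [(c, n) | c <- cands, Just n <- [Map.lookup c freq]] of
    [] -> w
    xs -> fst (maximumBy (comparing snd) xs)

main :: IO ()
main = do
  -- The table is built here, once, and shared by every query below.
  let freq = buildWordFreq (words "the cat the hat then")
  putStrLn (bestCandidate freq "teh" ["the", "then", "tech"])
  putStrLn (bestCandidate freq "zzz" ["zza"])
```

When something goes wrong, each half can now be profiled or tested on its own.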

On the interact line, I use "tokens" to break up the input: it is
already written (for the trainer), so I may as well reuse it.
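A sketch of such an interact line, assuming a tokens function that yields lower-cased runs of letters (in the spirit of Norvig's re.findall('[a-z]+', text.lower())) and with correct stubbed as the identity so the program runs; both definitions are hypothetical stand-ins, not the attached code.

```haskell
import Data.Char (isAlpha, toLower)

-- Hypothetical tokenizer: maximal runs of letters, lower-cased.
-- Note that isAlpha also accepts non-ASCII letters, unlike [a-z].
tokens :: String -> [String]
tokens s = case dropWhile (not . isAlpha) s of
  "" -> []
  s' -> let (w, rest) = span isAlpha s'
        in map toLower w : tokens rest

-- Hypothetical corrector, stubbed as the identity for this sketch.
correct :: String -> String
correct = id

-- Reuse the trainer's tokenizer on stdin and print one correction per line.
main :: IO ()
main = interact (unlines . map correct . tokens)
```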

When reading holmes.txt, be aware that it is in UTF-8, while GHC still 
assumes ISO-8859-1. This will affect results.
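To illustrate why this matters, a hedged sketch: a Latin-1 reader turns each UTF-8 byte into its own Char, so an accented letter becomes two Chars and tokenization changes. With GHC 6.12 or later (which postdates this message), hSetEncoding from System.IO can force UTF-8 decoding instead. readUtf8 and latin1View are hypothetical helper names.

```haskell
import System.IO
import Data.Word (Word8)

-- What a Latin-1 reader produces from raw bytes: one Char per byte.
latin1View :: [Word8] -> String
latin1View = map (toEnum . fromIntegral)

-- Read a file as UTF-8 regardless of the locale (GHC 6.12 and later).
readUtf8 :: FilePath -> IO String
readUtf8 path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  hGetContents h

main :: IO ()
main = do
  hSetEncoding stdout utf8
  -- "café" in UTF-8 is the five bytes below; read as Latin-1 they
  -- become five Chars, the last two being U+00C3 and U+00A9.
  putStrLn (latin1View [99, 97, 102, 195, 169])
```

With readUtf8 "holmes.txt" in place of a plain readFile, each accented letter is decoded as a single Char.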

I have not checked the correctness of edits1.
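For checking purposes, one way to transcribe Norvig's definition of edits1 (deletes, transposes, replaces, inserts over the 26 lowercase letters) into Haskell is sketched below, returning a list in keeping with WordSet = [String]. This is a reference sketch, not the attached code. For a word of length n it yields 54n + 25 candidates before duplicates are removed, the count Norvig gives.

```haskell
import Data.List (inits, tails)

alphabet :: String
alphabet = ['a' .. 'z']

-- All strings one edit away from w (duplicates not removed).
edits1 :: String -> [String]
edits1 w = deletes ++ transposes ++ replaces ++ inserts
  where
    -- Every way to cut w into a prefix and a suffix.
    splits     = zip (inits w) (tails w)
    deletes    = [l ++ rs         | (l, _ : rs) <- splits]
    transposes = [l ++ b : a : rs | (l, a : b : rs) <- splits]
    replaces   = [l ++ c : rs     | (l, _ : rs) <- splits, c <- alphabet]
    inserts    = [l ++ c : r      | (l, r) <- splits, c <- alphabet]

main :: IO ()
main = print (length (edits1 "spel"))  -- 54 * 4 + 25 = 241
```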

I am monochrom.

My modification is attached.

Albert Y. C. Lai