[Haskell-cafe] Re: Optimizing spelling correction program

23 Jun 2009

      On Jun 22, 9:10 am, Ketil Malde  wrote:
...
Kamil Dworakowski  writes:
...
Right... Python uses hashtables while here I have a tree with log n
access time. I did not want to use the Data.HashTable, it would
pervade my program with IO. The alternative is an ideal hashmap that never
gets changed. This program creates a dictionary at start which then is only
being used to read from: an ideal application for the Data.PerfectHash
by Mark Wotton available on Hackage [3].
If you are considering alternative data structures, you might want to
look at tries or Bloom filters, both have O(n) lookup, both have
Haskell implementations.  The latter is probably faster but
probabilistic (i.e. it will occasionally fail to detect a
misspelling - which you can of course check against a "real"
dictionary).
Using Bryan O'Sullivan's fantastic BloomFilter I got it down below
Python's run time! Now it is 35.56s, 28% of the time is spent on GC,
which I think means there is still some room for improvement.

Do you say that PerfectHash runs with a penalty of calling into c
library?