
On Jun 22, 9:10 am, Ketil Malde
Kamil Dworakowski
writes: Right... Python uses hashtables while here I have a tree with log n access time. I did not want to use the Data.HashTable, it would pervade my program with IO. The alternative is an ideal hashmap that never gets changed. This program creates a dictionary at start which then is only being used to read from: an ideal application for the Data.PerfectHash by Mark Wotton available on Hackage [3].
If you are considering alternative data structures, you might want to look at tries or Bloom filters, both have O(n) lookup, both have Haskell implementations. The latter is probably faster but probabilistic (i.e. it will occasionally fail to detect a misspelling - which you can of course check against a "real" dictionary).
Using Bryan O'Sullivan's fantastic BloomFilter I got it down below Python's run time! Now it is 35.56s, 28% of the time is spent on GC, which I think means there is still some room for improvement. Do you say that PerfectHash runs with a penalty of calling into c library?