
All, In order to teach myself Haskell, I've been tinkering with some of the Shootout (http://shootout.alioth.debian.org/great/) programs. Substantially improved the Mandelbrot program. Then started to work on the Spellcheck program, since Haskell seemed to do quite poorly at it. However, my revision appears to be slower than the original. I was hoping that I could get some help from y'all. Original program: import Data.Set main = do d <- readFile "Usr.Dict.Words" let misspelled x = not $ x `elementOf` (mkSet (lines d)) interact $ unlines . filter misspelled . Lines The original program is nice and short, but uses Set to hold the 80,000 word dictionary. The Input file is 80,000 words, too, so I figured that searching the Set dictionary would be an O(n^2) task (80,000 spellchecks x linear search on 80,000 word dictionary Set). All of the other languages used hash tables for this task (O(n) (80,000 spellchecks x hash search on dictionary), so I revised the program to do the same. ---------------------------------------- Revised program: import Data.HashTable import IO import GHC.Int buildHash :: (String -> Int32) -> String -> IO (HashTable String Bool) buildHash hash_fn list = do table <- new (==) hash_fn sequence_ [ insert table k True | k <- lines list ] return table spellcheck :: HashTable String Bool -> String -> String -> IO String spellcheck ds "" accum = return accum spellcheck ds xs accum = let (x, xs') = break (== '\n') xs in do r <- Data.HashTable.lookup ds x case r of Just b -> spellcheck ds (tail xs') accum _ -> spellcheck ds (tail xs') (x ++ accum) main = do ds <- readFile "Usr.Dict.Words" >>= buildHash hashString hGetContents stdin >>= (\xs -> spellcheck ds xs "") >>= print ---------------------------------------- The Revised program is 33% faster than the Original program using the original dataset. However, if I boost the Input to 1.4M words while keeping the dictionary the same, then the two programs are equal in speed. This suggests that loading the dictionary and building the Hash/Set is faster in the Revised program, but that _lookups_ are _slower_ in the Revised program. Is Hashtable.lookup really slower than Set.elementOf? I'm missing something and I don't know what it is. Frustrating. - Alson