
Hello, We recently had a challenge as follows: Given a word, find all the words in the dictionary which can be made from the letters of that word. A letter can be used at most as many times as it appears in the input word. So, "letter" can only match words with 0, 1, or 2 t's in them. I opted for simplicity in my implementation including hard coding the input word and using /usr/share/dict/words as the dictionary: -- Begin Words.hs module Main where import List -- I was lazy and borrowed perms from the Haskell wiki -- but I wrote everything else perms [] = [[]] perms xs = [ x : ps | x <- xs, ps <- perms (xs\\[x]) ] -- creates permutations of all lengths then cleans up duplicates and gets rid of the -- empty list, this is probably the least efficient way possible allPerms x = drop 1 $ sort $ nub $ concatMap inits $ perms x main = do wordList <- readFile "/usr/share/dict/words" let words = lines wordList mapM_ print $ filter (`elem` words) $ allPerms "ubuntu" -- End Words.hs Next I decided to try it with byte stings: -- Begin ByteStringWords.hs module Main where import List -- import qualified Data.ByteString.Char8 as C import qualified Data.ByteString.Lazy.Char8 as C -- I was lazy and borrowed perms from the Haskell wiki -- but I wrote everything else perms [] = [[]] perms xs = [ x : ps | x <- xs, ps <- perms (xs\\[x]) ] -- creates permutations of all lengths then cleans up duplicates and gets rid of the -- empty list, this is probably the least efficient way possible allPerms x = drop 1 $ sort $ nub $ concatMap inits $ perms x main = do wordList <- C.readFile "/usr/share/dict/words" let words = C.lines wordList mapM_ print $ filter (`elem` words) $ map C.pack $ allPerms "ubuntu" -- End ByteStringWords.hs I don't think the overhead to compute the permutations matters here as the input to the permutations calculation is so small. Any ideas why the byte string version is slower? (Strict bytestrings appear to be about 2 seconds slower and lazy bytestrings appear to be about 1 second slower). I think, given my simple algorithm that means that (==) for ByteStrings is slower than (==) for String. Is this possible? I think the program might be spending more time cleaning up after execution with the ByteString versions as it seems to "stall" after printing the last match. Thanks, Jason