
Am Dienstag 19 Januar 2010 20:40:38 schrieb Joe Van Dyk:
Hello,
I wrote a function that finds anagrams in a file. It worked great. Until I tried to use Bytestrings to get better performance.
Here's the code: (http://github.com/joevandyk/haskell/blob/aa61a58e6a027dda60a32ae64ec99f 92d00ae5ed/pearls/anagrams/anagram.hs) import qualified Data.Map as Map import Data.Ord import Data.List import qualified Data.ByteString.Char8 as BS
-- what's the type here? I get an infinite type error anagrams :: Ord a => [a] -> [a]
anagrams :: Ord a => [[a]] -> [[[a]]] But ask ghci, that's faster to answer such questions than the list :) Don't give a type signature, load the module, ghci> :t anagrams anagrams :: Ord a => [[a]] -> [[[a]]]
anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) (sort word) [word] map
You sort each word from the input list, so word must have type Ord a => [a], thus the input has type Ord a => [[a]]. The map you build maps Strings ([Char], generally, Ord a => [[a]]) to lists of Strings (so Map String [String] === Map [Char] [[Char]], in general, Ord a => Map [a] [[a]]), thus Map.elems gives a list of (lists of Strings), that is [[String]] === [[[Char]]], in general, [[[a]]].
main = do -- original code, worked fine: -- input <- getContents -- print $ take 3 $ anagrams $ lines input -- new code with bytestring has errors input <- BS.getContents print $ take 3 $ anagrams $ BS.lines input
There are two errors. One: anagram.hs:7:0: Occurs check: cannot construct the infinite type: a = [a] When generalising the type(s) for `anagrams'
That happens when I add the type to the anagrams method.
If I don't have a type specified, then I get this error: anagram.hs:21:30: Couldn't match expected type `[a]' against inferred type `BS.ByteString' Expected type: [[a]] Inferred type: [BS.ByteString] In the second argument of `($)', namely `BS.lines input' In the second argument of `($)', namely `anagrams $ BS.lines input'
I'm lost here. Help!
ByteStrings are not lists, so Data.List.sort can't do anything with them. You could change one line of the the code of anagrams to insert_word map word = Map.insertWith' (++) (BS.sort word) [word] map which restricts the type of anagrams to anagrams :: [ByteString] -> [[ByteString]] But, ByteStrings sort isn't good for short ByteStrings (allocate an array of 256 slots, count how often each character occurs, write in order to new ByteString - the overhead of allocating the array is larger than the sorting cost for short ByteStrings). For ByteStrings as short as normal words in English/French/German/Italian/Spanish, it's much better to unpack the ByteStrings for sorting and change the line to insert_word map word = Map.insertWith' (++) (BS.pack . sort . BS.unpack $ word) [word] map