
Hello, I wrote a function that finds anagrams in a file. It worked great. Until I tried to use Bytestrings to get better performance. Here's the code: (http://github.com/joevandyk/haskell/blob/aa61a58e6a027dda60a32ae64ec99f92d00...) import qualified Data.Map as Map import Data.Ord import Data.List import qualified Data.ByteString.Char8 as BS -- what's the type here? I get an infinite type error anagrams :: Ord a => [a] -> [a] anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) (sort word) [word] map main = do -- original code, worked fine: -- input <- getContents -- print $ take 3 $ anagrams $ lines input -- new code with bytestring has errors input <- BS.getContents print $ take 3 $ anagrams $ BS.lines input There are two errors. One: anagram.hs:7:0: Occurs check: cannot construct the infinite type: a = [a] When generalising the type(s) for `anagrams' That happens when I add the type to the anagrams method. If I don't have a type specified, then I get this error: anagram.hs:21:30: Couldn't match expected type `[a]' against inferred type `BS.ByteString' Expected type: [[a]] Inferred type: [BS.ByteString] In the second argument of `($)', namely `BS.lines input' In the second argument of `($)', namely `anagrams $ BS.lines input' I'm lost here. Help! -- Joe Van Dyk http://fixieconsulting.com

Am Dienstag 19 Januar 2010 20:40:38 schrieb Joe Van Dyk:
Hello,
I wrote a function that finds anagrams in a file. It worked great. Until I tried to use Bytestrings to get better performance.
Here's the code: (http://github.com/joevandyk/haskell/blob/aa61a58e6a027dda60a32ae64ec99f 92d00ae5ed/pearls/anagrams/anagram.hs) import qualified Data.Map as Map import Data.Ord import Data.List import qualified Data.ByteString.Char8 as BS
-- what's the type here? I get an infinite type error anagrams :: Ord a => [a] -> [a]
anagrams :: Ord a => [[a]] -> [[[a]]] But ask ghci, that's faster to answer such questions than the list :) Don't give a type signature, load the module, ghci> :t anagrams anagrams :: Ord a => [[a]] -> [[[a]]]
anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) (sort word) [word] map
You sort each word from the input list, so word must have type Ord a => [a], thus the input has type Ord a => [[a]]. The map you build maps Strings ([Char], generally, Ord a => [[a]]) to lists of Strings (so Map String [String] === Map [Char] [[Char]], in general, Ord a => Map [a] [[a]]), thus Map.elems gives a list of (lists of Strings), that is [[String]] === [[[Char]]], in general, [[[a]]].
main = do -- original code, worked fine: -- input <- getContents -- print $ take 3 $ anagrams $ lines input -- new code with bytestring has errors input <- BS.getContents print $ take 3 $ anagrams $ BS.lines input
There are two errors. One: anagram.hs:7:0: Occurs check: cannot construct the infinite type: a = [a] When generalising the type(s) for `anagrams'
That happens when I add the type to the anagrams method.
If I don't have a type specified, then I get this error: anagram.hs:21:30: Couldn't match expected type `[a]' against inferred type `BS.ByteString' Expected type: [[a]] Inferred type: [BS.ByteString] In the second argument of `($)', namely `BS.lines input' In the second argument of `($)', namely `anagrams $ BS.lines input'
I'm lost here. Help!
ByteStrings are not lists, so Data.List.sort can't do anything with them. You could change one line of the the code of anagrams to insert_word map word = Map.insertWith' (++) (BS.sort word) [word] map which restricts the type of anagrams to anagrams :: [ByteString] -> [[ByteString]] But, ByteStrings sort isn't good for short ByteStrings (allocate an array of 256 slots, count how often each character occurs, write in order to new ByteString - the overhead of allocating the array is larger than the sorting cost for short ByteStrings). For ByteStrings as short as normal words in English/French/German/Italian/Spanish, it's much better to unpack the ByteStrings for sorting and change the line to insert_word map word = Map.insertWith' (++) (BS.pack . sort . BS.unpack $ word) [word] map

On Tue, Jan 19, 2010 at 12:49 PM, Daniel Fischer
Am Dienstag 19 Januar 2010 20:40:38 schrieb Joe Van Dyk:
Hello,
I wrote a function that finds anagrams in a file. It worked great. Until I tried to use Bytestrings to get better performance.
Here's the code: (http://github.com/joevandyk/haskell/blob/aa61a58e6a027dda60a32ae64ec99f 92d00ae5ed/pearls/anagrams/anagram.hs) import qualified Data.Map as Map import Data.Ord import Data.List import qualified Data.ByteString.Char8 as BS
-- what's the type here? I get an infinite type error anagrams :: Ord a => [a] -> [a]
anagrams :: Ord a => [[a]] -> [[[a]]]
But ask ghci, that's faster to answer such questions than the list :)
Don't give a type signature, load the module,
ghci> :t anagrams anagrams :: Ord a => [[a]] -> [[[a]]]
anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) (sort word) [word] map
You sort each word from the input list, so word must have type
Ord a => [a],
thus the input has type
Ord a => [[a]].
The map you build maps Strings ([Char], generally, Ord a => [[a]]) to lists of Strings (so Map String [String] === Map [Char] [[Char]], in general, Ord a => Map [a] [[a]]), thus Map.elems gives a list of (lists of Strings), that is [[String]] === [[[Char]]], in general, [[[a]]].
main = do -- original code, worked fine: -- input <- getContents -- print $ take 3 $ anagrams $ lines input -- new code with bytestring has errors input <- BS.getContents print $ take 3 $ anagrams $ BS.lines input
There are two errors. One: anagram.hs:7:0: Occurs check: cannot construct the infinite type: a = [a] When generalising the type(s) for `anagrams'
That happens when I add the type to the anagrams method.
If I don't have a type specified, then I get this error: anagram.hs:21:30: Couldn't match expected type `[a]' against inferred type `BS.ByteString' Expected type: [[a]] Inferred type: [BS.ByteString] In the second argument of `($)', namely `BS.lines input' In the second argument of `($)', namely `anagrams $ BS.lines input'
I'm lost here. Help!
ByteStrings are not lists, so Data.List.sort can't do anything with them. You could change one line of the the code of anagrams to
insert_word map word = Map.insertWith' (++) (BS.sort word) [word] map
which restricts the type of anagrams to
anagrams :: [ByteString] -> [[ByteString]]
But, ByteStrings sort isn't good for short ByteStrings (allocate an array of 256 slots, count how often each character occurs, write in order to new ByteString - the overhead of allocating the array is larger than the sorting cost for short ByteStrings).
For ByteStrings as short as normal words in English/French/German/Italian/Spanish, it's much better to unpack the ByteStrings for sorting and change the line to
insert_word map word = Map.insertWith' (++) (BS.pack . sort . BS.unpack $ word) [word] map
Thanks for the response. I was getting confused -- I'm used to thinking about a String as being its own type, instead of being an array of chars. I now have: anagrams :: [BS.ByteString] -> [[BS.ByteString]] anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) sorted_word [word] map sorted_word word = BS.pack . sort . BS.unpack $ word But get this error: anagram.hs:12:27: No instance for (Ord (BS.ByteString -> BS.ByteString)) arising from a use of `Map.insertWith'' at anagram.hs:12:27-69 Possible fix: add an instance declaration for (Ord (BS.ByteString -> BS.ByteString)) In the expression: Map.insertWith' (++) sorted_word [word] map In the definition of `insert_word': insert_word map word = Map.insertWith' (++) sorted_word [word] map In the definition of `anagrams': anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) sorted_word [word] map sorted_word word = BS.pack . sort . BS.unpack $ word is sorted_word returning a function?

Am Dienstag 19 Januar 2010 22:34:41 schrieb Joe Van Dyk:
Thanks for the response. I was getting confused -- I'm used to thinking about a String as being its own type, instead of being an array of chars.
Lists are NOT arrays. If it was not just loose language on your side, learn the distinction. List indexing (list !! k) is O(min k (length list)), array indexing is O(1). An array knows its size (resp. it can easily be calculated from the known bounds), a list does not. length list is O(length list). If you're not aware of the differences, sooner rather than later, you'll get bitten by a gigantic performance bug.
I now have:
anagrams :: [BS.ByteString] -> [[BS.ByteString]] anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) sorted_word [word] map sorted_word word = BS.pack . sort . BS.unpack $ word
But get this error:
anagram.hs:12:27: No instance for (Ord (BS.ByteString -> BS.ByteString)) arising from a use of `Map.insertWith'' at anagram.hs:12:27-69 Possible fix: add an instance declaration for (Ord (BS.ByteString -> BS.ByteString)) In the expression: Map.insertWith' (++) sorted_word [word] map In the definition of `insert_word': insert_word map word = Map.insertWith' (++) sorted_word [word] map In the definition of `anagrams': anagrams words = sorted_anagrams where sorted_anagrams = sortBy (flip $ comparing length) get_anagrams get_anagrams = Map.elems $ foldl' insert_word Map.empty words insert_word map word = Map.insertWith' (++) sorted_word [word] map sorted_word word = BS.pack . sort . BS.unpack $ word
is sorted_word returning a function?
sorted_word *is* a function. What you want in insert_word is (sorted_word word), i.e. insert_word map word = Map.insertWith' (++) (sorted_word word) [word] map Since you forgot to apply sorted_word to the ByteString, it tries to use the function sorted_word :: ByteString -> ByteString as a key, but it can't find the instance Ord (ByteString -> ByteString) where ...

On Tue, Jan 19, 2010 at 2:11 PM, Daniel Fischer
Am Dienstag 19 Januar 2010 22:34:41 schrieb Joe Van Dyk:
Thanks for the response. I was getting confused -- I'm used to thinking about a String as being its own type, instead of being an array of chars.
Lists are NOT arrays.
If it was not just loose language on your side, learn the distinction. List indexing (list !! k) is O(min k (length list)), array indexing is O(1). An array knows its size (resp. it can easily be calculated from the known bounds), a list does not. length list is O(length list).
If you're not aware of the differences, sooner rather than later, you'll get bitten by a gigantic performance bug.
Whoops, yes, I know that -- I'm just used to languages where it's not as big a deal. :D Joe
participants (2)
-
Daniel Fischer
-
Joe Van Dyk