
Hello there, I'm new to Haskell and want to learn it a bit while experimenting with Hadoop's Map/Reduce programming model. So, I wanted to implement the standard ‘word counter’ problem in Haskell. The problem is as follows: We have several texts with words separated by white-space. We want to count the occurrences of all words in all the texts (where ‘but’ and ‘but,’ can be seen as two different words). This is done in two phases. In the Map phase a program gets a part of the text from stdout and has to produce a "KEY\tVALUE" pair (that is, the key separated with a tab to the value), which has to be passed to stdin. In our case we simply produce "WordX\t1" for every word WordX. This list is sorted by the key and later on fed as stdin to the reducer (the second phase). The reducer now has to sum up all the occurrences we trivially counted in the Map phase and put it as "WordX\tNumber" to stdout. So, here is an example: vince@roku:~/tmp echo "foo foo bar bar foo bar zoo bar foo" | runhaskell mapper.hs foo 1 foo 1 bar 1 bar 1 foo 1 bar 1 zoo 1 bar 1 foo 1 vince@roku:~/tmp echo "foo foo bar bar foo bar zoo bar foo" | runhaskell mapper.hs | sort | runhaskell reducer.hs bar 4 foo 4 zoo 1 And here is the code I've come up with: vince@roku:~/tmp cat mapper.hs import qualified Data.ByteString.Lazy.Char8 as C postFix :: C.ByteString postFix = C.pack "\t1" formatter :: C.ByteString -> C.ByteString formatter x = C.append x postFix main :: IO () main = do contents <- fmap C.words C.getContents C.putStr . C.unlines $ map formatter contents vince@roku:~/tmp cat reducer.hs import qualified Data.ByteString.Lazy.Char8 as C import qualified Data.List as L tuppleize :: String -> (String, Int) tuppleize line = (\xs -> (head xs, read (last xs))) $ words line group :: Eq a => [(a, b)] -> [[(a, b)]] group = L.groupBy (\x y -> fst x == fst y) summation :: Num b => [(a, b)] -> (a, b) summation (x:[]) = x summation (x:xs) = (fst x, (snd x) + (snd (summation xs))) formatter :: (String, Int) -> String formatter = (\w -> (fst w ++ "\t" ++ show (snd w))) main = do contents <- C.getContents putStr . unlines $ map formatter $ map summation $ group $ map tuppleize $ lines $ C.unpack contents As already said, I'm a Haskell beginner. Could you provide some comments on the code? Thanks in advance, Thomas.