[Haskell-cafe] Map Reduce spec in Haskell

18 Apr 2013

      I'm learning about the Map Reduce computation frequently used with big
data.  For the fun of it, I decided to write a very high-level spec of Map
Reduce.  Here is what I came up with.  Enjoy.

John
...
module MapReduce where
import Data.List (nub)
A high-level specification of Map Reduce as a Haskell program.  The
program uses lists to represent multisets.  As multisets have no
implied ordering, the ordering implied by lists in this specification
should be ignored.

The database is a multiset of key-value pairs.
...
type Key = String
type Value = String
type Datum = (Key, Value)
type Data = [Datum]
A mapper maps a datum to a finite multiset of key-value pairs.
...
type Mapper = Datum -> Data
A reducer takes a key and a multiset of values and produces a finite
multiset of values.
...
type Reducer = (Key, [Value]) -> [Value]
A step is a mapper followed by a reducer
...
type Step = (Mapper, Reducer)
A program is a finite sequence of steps
...
type Program = [Step]
The semantics of a program is provided by the run function.
...
run :: Program -> Data -> Data
run [] d = d
run (s : p) d =
  run p (step s d)
The three parts of a step are mapping, shuffling, and reducing.
...
step :: Step -> Data -> Data
step (m, r) d =
  let mapped = transform m d
      shuffled = shuffle mapped in
  reduce r shuffled
The first part of a step is to transform the data by applying the
mapper to each datum and collecting the results.
...
transform :: Mapper -> Data -> Data
transform m d =
  [p | u <- d, p <- m u]
Next, values with common keys are collected.  Keys are unique after
shuffling.
...
shuffle :: Data -> [(Key, [Value])]
shuffle d =
  [(k, vs) | k <- nub (map fst d), -- nub eliminates duplicate keys
             let vs = [v | (k', v) <- d, k' == k]]
A reducer is applied to the data associated with one key, and always
produces data with that key.
...
reduce :: Reducer -> [(Key, [Value])] -> Data
reduce r rs =
  [(k, v) | (k, vs) <- rs, v <- r (k, vs)]

[Haskell-cafe] Map Reduce spec in Haskell

John D. Ramsdell