On Sun, Dec 25, 2011 at 11:25 AM, Heinrich Apfelmus <apfelmus@quantentunnel.de> wrote:

Your StreamSummary type has a really nice interpretation: it's a reification of case expressions [on lists].

nice observation!

For instance, consider the following simple function from lists to integers

length :: [a] -> Int
length xs = case xs of
[] -> 0
(y:ys) -> 1 + length ys

We want to reify the case expression as constructor of a data type. [...]

data ListTo a r = CaseOf r (a -> ListTo a r)

interpret :: ListTo a r -> ([a] -> r)
interpret (CaseOf nil cons) xs =
case xs of
[] -> nil
(y:ys) -> interpret (cons y) ys

[...]

Likewise, each function from lists can be represented in terms of our new data type [...]

length' :: ListTo a Int
length' = CaseOf
(0)
(\x -> fmap (1+) length')

length = interpret length'

This version of `length` is tail recursive while the previous version is not. In general, all functions defined in terms of `ListTo` and `interpret` are spine strict - they return a result only after consuming all input list constructors.

This is what Eugene observed when defining the identity function as

idC = CaseOf [] (\x -> (x:) <$> idC)

This version does not work for infinite lists. Similarly, `head` and `take` cannot be defined as lazily as in the standard libraries.

We can support lazier list consumers by adding a case to the ListTo type that allows to stop consuming the list. To avoid confusion, I chose new names for my new types.

data ListConsumer a b

= Done !b

| Continue !b (a -> ListConsumer a b)

The interpretation function just ignores the remaining input in the case of `Done`:

consumeList :: ListConsumer a b -> [a] -> b

consumeList (Done b) _ = b

consumeList (Continue b _) [] = b

consumeList (Continue _ f) (x:xs) = consumeList (f x) xs

We can define lazier versions of `head` and `take` as follows:

headC :: ListConsumer a a

headC = Continue (error "head of empty list") Done

takeC :: Int -> ListConsumer a [a]

takeC 0 = Done []

takeC n = Continue [] (\x -> (x:) <$> takeC (n-1))

However, we still cannot define a lazy version of the identity funtion with list consumers.

The identity function and `takeC` belong to a special case of a list consumers because they also *produce* lists. We can define a specialized type for list transformers that consume and produce lists. One advantage of this specialization will be that we can define a lazy version of the identity function. The transformer type can have functor and applicative instances similar to the consumer type to compose transformers in parallel. Additionally, it can have category and arrow instances to compose transformers sequentially.

Here is a type for lazy list transformers:

data ListTransformer a b

= Cut

| Put b (ListTransformer a b)

| Get (a -> ListTransformer a b)

A transformer can either cut off the input list and return the empty list, output a new element before transforming the input, or consume one element from the input list and transform the remaining elements. The interpretation function should make this clearer:

transformList :: ListTransformer a b -> [a] -> [b]

transformList Cut _ = []

transformList (Put b t) xs = b : transformList t xs

transformList (Get _) [] = []

transformList (Get f) (x:xs) = transformList (f x) xs

Note that, if the transformer wants to read another element that is not there, it simply returns the empty list.

Now we can define a lazy identity function and another version of `take`:

idT :: ListTransformer a a

idT = Get (\x -> Put x idT)

takeT :: Int -> ListTransformer a a

takeT 0 = Cut

takeT n = Get (\x -> Put x (takeT (n-1)))

Here is another translation of a common list function:

filterT :: (a -> Bool) -> ListTransformer a a

filterT p = Get (\x -> if p x then Put x (filterT p) else filterT p)

`filterT` is an example for a function that can consume several input elements before producing an output element. Other examples for functions of this kind are chunking functions:

pairsT :: ListTransformer a (a,a)

pairsT = Get (\x -> Get (\y -> Put (x,y) pairsT))

chunks :: Int -> ListTransformer a [a]

chunks n = collect n

where

collect 0 = Put [] (chunks n)

collect m = Get (\x -> collect (m-1) >>> Get (\xs -> Put (x:xs) id))

Here are some example calls in GHCi that demonstrate the category and applicative instances for sequential and parallel composition (see below for a link to the complete source code):

ghci> transformList pairsT [1..5]

[(1,2),(3,4)] -- 5 is ignored

ghci> transformList pairsT [1..6]

[(1,2),(3,4),(5,6)]

ghci> transformList (chunks 2) [1..5]

[[1,2],[3,4]] -- similar to pairsT

ghci> transformList (chunks 3) [1..6]

[[1,2,3],[4,5,6]]

ghci> transformList (takeT 3 . chunks 3) [1..] -- infinite input

[[1,2,3],[4,5,6],[7,8,9]]

ghci> transformList ((,) <$> takeT 3 . chunks 3 <*> chunks 2 . filterT even) [1..]

[([1,2,3],[2,4]),([4,5,6],[6,8]),([7,8,9],[10,12])]

When we compose transformers in parallel, the memory requirements depend on how far they run out of sync. If they consume elements at the same pace, memory requirements should be constant. Otherwise, some part of the input is retained to satisfy all transformers. In the final call above bigger and bigger parts are retained because the first transformer is slower than the second.

As transformers are a special case of consumers, we can compose a consumer and a transformer to give a new consumer:

(<.) :: ListConsumer b c -> ListTransformer a b -> ListConsumer a c

Done c <. _ = Done c

Continue c _ <. Cut = Done c

Continue _ f <. Put x t = f x <. t

Continue c f <. Get g = Continue c (\a -> Continue c f <. g a)

Using the instances for parallel and sequential transformer composition as well as the instances for parallel consumer composition, we can build complex consumers that still execute in lockstep and consume their input lazily.

Sebastian

P.S. https://gist.github.com/1521467