[Haskell-cafe] Stream-fusion without the lists

12 May 2009

      Hi everyone,

With stream-fusion, we can write functions which construct and
destruct lists, such as (this is the main example from the Stream
Fusion paper[1])
  f :: Int -> Int
  f n = sum [k * m | k <- [1..n], m <- [1..k]]
and the rewrite rules in stream-fusion replace these operations on
lists with fusible operations on the Stream (non-recursive) datatype.

In this example, the domain and codomain of the function don't mention
the list datatype. There seem to be many functions like this: the
lists are being used internally as composable loops rather than as
data-structures.

The Stream datatype seems to be much better suited to representing
loops than the list datatype is. So, instead of programming with the
lists, why don't we just use the Stream datatype directly?

For example:
In Data.Stream (from the stream-fusion package) we can find most of
the Prelude list functions but with Stream in all the types instead of
[]. For example,

Data.Stream.sum :: Num a => S.Stream a -> a

Using this module, we can rewrite f without mentioning lists. We first
need a Monad instance for Data.Stream.Stream:
...
import qualified Data.List.Stream as S
instance Monad S.Stream where
   return = S.return
   (>>=) = S.concatMap
Now we can write
...
f :: Int -> Int
f n = S.sum $ do
   k <- S.enumFromToInt 1 n
   m <- S.enumFromToInt 1 k
   return (k*m)
which is essentially the same as the original f, although lacking the
syntactic sugar of the list comprehension.

To me, it seems that Stream is more sensible data type to use than []
when the algorithm being expressed is just a loop (rather than
something which needs a [] as a cache/buffer), for the following
reasons:

1. If I am programming with lists and I need some function which I
can't express with Prelude functions, I have to write it myself. I
will probably lose fusion in this case, because I will write it
recursively in terms of lists. On the other hand, if I am programming
with Streams, I will write it myself in terms of Streams, and it
should be easier to maintain fusion because it won't be recursive.

2. Holding on to a [] too long can cause a space leak. This is not the
case for Stream, because a Stream only ever contains one "state". More
generally, the memory use of Stream is more easily predictable than
that of [], since "running" a Stream only holds on to one "state" at a
time, and we often know how big the "state" is.

3. Fusion doesn't rely on rewrite rules firing. I consider this point
less significant than the other two.

So, thoughts? Do people program with Streams directly? What have I not
considered?

Cheers,
Reiner

[1] http://www.cse.unsw.edu.au/~dons/papers/stream-fusion.pdf

[Haskell-cafe] Stream-fusion without the lists

Reiner Pope