Re: [Haskell-cafe] stream/bytestring questions

18 Feb 2008

      chad.scherrer:
...
ByteStrings have given a real performance boost to a lot of Haskell
applications, and I'm curious why some of the techniques aren't more
abstracted and widely available. If it's "because it's a big job",
that's certainly understandable, but maybe there's something I'm
overlooking that some of the abstractions aren't really desirable.
My first question is about the stream fusion. This seems (based on the
ByteString paper) to speed things up quite a lot, but for some reason
this isn't necessarily the case for lists. Is this just a matter of
yeah, with lists, as compared to bytestrings, there are:

    * more complex operations to fuse
    * allocation is much cheaper (lazy list cons nodes)
    * built in desugaring for build/foldr fusion interferes (enumerations, comprehensions)

so the benefits of fusing lists are less obvious than bytestrings, where
every fusion point knocks out a big array allocation, and they're a bit
more complex to get the full api going.
...
the existing fusion for lists doing such a good job in many cases that
it's hard to beat? The streams paper suggests that some contructors
aren't optimized away during fusion, but wouldn't this also be a
problem for fusion in the bytestring context? Are there many cases
probably not, as the issues are mostly to do with deeply nested
comprehensions, which simply aren't done with bytestrings.
...
where it helps performance to program to streams directly, rather than
no, using the rules should be fine. you're not supposed to program in
the stream abstraction.
...
letting the rules go back and forth between them and lists? I tried to
do this, but kept getting hung up on the Unlifted stuff; it's not
exposed (pardon the pun) in the API, and I don't understand why the
types are different than the usual (), Either a b, (a,b), etc.
Second, the idea of representing a sequence as a lazy list of strict
arrays is very appealing. But why represent it so concretely? Wouldn't
it make sense to do something like
data ArrayList a i e = Empty | Chunk !(a i e) (ArrayList a i e)
?
someone could do that. we chose to  go with the monomorphic case, since
its easier to get the api right, and the performance right.
...
Then array fusion could be written for IArray types in general, and
the ByteString library would become a set of instances with some
specialization pragmas. Presumably a more general StorableVector could
be represented as an IArray, and the NDP stuff seems a good fit too,
so this would make it easy to leverage future work. Seems to separate
the various concerns a little better, too, but again maybe there's a
performance issue I'm missing.
Sorry for the barrage of questions; I guess there's just a lot I'm
trying to understand better.
Sure. Hope that clears things up.

Re: [Haskell-cafe] stream/bytestring questions

Don Stewart