
On Wed, Aug 18, 2010 at 23:33, Jason Dagit wrote:
The main reason I would use iteratees is performance. To help me, as a potential consumer of your library, could you please provide benchmarks comparing the performance of enumerator with, say, a) iteratee, b) lazy/strict bytestring, and c) Prelude functions? I'm interested in both maximum memory consumption and run-times. Using criterion and/or progression to get the run-times would be icing on an already delicious cake!
Oleg has some benchmarks of his implementation at <http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt>, which clock iteratees at about twice as fast as lazy IO. He also compares them to a native "wc", but that comparison is flawed, because he's comparing a String iteratee against a byte-based wc.

I'll benchmark my "wc" and "cat" against common alternative implementations. My expectation is that they will be much slower than buffers, slightly slower than strict bytestrings, and faster than lazy bytestrings.

One of the large advantages iteratees have over lazy IO is that space use is very predictable. While the exact numbers depend on the enumerator and iteratee, they are typically small and constant. For example, enumFile uses a 4096-byte buffer which is copied to a ByteString[1], so "cat" will use only about 10 KiB for a file copy. enumHandle lets this value be tuned, depending on whether you'd like smaller space use or fewer buffer reads.

[1] I don't know why this is done -- the reuse-buffer-and-copy idiom is present in Oleg's code, but I suspect just using B.hGet would be more efficient. I'll do some benchmarks to confirm.
ListLike is possibly nice, but in the type-indexed iteratee implementation that I started (but could not finish, due to some issues with the type indexing) I didn't use it; ListLike doesn't support type-threaded lists at all. On a side note, in my type-threaded iteratee library I initially elided StreamChunk, but later added something similar back in because I found it useful. I can't recall off the top of my head what the reasoning was, but I could dig deeper if it interests you. I was also following a fairly faithful re-implementation of John Lato's implementation, just with type indexing. I should probably post my partial library regardless. Perhaps others can find ways around the bits I was stuck on.
If you can recall the reasoning behind using ListLike or StreamChunk, it would be useful. Their advantages over simply using lists are not obvious to me.
I can see seeking becoming important as your library moves into new domains of use, particularly when reading large binary streams where the data is sparse.
Though I don't have any personal experience writing Haskell parsers for sparsely-populated files, I suspect that folds are poorly adapted to seeking. It will probably be more efficient to implement your own enumerator or enumeratee, which contains logic for skipping uninteresting portions of the file.
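Independent of any particular iteratee API, the skipping logic itself is just hSeek between the regions you care about. A rough sketch, with readRegions and the (offset, length) list being hypothetical names (the list would come from whatever index or header describes the sparse data):

    import qualified Data.ByteString as B
    import System.IO (Handle, SeekMode(AbsoluteSeek), hSeek)

    -- Feed only the interesting regions of a sparse file to a strict
    -- fold, seeking past everything in between.
    readRegions :: (acc -> B.ByteString -> acc) -> acc -> Handle -> [(Integer, Int)] -> IO acc
    readRegions step = go
      where
        go acc _ [] = return acc
        go acc h ((off, len) : rest) = do
            hSeek h AbsoluteSeek off
            chunk <- B.hGet h len
            let acc' = step acc chunk
            acc' `seq` go acc' h rest

Wrapped as an enumerator, the same loop would feed each chunk to the iteratee's continuation instead of a fold function, so the consumer never sees the skipped bytes at all.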