
On Wed, Aug 18, 2010 at 23:33, Jason Dagit wrote:
The main reason I would use iteratees is performance. To help me, as a potential consumer of your library, could you please provide benchmarks comparing the performance of enumerator with, say, a) iteratee, b) lazy/strict bytestring, and c) Prelude functions? I'm interested in both maximum memory consumption and run-times. Using criterion and/or progression to get the run-times would be icing on an already delicious cake!
Oleg has some benchmarks of his implementation at <http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt>, which clock iteratees at about twice as fast as lazy IO. He also compares them to a native "wc", but that comparison is flawed, because he's comparing a String iteratee against a byte-based wc.

I'll benchmark my "wc" and "cat" against common alternative implementations. My expectation is that they will be much slower than buffers, slightly slower than strict bytestrings, and faster than lazy bytestrings.

One of the large advantages iteratees have over lazy IO is that space use is very predictable. While the exact numbers depend on the enumerator and iteratee, they are typically small and constant. For example, enumFile uses a 4096-byte buffer which is copied to a ByteString[1], so "cat" will use only about 10 KiB for a file copy. enumHandle lets this value be tuned, depending on whether you'd like smaller space use or fewer buffer reads.

[1] I don't know why this is done -- the reuse-buffer-and-copy idiom is present in Oleg's code, but I suspect just using B.hGet would be more efficient. I'll do some benchmarks to confirm.
ListLike is possibly nice, but in the type-indexed iteratee implementation that I started (but could not finish, due to some issues with the type indexing) I didn't use it; ListLike doesn't support type-threaded lists at all. On a side note, in my type-threaded iteratee library I initially elided StreamChunk, but later added something similar back in because I found it useful. I can't recall off the top of my head what the reasoning was, but I could dig deeper if it interests you. I was also following a fairly faithful re-implementation of John Lato's implementation, just with type indexing. I should probably post my partial library regardless. Perhaps others can find ways around the bits I was stuck on.
If you can recall the reasoning behind using ListLike or StreamChunk, it would be useful. Their advantages over simply using lists are not obvious to me.
I can see seeking becoming important as your library moves into new domains of use, particularly when reading large binary streams where the data is sparse.
Though I don't have any personal experience writing Haskell parsers for sparsely-populated files, I suspect that folds are poorly adapted to seeking. It will probably be more efficient to implement your own enumerator or enumeratee, which contains logic for skipping uninteresting portions of the file.
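Independent of any particular iteratee API, the skipping logic itself is just hSeek between the regions you care about. A rough sketch, with readRegions and the (offset, length) list being hypothetical names (the list would come from whatever index or header describes the sparse data):

    import qualified Data.ByteString as B
    import System.IO (Handle, SeekMode(AbsoluteSeek), hSeek)

    -- Feed only the interesting regions of a sparse file to a strict
    -- fold, seeking past everything in between.
    readRegions :: (acc -> B.ByteString -> acc) -> acc -> Handle -> [(Integer, Int)] -> IO acc
    readRegions step = go
      where
        go acc _ [] = return acc
        go acc h ((off, len) : rest) = do
            hSeek h AbsoluteSeek off
            chunk <- B.hGet h len
            let acc' = step acc chunk
            acc' `seq` go acc' h rest

Wrapped as an enumerator, the same loop would feed each chunk to the iteratee's continuation instead of a fold function, so the consumer never sees the skipped bytes at all.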