
Hi John,
Thanks for creating a competitor to the iteratee library. I think iteratees
are an important abstraction, but there are some things about the iteratee
library that I'm not fond of, despite John Lato doing a great job. I think
having a bit of healthy competition to explore the design space is
excellent.
I have questions for you below.
On Wed, Aug 18, 2010 at 9:31 PM, John Millikin wrote:
Most of you have probably read Oleg's essays on using left-fold enumerators for incremental IO. In short, by encapsulating monadic left-folds in an "Iteratee" type, incremental pure processing is possible without using lazy IO. Sources to read:
[snip]
While I appreciate Mr. Lato's development of the package, I find it far too large, and its documentation too sparse, to effectively use. To correct this, I've written the "enumerator" package. It is also derived from Oleg's IterateeM.hs, but with a simplified API and significantly reduced dependency list.
I don't mind the dependency list, but I was mildly concerned that iteratee appears to work only on Unix and that its API is a bit rough.
Hackage entry: http://hackage.haskell.org/package/enumerator
Haddock docs: http://ianen.org/haskell/enumerator/api-docs/
Source code (literate PDF): http://ianen.org/haskell/enumerator/enumerator.pdf
darcs get http://ianen.org/haskell/enumerator/
Additionally, I've included examples of using enumerators to implement simplified versions of the "cat" and "wc" utilities. These should serve as a useful starting point for anybody who wants to use enumerators in their own code:
http://patch-tag.com/r/jmillikin/enumerator/snapshot/current/content/pretty/...
http://patch-tag.com/r/jmillikin/enumerator/snapshot/current/content/pretty/...
The main reason I would use iteratees is performance. To help me, as a potential consumer of your library, could you please provide benchmarks comparing the performance of enumerator with, say, (a) iteratee, (b) lazy/strict bytestring, and (c) Prelude functions? I'm interested in both maximum memory consumption and run times. Using criterion and/or progression to get the run times would be icing on an already delicious cake!
There are already a few libraries using the existing "iteratee" package (snap, attoparsec-iteratee, hexpat-iteratee); I am very interested in advice from the authors of these libraries. In particular, are any of the removed features (ListLike, WrappedByteString, seeking) something your libraries depend on? Are there any useful combinators you'd like to see included?
The only reason iteratee provides WrappedByteString is that the type class used to abstract over the stream type requires something of kind * -> *, and ByteString has kind *. The extra wrapping just adds an ignored phantom type parameter to bytestrings. So if you don't require specific kinds, I don't think you'd need to provide a WrappedByteString.
ListLike is possibly nice, but I didn't use it in the type-indexed iteratee implementation that I started (but could not finish due to some issues with the type indexing). ListLike doesn't support type-threaded lists at all.
On a side note, in my type-threaded iteratee library I initially elided StreamChunk but later added something similar because I found it useful. I can't recall off the top of my head what the reasoning was, but I could dig deeper if it interests you. I was also following a fairly faithful re-implementation of John Lato's implementation, just with type indexing. I should probably post my partial library regardless. Perhaps others can find ways around the bits I was stuck on.
I can see seeking becoming important as your library moves into new domains of use, particularly when reading large binary streams in which the data is sparse.
Thanks and congrats!
Jason
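To illustrate the kind mismatch behind WrappedByteString, here is a minimal sketch. The class `Chunk`, the newtype `Wrapped`, and the helper `wrapBytes` are hypothetical names invented for this example; they are not the definitions used by the iteratee package, which may differ in detail. The sketch only assumes a stream class whose parameter has kind * -> * (written `Type -> Type` below).

```haskell
{-# LANGUAGE KindSignatures #-}

import qualified Data.ByteString as B
import Data.Kind (Type)
import Data.Word (Word8)

-- Hypothetical stream class: its parameter must have kind Type -> Type,
-- i.e. a container that is still waiting for an element type.
class Chunk (c :: Type -> Type) where
  chunkLength :: c el -> Int
  chunkNull   :: c el -> Bool

-- Lists fit directly, because [] has kind Type -> Type.
instance Chunk [] where
  chunkLength = length
  chunkNull   = null

-- ByteString has kind Type, so "instance Chunk B.ByteString" is a kind error.
-- A newtype with a phantom parameter lifts it to kind Type -> Type.
-- The parameter 'el' is never used on the right-hand side.
newtype Wrapped el = Wrapped { unwrap :: B.ByteString }

instance Chunk Wrapped where
  chunkLength = B.length . unwrap
  chunkNull   = B.null . unwrap

-- Hypothetical helper: pin the phantom parameter to Word8 when wrapping.
wrapBytes :: [Word8] -> Wrapped Word8
wrapBytes = Wrapped . B.pack
```

If the stream class were instead parameterised over a type of kind *, with the element type recovered some other way, ByteString could be an instance directly and no wrapper would be needed.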