Re: [Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

From: dm-list-haskell-cafe@scs.stanford.edu
At Fri, 6 May 2011 10:10:26 -0300, Felipe Almeida Lessa wrote:
So, in the enumerator vs. iterIO challenge, the only big differences I see are:
a) iterIO has a different exception handling mechanism.
b) iterIO can have pure iteratees that don't touch the monad.
c) iterIO's iteratees can send control messages to their enumerators.
d) iterIO's enumerators are enumeratees, but enumerator's enumerators are simpler.
e) enumerator has fewer dependencies.
f) enumerator uses conventional nomenclature.
g) enumerator is Haskell 98, while iterIO needs many extensions (e.g. MPTC and functional dependencies).
Anything that I missed?
The bottom line: the biggest advantage I see right now in favor of iterIO is c),
I basically agree with this list, but think you are underestimating the value of a. I would rank a as the most important difference between the packages. (a also is the reason for d.)
'a' is important, but I think a lot of people underestimate the value of 'c', which is why a control system was implemented in 'iteratee'. I would argue that iteratee's control system is more powerful than you say. For example, the only reason iteratee can't implement tell is that it doesn't keep track of the position in the stream; it's relatively simple for an enumerator to return data to an iteratee using an IORef for example. And adding support to keep track of the stream position would be a pretty simple (and possibly desirable) change. But it's definitely not as sophisticated as IterIO, and probably won't become so unless I have need of those features.

I like the MonadTrans implementation a lot. The vast majority of iteratees are pure, and GHC typically produces more efficient code for pure functions, so this is possibly a performance win. Although it makes something like the mutable-iter package very difficult to implement...

John Lato

At Mon, 9 May 2011 17:55:17 +0100, John Lato wrote:
Felipe Almeida Lessa wrote:
> So, in the enumerator vs. iterIO challenge, the only big differences I see are:
>
> a) iterIO has a different exception handling mechanism.
> b) iterIO can have pure iteratees that don't touch the monad.
> c) iterIO's iteratees can send control messages to their enumerators.
> d) iterIO's enumerators are enumeratees, but enumerator's enumerators
>    are simpler.
> e) enumerator has fewer dependencies.
> f) enumerator uses conventional nomenclature.
> g) enumerator is Haskell 98, while iterIO needs many extensions (e.g.
>    MPTC and functional dependencies).
'a' is important, but I think a lot of people underestimate the value of 'c', which is why a control system was implemented in 'iteratee'. ... it's relatively simple for an enumerator to return data to an iteratee using an IORef for example.
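For readers following along, here is a minimal sketch of what "returning data through an IORef" could look like if the IORef travels inside an exception value, which is the reading assumed in the reply that follows. Everything named here (TellRequest, answerControl, askPosition) is hypothetical and made up for illustration; none of it is the API of iteratee or iterIO, and the "upstream channel" is modelled as a plain function.

```haskell
import Control.Exception (Exception, SomeException, fromException, toException)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical control request: "tell me the current stream position".
-- The IORef is the slot into which the enumerator writes its answer.
newtype TellRequest = TellRequest (IORef (Maybe Integer))

-- IORef has no Show instance, so Show is written by hand.
instance Show TellRequest where
  show _ = "TellRequest"

instance Exception TellRequest

-- Enumerator side (assumed): inspect a control exception and answer it
-- if it is one we recognize. Returns True when the request was handled.
answerControl :: Integer -> SomeException -> IO Bool
answerControl pos e =
  case fromException e of
    Just (TellRequest slot) -> writeIORef slot (Just pos) >> return True
    Nothing                 -> return False

-- Iteratee side (assumed): allocate a reply slot, send the request
-- through whatever upstream channel is available, then read the reply.
askPosition :: (SomeException -> IO Bool) -> IO (Maybe Integer)
askPosition sendUpstream = do
  slot <- newIORef Nothing
  _ <- sendUpstream (toException (TellRequest slot))
  readIORef slot
```

With an enumerator currently at position 42, askPosition (answerControl 42) yields Just 42, while a channel that ignores the request leaves the slot at Nothing. Note that the reply slot ties the scheme to IO, which is exactly the limitation raised in the reply.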
Would you just embed IORefs for the result into an Exception type? That's actually a pretty simple solution when you can do it. It's a bit harder for my setting, because I'm using this stuff in support of a research project that doesn't make the IO Monad available to most code. I'd like to write Inums/Enumeratees that work with both the IO Monad and my own weird monads. This is admittedly a fringe problem, so IORef is probably fine for most settings. But if there's any possible way you could do it with STRefs, that would be really cool...

After further thought, though, I'm still not 100% satisfied with iterIO's control mechanism. Someone earlier in this thread pointed out that my SSL module doesn't support STARTTLS particularly conveniently. I read that and decided to go add a function to make STARTTLS really convenient. What I came up with ended up using MVars to communicate the switch from the enumerator to the iteratee and was ugly enough that I did not commit it. What you really want is the ability to send both upstream and downstream control messages.

Right now, I'd say iterIO has better support for upstream control messages, while iteratee has better support for downstream messages, since iteratee can just embed an Exception in a Stream. (I'm assuming you could have something like a 'Flush' exception to cause output to be flushed by an Iteratee that was for some reason buffering some.)

I'm curious how this works in practice, though. What is the convention for Enumeratees receiving Exceptions they don't know about in the Stream? Are they supposed to throw the exceptions upwards (which wouldn't help), or propagate them downwards? And how do they synchronize exceptions?
Suppose you have a pipeline with an Enumeratee transcoding utf8 bytes to Chars, and another implementing text compression or something that requires buffering:

    ByteString  +--------------+   [Char]    +----------+  [Char]
    ----------> | UTF8-DECODER | ----------> |  BUFFER  | -------->
                +--------------+             +----------+

Now say a Stream with EOF (Just Flush) arrives at the UTF8-DECODER in the middle of a multi-byte character. Do you defer the Flush until the character is complete, or let it skip ahead to the end of the previous character and immediately send it to the next stage? Or, worse, propagate it back up as an uncaught exception?
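To make the first option concrete (defer the Flush until the character is complete), here is a small pure sketch. It is not how iteratee or iterIO behave; the Event and Out types and the deferFlush function are invented for illustration, the input is assumed to be well-formed UTF-8, and completed byte sequences are passed along undecoded to keep the example short.

```haskell
import Data.Word (Word8)

-- Hypothetical input events: one data byte, or an in-band flush marker.
data Event = Byte Word8 | Flush
  deriving (Eq, Show)

-- Hypothetical outputs: the bytes of one complete UTF-8 sequence
-- (left undecoded here), or a flush forwarded to the next stage.
data Out = CharBytes [Word8] | FlushOut
  deriving (Eq, Show)

-- Length of a UTF-8 sequence judged by its lead byte.
-- Assumes well-formed input; no validation is attempted.
seqLen :: Word8 -> Int
seqLen b
  | b < 0x80  = 1
  | b < 0xE0  = 2
  | b < 0xF0  = 3
  | otherwise = 4

-- True once the accumulated bytes form a whole sequence.
complete :: [Word8] -> Bool
complete []           = False
complete bs@(lead:_)  = length bs == seqLen lead

-- A Flush that arrives mid-character is held back and emitted right
-- after the character it interrupted, so it never splits a character.
deferFlush :: [Event] -> [Out]
deferFlush = go [] False
  where
    -- End of input: a still-pending flush is released; partial bytes are dropped.
    go _ pending [] = [FlushOut | pending]
    go partial _ (Flush : es)
      | null partial = FlushOut : go [] False es
      | otherwise    = go partial True es
    go partial pending (Byte b : es)
      | complete partial' = CharBytes partial' : [FlushOut | pending] ++ go [] False es
      | otherwise         = go partial' pending es
      where partial' = partial ++ [b]
```

For example, a Flush arriving between the two bytes of U+00E9 (0xC3 0xA9) comes out after the completed character rather than before it. The cost of this choice is that the flush is delayed by however long the remaining bytes take to arrive, which is part of what makes the synchronization question awkward.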
And adding support to keep track of the stream position would be a pretty simple (and possibly desirable) change.
Can you explain how iteratee could keep track of the stream position? I'm not saying it's impossible, just that it's a challenging puzzle to make the types come out and I'd love to see the solution. Somehow you would need to pass the onCont continuation to itself to preserve it, and then type a gets in the way because it's possibly no longer the right type. In other words, you could try something like:

    {-# LANGUAGE Rank2Types #-}

    data Iteratee s m a = Iteratee (forall r.
                                    (a -> Stream s -> m r)
                                 -> OnCont s m a r
                                 -> m r)

    data OnCont s m a r = OnCont (OnCont s m a r
                               -> (Stream s -> Iteratee s m a)
                               -> Maybe SomeException
                               -> m r)

But now I have no way of using or unpacking the OnCont in an Iteratee that doesn't return type a, and in general a control handler has no idea what type the iteratee that threw the exception has--it's in fact likely a different type from whatever enclosing function is wrapped by a catch call.

Even if you do solve the type problem, another problem is that you don't know how many times you need to call the continuation function before you stop getting buffered data and start actually causing IO to happen.

Part of the reason iterIO doesn't have this problem is that iterIO's Chunk structure (which is vaguely equivalent to iteratee's Stream) is a Monoid, so it's really easy to save up multiple chunks of residual and "ungotten" data. Every Iter is passed all buffered data of its input type in its entirety (and the inner pipeline stages can actually un-transcode data to make this true across data types). But that's also what makes downstream control messages harder, because there's no way to represent exceptions at particular points in the input stream, just an EOF marker at the very end.
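As a rough illustration of the Monoid point, here is a simplified model of a chunk that carries some data plus a single end-of-input flag. The type and instances below are assumptions written for this example, not copied from the iterIO source, and the real library may differ in details such as strictness or how it treats data appended after EOF.

```haskell
-- Simplified model (assumed): buffered data plus one EOF flag.
data Chunk t = Chunk t Bool
  deriving (Eq, Show)

-- Appending chunks concatenates their data; the result is at EOF
-- if either side was.
instance Semigroup t => Semigroup (Chunk t) where
  Chunk a eofA <> Chunk b eofB = Chunk (a <> b) (eofA || eofB)

instance Monoid t => Monoid (Chunk t) where
  mempty = Chunk mempty False

-- "Ungetting" residual data is then just prepending it to whatever
-- arrives next.
unget :: Semigroup t => t -> Chunk t -> Chunk t
unget residual (Chunk d eof) = Chunk (residual <> d) eof
```

Under this model, mconcat [Chunk "ab" False, Chunk "cd" False, Chunk "" True] collapses to Chunk "abcd" True. The same shape shows the limitation described above: once chunks are merged there is only one flag for the whole buffer, so nothing can mark an event between "ab" and "cd".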
I like the MonadTrans implementation a lot...
Thanks, David