ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

Hi, everyone. I'm pleased to announce the release of a new iteratee implementation, iterIO: http://hackage.haskell.org/package/iterIO IterIO is an attempt to make iteratees easier to use through an interface based on pipeline stages reminiscent of Unix command pipelines. Particularly if you've looked at iteratees before and been intimidated, please have a look at iterIO to see if it makes them more accessible. Some aspects of iterIO that should simplify learning and using iteratees are: * Every aspect of the library is thoroughly document in haddock including numerous examples of use. * Enumerators are easy to build out of iteratees. * There is no difference between enumerators and "enumeratees" (i.e., inner pipeline stages). The former is just a type-restricted version of the latter. * Parsing combinators provide detailed error reporting and support LL(*) rather than LL(1) parsing, leading to fewer non-intuitive parsing failures. A couple of tricks avoid consuming excessive memory for backtracking. * Super-fast LL(1) parsing is also available through seamless integration with attoparsec. * A universal exception mechanism works across invocations of mtl monad transformers, thereby unifying error handling. * All pipe operators have uniform semantics, eliminating corner cases. In particular, if the writing end of a pipe fails, the reading end always gets EOF, allowing it to clean up resources. * One can catch exceptions thrown by any contiguous subset of stages in a pipeline. Moreover, enumerator exception handlers can resume downstream stages that haven't failed. * The package is full of useful iteratees and enumerators, including basic file and socket processing, parsec-like combinators, string search, zlib/gzip compression, SSL, HTTP, and "loopback" enumerator/iteratee pairs for testing a protocol implementation against itself. Please enjoy. I'd love to hear feedback. David

Sounds just terrific! Thanks!
06.05.2011, в 8:15, David Mazieres
Hi, everyone. I'm pleased to announce the release of a new iteratee implementation, iterIO:
http://hackage.haskell.org/package/iterIO
IterIO is an attempt to make iteratees easier to use through an interface based on pipeline stages reminiscent of Unix command pipelines. Particularly if you've looked at iteratees before and been intimidated, please have a look at iterIO to see if it makes them more accessible.
Some aspects of iterIO that should simplify learning and using iteratees are:
* Every aspect of the library is thoroughly document in haddock including numerous examples of use.
* Enumerators are easy to build out of iteratees.
* There is no difference between enumerators and "enumeratees" (i.e., inner pipeline stages). The former is just a type-restricted version of the latter.
* Parsing combinators provide detailed error reporting and support LL(*) rather than LL(1) parsing, leading to fewer non-intuitive parsing failures. A couple of tricks avoid consuming excessive memory for backtracking.
* Super-fast LL(1) parsing is also available through seamless integration with attoparsec.
* A universal exception mechanism works across invocations of mtl monad transformers, thereby unifying error handling.
* All pipe operators have uniform semantics, eliminating corner cases. In particular, if the writing end of a pipe fails, the reading end always gets EOF, allowing it to clean up resources.
* One can catch exceptions thrown by any contiguous subset of stages in a pipeline. Moreover, enumerator exception handlers can resume downstream stages that haven't failed.
* The package is full of useful iteratees and enumerators, including basic file and socket processing, parsec-like combinators, string search, zlib/gzip compression, SSL, HTTP, and "loopback" enumerator/iteratee pairs for testing a protocol implementation against itself.
Please enjoy. I'd love to hear feedback.
David
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi David, Re: this comment from catchI:
It is not possible to catch asynchronous exceptions, such as lazily evaluated divide-by-zero errors, the throw function, or exceptions raised by other threads using throwTo if those exceptions might arrive anywhere outside of a liftIO call.
It might be worth investigating providing a version which can catch asynchronous exceptions if the underlying monad supports it (via MonadCatchIO or something similar). One of the most interesting advantages I can see for IterIO over the other iteratee implementations is that you actually have some control over resource usage -- not being able to catch asynchronous exceptions nullifies much of that advantage. A clear use case for this is timeouts on server threads, where you typically throw a TimeoutException exception to the handling thread using "throwTo" if the timeout is exceeded. Another question re: resource cleanup: in the docs I see:
Now suppose inumHttpBody fails (most likely because it receives an EOF before reading the number of bytes specified in the Content-Length header). Because inumHttpBody is fused to handler, the failure will cause handler to receive an EOF, which will cause foldForm to fail, which will cause handleI to receive an EOF and return, which will ensure hClose runs and the file handle h is not leaked.
Once the EOFs have been processed, the exception will propagate upwards making inumHttpServer fail, which in turn will send an EOF to iter. Then the exception will cause enum to fail, after which sock will be closed. In summary, despite the complex structure of the web server, because all the components are fused together with pipe operators, corner cases like this just work with no need to worry about leaked file descriptors.
Could you go into a little bit of detail about the mechanism behind this?
Thanks!
G
--
Gregory Collins

At Fri, 6 May 2011 10:15:50 +0200, Gregory Collins wrote:
Hi David,
Re: this comment from catchI:
It is not possible to catch asynchronous exceptions, such as lazily evaluated divide-by-zero errors, the throw function, or exceptions raised by other threads using throwTo if those exceptions might arrive anywhere outside of a liftIO call.
It might be worth investigating providing a version which can catch asynchronous exceptions if the underlying monad supports it (via MonadCatchIO or something similar). One of the most interesting advantages I can see for IterIO over the other iteratee implementations is that you actually have some control over resource usage -- not being able to catch asynchronous exceptions nullifies much of that advantage. A clear use case for this is timeouts on server threads, where you typically throw a TimeoutException exception to the handling thread using "throwTo" if the timeout is exceeded.
Excellent point. There's actually a chance that iterIO already catches those kinds of exceptions, but I wasn't sure enough about how the Haskell runtime works to make that claim. I've noticed in practice that asynchronous exceptions tend to come exactly when I execute the IO >>= operation. If that's true, then since each IO >>= is wrapped in a try block, the exceptions will all be caught (well, not divide by zero, but things like throwTo, which I think are more important). One way I was thinking of implementing this was wrapping the whole execution in block, and then calling unblock (unless iterIO's own hypothetical block function is called) for every invocation of liftIO. Unfortunately, the block and unblock functions now seem to be deprecated, and the replacement mask/unmask ones would not be as amenable to this technique. However, if there's some simpler way to guarantee that >>= is the point where exceptions are thrown (and might be the case for GHC in practice), then I basically only need to update the docs. If someone with more GHC understanding could explain how asynchronous exceptions work, I'd love to hear it...
Another question re: resource cleanup: in the docs I see:
Now suppose inumHttpBody fails (most likely because it receives an EOF before reading the number of bytes specified in the Content-Length header). Because inumHttpBody is fused to handler, the failure will cause handler to receive an EOF, which will cause foldForm to fail, which will cause handleI to receive an EOF and return, which will ensure hClose runs and the file handle h is not leaked.
Once the EOFs have been processed, the exception will propagate upwards making inumHttpServer fail, which in turn will send an EOF to iter. Then the exception will cause enum to fail, after which sock will be closed. In summary, despite the complex structure of the web server, because all the components are fused together with pipe operators, corner cases like this just work with no need to worry about leaked file descriptors.
Could you go into a little bit of detail about the mechanism behind this?
Yes, absolutely. This relies on the fact that an Inum must always return its target Iter, even when the Inum fails. This invariant is ensured by the two Inum construction functions, mkInumC and mkInumM, which catch exceptions thrown by the "codec" iteratee and add in the state of the target iteratee. Now when you execute code like "inum .| iter", the immediate result of running inum is "IterR tIn m (IterR tOut m a)"--i.e., the result of an iteratee returning the result an iteratee (because Inums are iteratees, too). If the Inum failed, then the outer IterR will use the Fail constructor: Fail !IterFail !(Maybe a) !(Maybe (Chunk t)) Where the "Maybe a" will be a "Maybe (IterR tOut m b)", and, because of the Inum invariant, will be Just an actual result. .| then must translate the inner iteratee result to the appropriate return type for the Inum (since the Inum's type (IterR tIn m ...) is different from the Iter's (Iter tOut m ...)). This happens through the internal function joinR, which says: joinR (Fail e (Just i) c) = flip onDoneR (runR i) $ \r -> case r of Done a _ -> Fail e (Just a) c Fail e' a _ -> Fail e' a c _ -> error "joinR" Where the 'runR' function basically keeps feeding EOF to an Iter (and executing it's monadic actions and rejecting its control requests) until it returns a result, at which point the result's residual input can be discarded and replaced with the residual input of the Inum. David

On 06/05/2011 16:56, dm-list-haskell-cafe@scs.stanford.edu wrote:
At Fri, 6 May 2011 10:15:50 +0200, Gregory Collins wrote:
Hi David,
Re: this comment from catchI:
It is not possible to catch asynchronous exceptions, such as lazily evaluated divide-by-zero errors, the throw function, or exceptions raised by other threads using throwTo if those exceptions might arrive anywhere outside of a liftIO call.
It might be worth investigating providing a version which can catch asynchronous exceptions if the underlying monad supports it (via MonadCatchIO or something similar). One of the most interesting advantages I can see for IterIO over the other iteratee implementations is that you actually have some control over resource usage -- not being able to catch asynchronous exceptions nullifies much of that advantage. A clear use case for this is timeouts on server threads, where you typically throw a TimeoutException exception to the handling thread using "throwTo" if the timeout is exceeded.
Excellent point. There's actually a chance that iterIO already catches those kinds of exceptions, but I wasn't sure enough about how the Haskell runtime works to make that claim. I've noticed in practice that asynchronous exceptions tend to come exactly when I execute the IO>>= operation. If that's true, then since each IO>>= is wrapped in a try block, the exceptions will all be caught (well, not divide by zero, but things like throwTo, which I think are more important).
One way I was thinking of implementing this was wrapping the whole execution in block, and then calling unblock (unless iterIO's own hypothetical block function is called) for every invocation of liftIO. Unfortunately, the block and unblock functions now seem to be deprecated, and the replacement mask/unmask ones would not be as amenable to this technique.
However, if there's some simpler way to guarantee that>>= is the point where exceptions are thrown (and might be the case for GHC in practice), then I basically only need to update the docs. If someone with more GHC understanding could explain how asynchronous exceptions work, I'd love to hear it...
There's no guarantee of the form that you mention - asynchronous exceptions can occur anywhere. However, there might be a way to do what you want (disclaimer: I haven't looked at the implementation of iterIO). Control.Exception will have a new operation in 7.2.1: allowInterrupt :: IO () allowInterrupt = unsafeUnmask $ return () which allows an asynchronous exception to be thrown inside mask (until 7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO). As I like saying, mask switches from fully asynchronous mode to polling mode, and allowInterrupt is the way you poll. Cheers, Simon

At Wed, 11 May 2011 13:02:21 +0100, Simon Marlow wrote:
There's no guarantee of the form that you mention - asynchronous exceptions can occur anywhere. However, there might be a way to do what you want (disclaimer: I haven't looked at the implementation of iterIO).
Control.Exception will have a new operation in 7.2.1:
allowInterrupt :: IO () allowInterrupt = unsafeUnmask $ return ()
which allows an asynchronous exception to be thrown inside mask (until 7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO).
Ah. I didn't know about unsafeUnmask. Is unmaskAsyncExceptions# low enough overhead that it would be reasonable to wrap every invocation of liftIO in unsafeUnmask? I'm now thinking it might be reasonable to execute all liftIO actions inside unsafeUnmask (with maybe a special liftIOmasked function for those few places where you don't want asynchronous exceptions). Most of the uses of mask are because you need two or more binds to execute without interruption, e.g.: bracket before after thing = mask $ \restore -> do a <- before -- Big problem if exception happens here -- r <- restore (thing a) `onException` after a _ <- after a return r But when bind sites are the only place an exception can be thrown, things get a lot simpler. For instance, it is perfectly reasonable to write: bracket before after thing = do a <- before thing a `finallyI` after a David

At Wed, 11 May 2011 13:02:21 +0100, Simon Marlow wrote:
There's no guarantee of the form that you mention - asynchronous exceptions can occur anywhere. However, there might be a way to do what you want (disclaimer: I haven't looked at the implementation of iterIO).
Control.Exception will have a new operation in 7.2.1:
allowInterrupt :: IO () allowInterrupt = unsafeUnmask $ return ()
which allows an asynchronous exception to be thrown inside mask (until 7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO).
Ah. I didn't know about unsafeUnmask. Is unmaskAsyncExceptions# low enough overhead that it would be reasonable to wrap every invocation of liftIO in unsafeUnmask? I'm now thinking it might be reasonable to execute all liftIO actions inside unsafeUnmask (with maybe a special liftIOmasked function for those few places where you don't want asynchronous exceptions). Most of the uses of mask are because you need two or more binds to execute without interruption, e.g.: bracket before after thing = mask $ \restore -> do a <- before -- Big problem if exception happens here -- r <- restore (thing a) `onException` after a _ <- after a return r But when bind sites are the only place an exception can be thrown, things get a lot simpler. For instance, it is perfectly reasonable to write: bracket before after thing = do a <- before thing a `finallyI` after a David

At Wed, 11 May 2011 13:02:21 +0100, Simon Marlow wrote:
However, if there's some simpler way to guarantee that>>= is the point where exceptions are thrown (and might be the case for GHC in practice), then I basically only need to update the docs. If someone with more GHC understanding could explain how asynchronous exceptions work, I'd love to hear it...
There's no guarantee of the form that you mention - asynchronous exceptions can occur anywhere. However, there might be a way to do what you want (disclaimer: I haven't looked at the implementation of iterIO).
Control.Exception will have a new operation in 7.2.1:
allowInterrupt :: IO () allowInterrupt = unsafeUnmask $ return ()
which allows an asynchronous exception to be thrown inside mask (until 7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO).
So to answer my own question from earlier, I did a bit of benchmarking, and it seems that on my machine (a 2.4 GHz Intel Xeon 3060, running linux 2.6.38), I get the following costs: 9 ns - return () :: IO () -- baseline (meaningless in itself) 13 ns - unsafeUnmask $ return () -- with interrupts enabled 18 ns - unsafeUnmask $ return () -- inside a mask_ 13 ns - ffi -- a null FFI call (getpid cached by libc) 18 ns - unsafeUnmask ffi -- with interrupts enabled 22 ns - unsafeUnmask ffi -- inside a mask_ 131 ns - syscall -- getppid through FFI 135 ns - unsafeUnmask syscall -- with interrupts enabled 140 ns - unsafeUnmask syscall -- inside a mask_ So it seems that the cost of calling unsafeUnmask inside every liftIO would be about 22 cycles per liftIO invocation, which seems eminently reasonable. You could then safely run your whole program inside a big mask_ and not worry about exceptions happening between >>= invocations. Though truly compute-intensive workloads could have issues, the kind of applications targeted by iterIO will spend most of their time doing I/O, so this shouldn't be an issue. Better yet, for programs that don't use asynchronous exceptions, if you don't put your whole program inside a mask_, the cost drops roughly in half. It's hard to imagine any real application whose performance would take a significant hit because of an extra 11 cycles per liftIO. Is there anything I'm missing? For instance, my machine only has one CPU, and the tests all ran with one thread. Does unmaskAsyncExceptions# acquire a spinlock that could lock the memory bus? Or is there some other reason unsafeUnmask could become expensive on NUMA machines, or in the presence of concurrency? Thanks, David

On 11/05/2011 23:57, dm-list-haskell-cafe@scs.stanford.edu wrote:
At Wed, 11 May 2011 13:02:21 +0100, Simon Marlow wrote:
However, if there's some simpler way to guarantee that>>= is the point where exceptions are thrown (and might be the case for GHC in practice), then I basically only need to update the docs. If someone with more GHC understanding could explain how asynchronous exceptions work, I'd love to hear it...
There's no guarantee of the form that you mention - asynchronous exceptions can occur anywhere. However, there might be a way to do what you want (disclaimer: I haven't looked at the implementation of iterIO).
Control.Exception will have a new operation in 7.2.1:
allowInterrupt :: IO () allowInterrupt = unsafeUnmask $ return ()
which allows an asynchronous exception to be thrown inside mask (until 7.2.1 you can define it yourself, unsafeUnmask comes from GHC.IO).
So to answer my own question from earlier, I did a bit of benchmarking, and it seems that on my machine (a 2.4 GHz Intel Xeon 3060, running linux 2.6.38), I get the following costs:
9 ns - return () :: IO () -- baseline (meaningless in itself) 13 ns - unsafeUnmask $ return () -- with interrupts enabled 18 ns - unsafeUnmask $ return () -- inside a mask_
13 ns - ffi -- a null FFI call (getpid cached by libc) 18 ns - unsafeUnmask ffi -- with interrupts enabled 22 ns - unsafeUnmask ffi -- inside a mask_
Those are lower than I was expecting, but look plausible. There's room for improvement too (by inlining some or all of unsafeUnmask#). However, the general case of unsafeUnmask E, where E is something more complex than return (), will be more expensive because a new closure for E has to be created. e.g. try "return x" instead of "return ()", and try to make sure that the closure has to be created once per unsafeUnmask, not lifted out and shared.
131 ns - syscall -- getppid through FFI 135 ns - unsafeUnmask syscall -- with interrupts enabled 140 ns - unsafeUnmask syscall -- inside a mask_
So it seems that the cost of calling unsafeUnmask inside every liftIO would be about 22 cycles per liftIO invocation, which seems eminently reasonable. You could then safely run your whole program inside a big mask_ and not worry about exceptions happening between>>= invocations. Though truly compute-intensive workloads could have issues, the kind of applications targeted by iterIO will spend most of their time doing I/O, so this shouldn't be an issue.
Better yet, for programs that don't use asynchronous exceptions, if you don't put your whole program inside a mask_, the cost drops roughly in half. It's hard to imagine any real application whose performance would take a significant hit because of an extra 11 cycles per liftIO.
Is there anything I'm missing? For instance, my machine only has one CPU, and the tests all ran with one thread. Does unmaskAsyncExceptions# acquire a spinlock that could lock the memory bus? Or is there some other reason unsafeUnmask could become expensive on NUMA machines, or in the presence of concurrency?
There are no locks here, thanks to the message-passing implementation we use for throwTo between processors. unmaskeAsyncExceptions# basically pushes a small stack frame, twiddles a couple of bits in the thread state, and checks a word in the thread state to see whether any exceptions are pending. The stack frame untwiddles the bits again and returns. Cheers, Simon

2011/5/6 David Mazieres
* Every aspect of the library is thoroughly document in haddock including numerous examples of use.
I'm reading the documentation, it's impressively well detailed. It has explanations, examples, all that one could dream for. Thanks !

David Mazieres
Hi, everyone. I'm pleased to announce the release of a new iteratee implementation, iterIO:
http://hackage.haskell.org/package/iterIO
IterIO is an attempt to make iteratees easier to use through an interface based on pipeline stages reminiscent of Unix command pipelines. Particularly if you've looked at iteratees before and been intimidated, please have a look at iterIO to see if it makes them more accessible.
[...]
Please enjoy. I'd love to hear feedback.
Thanks a lot, David. This looks like really good work. I'm using the 'enumerator' package, and looking at the types your library seems to use a similar, but more complicated representation. Is there any particular reason, why you didn't base your library on an existing iteratee package like 'enumerator'? Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

2011/5/6 Ertugrul Soeylemez
David Mazieres
wrote:
Please enjoy. I'd love to hear feedback.
Thanks a lot, David. This looks like really good work. I'm using the 'enumerator' package, and looking at the types your library seems to use a similar, but more complicated representation. Is there any particular reason, why you didn't base your library on an existing iteratee package like 'enumerator'?
David has documented some design decisions in http://hackage.haskell.org/packages/archive/iterIO/0.1/doc/html/Data-IterIO.... Perhaps you may find some answers there. David (another one :) )

On Fri, May 6, 2011 at 6:46 AM, David Virebayre
2011/5/6 Ertugrul Soeylemez
: David Mazieres
wrote: Please enjoy. I'd love to hear feedback.
Thanks a lot, David. This looks like really good work. I'm using the 'enumerator' package, and looking at the types your library seems to use a similar, but more complicated representation. Is there any particular reason, why you didn't base your library on an existing iteratee package like 'enumerator'?
David has documented some design decisions in http://hackage.haskell.org/packages/archive/iterIO/0.1/doc/html/Data-IterIO....
Perhaps you may find some answers there.
He says that enumerator's Iteratee doesn't have special support for pure Iteratees. When he says that the iteratee package doesn't have special support for control messages, the same applies for enumerator as well. He also says that enumerator can't distinguish failures from iteratees and enumeratees. He also says that the enumerator package's Enumerators aren't iteratees, only iterIO's enumerators are. Well, that's not what I'm reading: -- from enumerator package newtype Iteratee a m b = Iteratee {runIteratee :: m (Step a m b)} type Enumerator a m b = Step a m b -> Iteratee a m b type Enumeratee ao ai m b = Step ai m b -> Iteratee ao m (Step ai m b) -- from iterIO package newtype Iter t m a = Iter {runIter :: Chunk t -> IterR t m a} type Inum tIn tOut m a = Iter tOut m a -> Iter tIn m (IterR tOut m a) type Onum t m a = Inum () t m a The enumerator package's Enumerator *is* an iteratee, an so is its Enumeratee. The only real difference is that iterIO represents enumerators as enumeratees from () to something. In enumerator package terms, that would be -- enumerator packages's enumerator if it was iterIO's :) -- note that Inum's "tIn" and "tOut" are reversed w.r.t Enumeratee "ao" and "ai" type Enumerator a m b = Enumeratee () a m b Whether this representation is better or worse isn't clear for me. Now, one big problem that iterIO has that enumerator hasn't, is that iterIO is a *big* library with many dependencies, including OpenSSL. IMHO, that package should be split into many others. So, in the enumerator vs. iterIO challenge, the only big differences I see are: a) iterIO has a different exception handling mechanism. b) iterIO can have pure iteratees that don't touch the monad. c) iterIO's iteratees can send control messages to ther enumerators. d) iterIO's enumerators are enumeratees, but enumerator's enumerators are simpler. e) enumerator has fewer dependencies. f) enumerator uses conventional nomenclature. g) enumerator is Haskell 98, while iterIO needs many extensions (e.g. MPTC and functional dependencies). Anything that I missed? The bottomline: the biggest advantage I see right now in favor of iterIO is c), although it still has the problem that you may get runtime errors if you send the wrong control message. However, right now e) and g) may stop many users of enumerator from porting to iterIO, even if they like its approach. Cheers! =) -- Felipe.

On Fri, 06 May 2011 15:10:26 +0200, Felipe Almeida Lessa
So, in the enumerator vs. iterIO challenge, the only big differences I see are:
a) iterIO has a different exception handling mechanism. b) iterIO can have pure iteratees that don't touch the monad. c) iterIO's iteratees can send control messages to ther enumerators. d) iterIO's enumerators are enumeratees, but enumerator's enumerators are simpler. e) enumerator has fewer dependencies. f) enumerator uses conventional nomenclature. g) enumerator is Haskell 98, while iterIO needs many extensions (e.g. MPTC and functional dependencies).
Anything that I missed?
iterIO cannot be compiled on Windows, because it depends on the package unix. Regards, Henk-Jan van Tuyl -- http://Van.Tuyl.eu/ http://members.chello.nl/hjgtuyl/tourdemonad.html --

On Fri, May 6, 2011 at 10:44 AM, Henk-Jan van Tuyl
iterIO cannot be compiled on Windows, because it depends on the package unix.
That's a big showstopper. I wonder if the package split I recommend could solve this issue, or if it's something deeper. Cheers, -- Felipe.

At Fri, 6 May 2011 10:54:16 -0300, Felipe Almeida Lessa wrote:
On Fri, May 6, 2011 at 10:44 AM, Henk-Jan van Tuyl
wrote: iterIO cannot be compiled on Windows, because it depends on the package unix.
That's a big showstopper. I wonder if the package split I recommend could solve this issue, or if it's something deeper.
It's actually worse than this, unfortunately. The unix package dependency is mostly there for efficiency. For the HTTP package, in order to handle things like directories, If-Modified-Since, and Content-Length, I need to look at file attributes. The platform-independent code lets me do this, but I would have to make many more system calls. Also, I would have a slight race condition, because it's hard to get the attributes of the file you actually opened (to make sure the length hasn't changed, etc), while the unix package gets me access to both stat and fstat. This has all been abstracted away by the FileSystemCalls class, so if there's a way to implement those five functions on Windows, we could move defaultFileSystemCalls to its own module (or even its own package), and solve the problem without sacrificing performance or correctness on unix. Unfortunately, there are two worse unix dependencies: 1) I'm using the network IO package to do IO on ByteStrings, and the network library claims this doesn't work on windows. 2) Proper implementation of many network protocols requires the ability to send a TCP FIN segment without closing the underlying file descriptor (so you can still read from it). Thus, I'm using FFI to call the shutdown() system call on the file descriptors of Handles. I have no idea how to make this work on Windows. I'm hoping that time eventually solves problem #1. As for problem #2, the ideal solution would be to get something like hShutdown into the system libraries. I'd obviously love to make my stuff work on Windows, but probably lack the experience to do it on my own. Suggestions and help are of course welcome... David

On Fri, 06 May 2011 18:28:07 +0200,
At Fri, 6 May 2011 10:54:16 -0300, Felipe Almeida Lessa wrote:
iterIO cannot be compiled on Windows, because it depends on the
On Fri, May 6, 2011 at 10:44 AM, Henk-Jan van Tuyl
wrote: package unix.
That's a big showstopper. I wonder if the package split I recommend could solve this issue, or if it's something deeper.
[...]
I'd obviously love to make my stuff work on Windows, but probably lack the experience to do it on my own. Suggestions and help are of course welcome...
Is the unix-compat package any good? Regards, Henk-Jan van Tuyl -- http://Van.Tuyl.eu/ http://members.chello.nl/hjgtuyl/tourdemonad.html --

At Sat, 07 May 2011 00:09:46 +0200, Henk-Jan van Tuyl wrote:
iterIO cannot be compiled on Windows, because it depends on the
unix. [...] I'd obviously love to make my stuff work on Windows, but probably lack
On Fri, May 6, 2011 at 10:44 AM, Henk-Jan van Tuyl
wrote: package the experience to do it on my own. Suggestions and help are of course welcome... Is the unix-compat package any good?
Thanks for the suggestion. I'm not sure I totally understand how to use unix-compat, though. It gives me calls like getFdStatus :: Fd -> IO FileStatus which is one of the things I need. But how do I get an Fd in the first place? (unix-compat seems to have no equivalent of openFd.) David

At Fri, 6 May 2011 10:10:26 -0300, Felipe Almeida Lessa wrote:
He also says that the enumerator package's Enumerators aren't iteratees, only iterIO's enumerators are. Well, that's not what I'm reading:
-- from enumerator package newtype Iteratee a m b = Iteratee {runIteratee :: m (Step a m b)} type Enumerator a m b = Step a m b -> Iteratee a m b type Enumeratee ao ai m b = Step ai m b -> Iteratee ao m (Step ai m b)
-- from iterIO package newtype Iter t m a = Iter {runIter :: Chunk t -> IterR t m a} type Inum tIn tOut m a = Iter tOut m a -> Iter tIn m (IterR tOut m a) type Onum t m a = Inum () t m a
The enumerator package's Enumerator *is* an iteratee, an so is its Enumeratee.
Strictly speaking, I guess that's precise if you look at the type of Enumerator. However, it's not really an iteratee in the spirit of iteratees, since it isn't really a data sink and has no input type.
The only real difference is that iterIO represents enumerators as enumeratees from () to something. In enumerator package terms, that would be
-- enumerator packages's enumerator if it was iterIO's :) -- note that Inum's "tIn" and "tOut" are reversed w.r.t Enumeratee "ao" and "ai" type Enumerator a m b = Enumeratee () a m b
Whether this representation is better or worse isn't clear for me.
Exactly. The reason it's better (and for a long time my library was more like the enumerator one) is that the mechanics of uniform error handling are complex enough as it is. When enumerators and enumeratees are two different types, you need two different mechanisms for constructing them, and then have to worry about handing errors in the two different cases. I found that unifying enumerators and enumeratees (or Inums and Onums as I call them) significantly simplified a lot of code.
Now, one big problem that iterIO has that enumerator hasn't, is that iterIO is a *big* library with many dependencies, including OpenSSL. IMHO, that package should be split into many others.
Yes, this is definitely true.
So, in the enumerator vs. iterIO challenge, the only big differences I see are:
a) iterIO has a different exception handling mechanism. b) iterIO can have pure iteratees that don't touch the monad. c) iterIO's iteratees can send control messages to ther enumerators. d) iterIO's enumerators are enumeratees, but enumerator's enumerators are simpler. e) enumerator has fewer dependencies. f) enumerator uses conventional nomenclature. g) enumerator is Haskell 98, while iterIO needs many extensions (e.g. MPTC and functional dependencies).
Anything that I missed?
The bottomline: the biggest advantage I see right now in favor of iterIO is c),
I basically agree with this list, but think you are underestimating the value of a. I would rank a as the most important difference between the packages. (a also is the reason for d.) David

On Thu, 2011-05-05 at 21:15 -0700, David Mazieres wrote:
Hi, everyone. I'm pleased to announce the release of a new iteratee implementation, iterIO:
http://hackage.haskell.org/package/iterIO
IterIO is an attempt to make iteratees easier to use through an interface based on pipeline stages reminiscent of Unix command pipelines. Particularly if you've looked at iteratees before and been intimidated, please have a look at iterIO to see if it makes them more accessible.
Some aspects of iterIO that should simplify learning and using iteratees are:
* Every aspect of the library is thoroughly document in haddock including numerous examples of use.
* Enumerators are easy to build out of iteratees.
* There is no difference between enumerators and "enumeratees" (i.e., inner pipeline stages). The former is just a type-restricted version of the latter.
* Parsing combinators provide detailed error reporting and support LL(*) rather than LL(1) parsing, leading to fewer non-intuitive parsing failures. A couple of tricks avoid consuming excessive memory for backtracking.
* Super-fast LL(1) parsing is also available through seamless integration with attoparsec.
* A universal exception mechanism works across invocations of mtl monad transformers, thereby unifying error handling.
* All pipe operators have uniform semantics, eliminating corner cases. In particular, if the writing end of a pipe fails, the reading end always gets EOF, allowing it to clean up resources.
* One can catch exceptions thrown by any contiguous subset of stages in a pipeline. Moreover, enumerator exception handlers can resume downstream stages that haven't failed.
* The package is full of useful iteratees and enumerators, including basic file and socket processing, parsec-like combinators, string search, zlib/gzip compression, SSL, HTTP, and "loopback" enumerator/iteratee pairs for testing a protocol implementation against itself.
Please enjoy. I'd love to hear feedback.
David
1. It looks nice - however it causes problem as we have 3 iteratees packages, all of which have some advantages. 4 if we count coroutine. (I don't count original implementations). 2. What is the reason of using Inum/Onum instead of Iteratee/Enumerator/Enumeratee. The latter seems to be a standard naming in the community? Regards

Hi All, I really love the look of this package, but if this is going be *the* iteratee package, I would absolutely love to see it fix some of the biggest mistakes in the other iteratee packages, soecifically naming. A change in naming for the terms iteratee, enumerator and enumeratee would go a hell of a long way here; Peaker on #haskell suggested Consumer/Producer/Transformer, and there is a lot of agreement in the channel that these are vastly better names. They’re also far less intimidating to users. I personally feel that maybe Transformer isn't such a great name (being closely associated with monad transformers), and that maybe something like Mapper would be better, but I'm by no means in love with that name either. More people in #haskell seem to like Transformer, and I don't think my argument against it is very strong, so the hivemind seems to have settled on the Producer/Transformer/Consumer trilogy. I'd love to hear thoughts on the issue, especially from David. Cheers, Alex Mason On 06/05/2011, at 20:17, Maciej Marcin Piechotka wrote:
On Thu, 2011-05-05 at 21:15 -0700, David Mazieres wrote:
Hi, everyone. I'm pleased to announce the release of a new iteratee implementation, iterIO:
http://hackage.haskell.org/package/iterIO
IterIO is an attempt to make iteratees easier to use through an interface based on pipeline stages reminiscent of Unix command pipelines. Particularly if you've looked at iteratees before and been intimidated, please have a look at iterIO to see if it makes them more accessible.
Some aspects of iterIO that should simplify learning and using iteratees are:
* Every aspect of the library is thoroughly document in haddock including numerous examples of use.
* Enumerators are easy to build out of iteratees.
* There is no difference between enumerators and "enumeratees" (i.e., inner pipeline stages). The former is just a type-restricted version of the latter.
* Parsing combinators provide detailed error reporting and support LL(*) rather than LL(1) parsing, leading to fewer non-intuitive parsing failures. A couple of tricks avoid consuming excessive memory for backtracking.
* Super-fast LL(1) parsing is also available through seamless integration with attoparsec.
* A universal exception mechanism works across invocations of mtl monad transformers, thereby unifying error handling.
* All pipe operators have uniform semantics, eliminating corner cases. In particular, if the writing end of a pipe fails, the reading end always gets EOF, allowing it to clean up resources.
* One can catch exceptions thrown by any contiguous subset of stages in a pipeline. Moreover, enumerator exception handlers can resume downstream stages that haven't failed.
* The package is full of useful iteratees and enumerators, including basic file and socket processing, parsec-like combinators, string search, zlib/gzip compression, SSL, HTTP, and "loopback" enumerator/iteratee pairs for testing a protocol implementation against itself.
Please enjoy. I'd love to hear feedback.
David
1. It looks nice - however it causes problem as we have 3 iteratees packages, all of which have some advantages. 4 if we count coroutine. (I don't count original implementations).
2. What is the reason of using Inum/Onum instead of Iteratee/Enumerator/Enumeratee. The latter seems to be a standard naming in the community?
Regards _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Alex Mason wrote:
I really love the look of this package, but if this is going be *the* iteratee package, I would absolutely love to see it fix some of the biggest mistakes in the other iteratee packages, soecifically naming. A change in naming for the terms iteratee, enumerator and enumeratee would go a hell of a long way here; Peaker on #haskell suggested Consumer/Producer/Transformer, and there is a lot of agreement in the channel that these are vastly better names. They’re also far less intimidating to users.
I personally feel that maybe Transformer isn't such a great name (being closely associated with monad transformers), and that maybe something like Mapper would be better, but I'm by no means in love with that name either. More people in #haskell seem to like Transformer, and I don't think my argument against it is very strong, so the hivemind seems to have settled on the Producer/Transformer/Consumer trilogy.
I'd love to hear thoughts on the issue, especially from David.
I vastly prefer the names Producer/Transformer/Consumer over the others. Then again, I never quite understood what Iteratees were all about in the first place. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

At Sat, 7 May 2011 01:15:25 +1000, Alex Mason wrote:
Hi All,
I really love the look of this package, but if this is going be *the* iteratee package, I would absolutely love to see it fix some of the biggest mistakes in the other iteratee packages, soecifically naming. A change in naming for the terms iteratee, enumerator and enumeratee would go a hell of a long way here; Peaker on #haskell suggested Consumer/Producer/Transformer, and there is a lot of agreement in the channel that these are vastly better names. They’re also far less intimidating to users.
I personally feel that maybe Transformer isn't such a great name (being closely associated with monad transformers), and that maybe something like Mapper would be better, but I'm by no means in love with that name either. More people in #haskell seem to like Transformer, and I don't think my argument against it is very strong, so the hivemind seems to have settled on the Producer/Transformer/Consumer trilogy.
I'd love to hear thoughts on the issue, especially from David.
This is a question I struggled a lot with. I definitely agree that the terms are pretty intimidating to new users. At least one thing I've concluded is that it really should be presented as two concepts, rather than three. So we should talk about, e.g., producers, consumers, and pipeline stages that do both. I'd been thinking about using the terms Source and Sink, but Source is very overloaded, and "SinkSource" doesn't exactly roll off the tongue or evoke a particularly helpful intuition. In the end, I decided just to come up with new terms that wouldn't carry any pre-conceptions (e.g., what's an "Inum"?), and then build the intuition through copious documentation... I'm open to suggestion here. I've already overhauled the naming conventions in the library once. Initially I used the names EnumI and EnumO for Inum and Onum. I think the old names were much worse, especially since Enum is a fundamental typeclass that has absolutely nothing to do with enumerators. David

At least one thing I've concluded is that it really should be presented as two concepts, rather than three. So we should talk about, e.g., producers, consumers, and pipeline stages that do both.
I think that's a great idea. I'd been thinking about using the terms Source and Sink, but Source is
very overloaded, and "SinkSource" doesn't exactly roll off the tongue or evoke a particularly helpful intuition.
One good thing I can say for the Enumerator/Iteratee nomenclature is that it
nicely connotes the inversion of control (i.e., the "push" data flow) that
enumerator is all about. Enumera*tor** *feeds Itera*tee* -- subject, verb,
object. Producer/Consumer connotes the same by allusion to the
producer-consumer pattern of thread synchronization.
Tom
On Fri, May 6, 2011 at 9:47 AM,
This is a question I struggled a lot with. I definitely agree that the terms are pretty intimidating to new users.
At least one thing I've concluded is that it really should be presented as two concepts, rather than three. So we should talk about, e.g., producers, consumers, and pipeline stages that do both.
I'd been thinking about using the terms Source and Sink, but Source is very overloaded, and "SinkSource" doesn't exactly roll off the tongue or evoke a particularly helpful intuition.
In the end, I decided just to come up with new terms that wouldn't carry any pre-conceptions (e.g., what's an "Inum"?), and then build the intuition through copious documentation...
I'm open to suggestion here. I've already overhauled the naming conventions in the library once. Initially I used the names EnumI and EnumO for Inum and Onum. I think the old names were much worse, especially since Enum is a fundamental typeclass that has absolutely nothing to do with enumerators.
David
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 11-05-06 11:15 AM, Alex Mason wrote:
Hi All,
I really love the look of this package, but if this is going be *the* iteratee package, I would absolutely love to see it fix some of the biggest mistakes in the other iteratee packages, soecifically naming. A change in naming for the terms iteratee, enumerator and enumeratee would go a hell of a long way here; Peaker on #haskell suggested Consumer/Producer/Transformer, and there is a lot of agreement in the channel that these are vastly better names. They’re also far less intimidating to users.
I personally feel that maybe Transformer isn't such a great name (being closely associated with monad transformers), and that maybe something like Mapper would be better, but I'm by no means in love with that name either. More people in #haskell seem to like Transformer, and I don't think my argument against it is very strong, so the hivemind seems to have settled on the Producer/Transformer/Consumer trilogy.
I'd love to hear thoughts on the issue, especially from David.
The Producer/Consumer terminology, if I'm not mistaken, is usually applied to coroutine pairs. I use these terms myself in the SCC package, together with terms Transducer and Splitter. The former term is also well established, the latter was my own. Though I like and use this terminology, I'm not sure it's a good fit for the existing Enumerator/Iteratee pairs, which are not real symmetric coroutines. Enumerators are more like the Python (<2.5) Generators. I don't know what the Python terminology would be for the Iteratee. On 11-05-06 12:47 PM, dm-list-haskell-cafe@scs.stanford.edu wrote:
This is a question I struggled a lot with. I definitely agree that the terms are pretty intimidating to new users.
At least one thing I've concluded is that it really should be presented as two concepts, rather than three. So we should talk about, e.g., producers, consumers, and pipeline stages that do both.
I'd been thinking about using the terms Source and Sink, but Source is very overloaded, and "SinkSource" doesn't exactly roll off the tongue or evoke a particularly helpful intuition.
The SCC package happens to use Source and Sink names as well. They are used not for coroutines directly, but instead for references to coroutines of the appropriate type. Every consumer thus owns a Source from which it fetches its input, and that Source is always bound to another coroutine that yields those values through a Sink. Source and Sink are a passive handle to a Producer and Consumer. I may be subjective, but I find this use of the terms very fitting.

At Fri, 06 May 2011 21:27:21 -0400, Mario Blažević wrote:
I'd been thinking about using the terms Source and Sink, but Source is very overloaded, and "SinkSource" doesn't exactly roll off the tongue or evoke a particularly helpful intuition.
The SCC package happens to use Source and Sink names as well. They are used not for coroutines directly, but instead for references to coroutines of the appropriate type. Every consumer thus owns a Source from which it fetches its input, and that Source is always bound to another coroutine that yields those values through a Sink. Source and Sink are a passive handle to a Producer and Consumer. I may be subjective, but I find this use of the terms very fitting.
You mean fitting for references to coroutines, or fitting for the replacement names for Enumerator/Iteratee? If there's overwhelming consensus, I would certainly consider changing the names in the iterIO library, but it's a pretty big change... David

On 11-05-06 09:58 PM, dm-list-haskell-cafe@scs.stanford.edu wrote:
At Fri, 06 May 2011 21:27:21 -0400, Mario Blažević wrote:
I'd been thinking about using the terms Source and Sink, but Source is very overloaded, and "SinkSource" doesn't exactly roll off the tongue or evoke a particularly helpful intuition. The SCC package happens to use Source and Sink names as well. They are used not for coroutines directly, but instead for references to coroutines of the appropriate type. Every consumer thus owns a Source from which it fetches its input, and that Source is always bound to another coroutine that yields those values through a Sink. Source and Sink are a passive handle to a Producer and Consumer. I may be subjective, but I find this use of the terms very fitting. You mean fitting for references to coroutines, or fitting for the replacement names for Enumerator/Iteratee?
The former, unfortunately. As I said, the most usual name for the Enumerator concept would be Generator. That term is already used in several languages to signify this kind of restricted coroutine. I'm not aware of any good alternative naming for Iteratee.

On 7/05/2011, at 2:44 PM, Mario Blažević wrote:
As I said, the most usual name for the Enumerator concept would be Generator. That term is already used in several languages to signify this kind of restricted coroutine. I'm not aware of any good alternative naming for Iteratee.
This being Haskell, I'm expecting to see Cogenerator (:-) (:-).

On 5/6/11 11:15 AM, Alex Mason wrote:
Hi All,
I really love the look of this package, but if this is going be *the* iteratee package, I would absolutely love to see it fix some of the biggest mistakes in the other iteratee packages, soecifically naming. A change in naming for the terms iteratee, enumerator and enumeratee would go a hell of a long way here; Peaker on #haskell suggested Consumer/Producer/Transformer, and there is a lot of agreement in the channel that these are vastly better names. They’re also far less intimidating to users.
I personally feel that maybe Transformer isn't such a great name (being closely associated with monad transformers), and that maybe something like Mapper would be better, but I'm by no means in love with that name either. More people in #haskell seem to like Transformer, and I don't think my argument against it is very strong, so the hivemind seems to have settled on the Producer/Transformer/Consumer trilogy.
I believe "transducer" is the proper term. (Of course, producers and consumers are both special cases of transducers, trivializing the input or output stream, respectively.) Though, IMO, I don't find the names "producer" and "consumer" enlightening as to why this particular pattern of iteration/enumeration is different from the conventional pattern found in OOP's iterators. Any time you have a bunch of things being created and passed around you have producers and consumers; the terminology is insufficient to define the pattern. The shift from "iterator" to "enumerator" helps to capture that difference. Given as there's no common name for the code calling an iterator, it's not immediately apparent what the push-based enumerative version should be called; "iteratee" seems as good as any other name, because it expresses the duality involved in the switch from the iterative style. control : pull, push producer : iterator, enumerator consumer : ???, iteratee Of course, this pattern of names suggests that "enumeratee" should properly be a backformation for naming the consumer of the pull-based iterative pattern. But then we're still left with the problem of what the transducers should be called in both cases. -- Live well, ~wren

Sorry for second-posting. In addition to the problems mentioned elsewhere (too big packages) I would like to point problems with SSL: - It uses OpenSSL from what I understand which is not compatible with GPL-2 as it uses Apache 1.0 licence (in addition to BSD4) as it requires mentioning OpenSSL ("This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit"). - It doesn't allow to use it after STARTTLS command Regards

Sorry for third post but I wonder why the many instances are restricted by Monad. Both Functor and Applicative can by constructed without Monad:
instance (Functor m) => Functor (CtlArg t m) where fmap f (CtlArg arg g c) = CtlArg arg (fmap f . g) c
instance (Functor m) => Functor (Iter t m) where {-# INLINE fmap #-} fmap f (Iter g) = Iter (fmap f . g
instance (Functor m) => Functor (IterR t m) where fmap f (IterF i) = IterF (fmap f i) fmap f (IterM i) = IterM (fmap (fmap f) i) fmap f (IterC c) = IterC (fmap f c) fmap f (Done a c) = Done (f a) c fmap f (Fail i m mc) = Fail i (fmap f m) mc
instance (Functor m) => Applicative (Iter t m) where {-# INLINE pure #-} pure x = Iter $ Done x {-# INLINE (<*>) #-} Iter a <*> bi@(Iter b) = Iter $ \c -> fix (\f ir -> case ir of IterF cont -> cont <*> bi IterM m -> IterM $ fmap f m IterC (CtlArg a cn ch) -> IterC (CtlArg a (\r -> cn r <*> bi) ch) Done v ch -> fmap v (b ch) Fail f _ ch -> Fail f Nothing ch) a c
Since every monad is applicative (or rather should be) it doesn't loose generality. Join is also defined by using only functor:
joinI :: (Functor m) => Iter t m (Iter t m a) -> Iter t m a joinI (Iter i) = Iter $ \c -> fix (\f x -> case x of IterF cont -> IterF (joinI cont) IterM m -> IterM $ fmap f m IterC (CtlArg a cn ch) -> IterC (CtlArg a (\r -> joinI (cn r)) ch) Done v ch -> runIter v ch Fail f _ ch -> Fail f Nothing ch) (i c)
Regards PS. I haven't tested the code or benchmarked it - but it seems it is possible.

At Sat, 07 May 2011 21:50:13 +0100, Maciej Marcin Piechotka wrote:
Sorry for third post but I wonder why the many instances are restricted by Monad.
It would be great if Functor were a superclass of Monad. However, since it isn't, and since I can't think of anything particularly useful to do with Iters over Functors that aren't also Monads, I'd rather just pass one dictionary around than two. So my convention throughout the library is that m has to be a Monad but doesn't have to be a Functor. In general, I try to place as few requirements in the contexts of functions as possible. However, I also want to be able to call most functions from most other ones. If some of the useful low-level functions end up requiring Functor, then most functions in the library are going to end up requiring (Functor m, Monad m) => instead of (Monad m) =>, which will actually end up increasing the amount of stuff in contexts. (Of course, (Iter t m) itself is an Applicative Functor, even when m is just a Monad. So that I make use of in the parsing module.) David

On 5/7/11 5:15 PM, dm-list-haskell-cafe@scs.stanford.edu wrote:
In general, I try to place as few requirements in the contexts of functions as possible.
One counterargument to this philosophy is that there are many cases where fmap can be defined more efficiently than the liftM derived from return and (>>=). Similarly, the applicative operators (<*) and (*>) often admit more efficient implementations than the default. So, when dealing with monads that have those more efficient definitions, you're restricting performance unnecessarily by forcing them to use the generic monadic definitions. There's nothing wrong with having multiple constraints in the context. -- Live well, ~wren
participants (16)
-
Alex Mason
-
David Mazieres
-
David Virebayre
-
dm-list-haskell-cafe@scs.stanford.edu
-
Ertugrul Soeylemez
-
Eugene Kirpichov
-
Felipe Almeida Lessa
-
Gregory Collins
-
Heinrich Apfelmus
-
Henk-Jan van Tuyl
-
Maciej Marcin Piechotka
-
Mario Blažević
-
Richard O'Keefe
-
Simon Marlow
-
Tom Brow
-
wren ng thornton