ANNOUNCE: enumerator 0.4.8

John Millikin

26 Mar 2011

5:39 a.m.

-----------------------------------------------------------------------------
Enumerators are an efficient, predictable, and safe alternative to lazy I/O. Discovered by Oleg Kiselyov, they allow large datasets to be processed in near–constant space by pure code. Although somewhat more complex to write, using enumerators instead of lazy I/O produces more correct programs.

http://hackage.haskell.org/package/enumerator
http://john-millikin.com/software/enumerator/
-----------------------------------------------------------------------------

Hello -cafe,

It's been a while since the last point release of enumerator. This one is sufficiently large that I think folks might want to know about it, and since I try not to spam too many announcements, I'll give a quick rundown on major changes in other 0.4.x versions as well.

First, most of what I call "list analogues" -- enumerator-based versions of 'head', 'take', 'map', etc -- have been separated into three modules (Data.Enumerator.List, .Binary, and .Text) depending on what sorts of data they operate on. This separation has been an ongoing process throughout 0.4.x releases, and I think it's now complete. The old names in Data.Enumerator will continue to exist in 0.4.x versions, but will be removed in 0.5.

Second, Gregory Collins and Ertugrul Soeylemez found a space leak in Iteratee's (>>=), which could cause eventual space exhaustion in some circumstances. If you use enumerators to process very large or infinite streams, you probably want to upgrade to version 0.4.7 or higher.

Third, the source code PDF has seen some substantial improvement -- if you're interested in how the library is implemented, or have insomnia, read it at < http://john-millikin.com/software/enumerator/enumerator_0.4.8.pdf

...

Finally, there is a known issue in the current encoding of iteratees -- if an iteratee yields extra data but never consumed anything, that iteratee will violate the monad law of associativity. Oleg has updated his implementations to fix this problem, but since it would break a *lot* of dependent libraries, I'm holding off until the vague future of version 0.5. Since iteratees that yield extra data they didn't consume are invalid anyway, I hope this problem will not cause too much inconvenience.

New features
-----------------

* Range-limited binary file enumeration (requested + initial patch by Bardur Arantsson).
* splitWhen, based on the "split" package < http://hackage.haskell.org/package/split >
* 0.4.6: Typeable instances for most types (requested by Michael Snoyman)
* 0.4.5: joinE, which simplifies enumerator/enumeratee composition (requested by Michael Snoyman)
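The associativity issue mentioned in the announcement can be reproduced in a small pure model. The sketch below is not the library's code: `Iter`, `bind`, `headI`, `finish`, and the two `bogus` values are hypothetical names, the model drops the underlying monad and the error case, and `bind` is only assumed to treat left-over input the way the library's (>>=) does. Under those assumptions, two iteratees that yield extra data they never consumed give different results depending on how the binds are nested.

```haskell
-- Simplified, pure model of an iteratee (an assumption for illustration;
-- the real library wraps each step in a monad and has an error case).
data Stream a = Chunks [a] | EOF
    deriving (Eq, Show)

data Iter a b
    = Continue (Stream a -> Iter a b)  -- wants more input
    | Yield b (Stream a)               -- result plus left-over input

-- Bind that hands the first iteratee's left-over input to the second one.
bind :: Iter a b -> (b -> Iter a c) -> Iter a c
bind (Continue k) f = Continue (\s -> bind (k s) f)
bind (Yield x (Chunks [])) f = f x
bind (Yield x extra) f = case f x of
    Continue k -> k extra
    Yield y _  -> Yield y extra

-- Consume one element, if there is one.
headI :: Iter a (Maybe a)
headI = Continue step
  where
    step (Chunks [])       = headI
    step (Chunks (x : xs)) = Yield (Just x) (Chunks xs)
    step EOF               = Yield Nothing EOF

-- Two "invalid" iteratees: each yields extra data it never consumed.
bogus1, bogus2 :: Iter Int ()
bogus1 = Yield () (Chunks [1])
bogus2 = Yield () (Chunks [2])

-- Extract the result, signalling EOF if the iteratee still wants input.
finish :: Iter a b -> Maybe b
finish (Yield x _)  = Just x
finish (Continue k) = case k EOF of
    Yield x _  -> Just x
    Continue _ -> Nothing

-- The same three steps, nested to the left and to the right.
leftNested, rightNested :: Maybe (Maybe Int)
leftNested  = finish ((bogus1 `bind` \_ -> bogus2) `bind` \_ -> headI)
rightNested = finish (bogus1 `bind` \_ -> (bogus2 `bind` \_ -> headI))
```

In this model `leftNested` is `Just (Just 1)` while `rightNested` is `Just (Just 2)`, so the two nestings disagree exactly when an iteratee yields input it was never given.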


Michael Snoyman

26 Mar

5:46 p.m.

Great work as usual John. I'm actually very happy to see enumHandleRange: the next version of WAI will support partial files, and I just implemented my own version of enumHandleRange over there. I will gladly switch to your (most likely more correct) version.

As far as the left-over data in a yield issue: does that require a breaking API change, or a change to the definition of >>= which would change semantics??

Michael

On Sat, Mar 26, 2011 at 7:39 AM, John Millikin wrote:

...


John Millikin

7:03 p.m.

On Mar 26, 10:46 am, Michael Snoyman wrote:

...

As far as the left-over data in a yield issue: does that require a breaking API change, or a change to the definition of >>= which would change semantics??

It requires a pretty serious API change, as the definition of 'Iteratee' itself is at fault. Unfortunately, Oleg's new definitions also have problems (they can yield extra on a continue step), so I'm at a bit of a loss as to what to do. Either way, underlying primitives allow users to create iteratees with invalid/undefined behavior. Not very Haskell-y.

All of the new high-level functions added in recent versions are part of an attempted workaround. I'd like to move the Iteratee definitions themselves to a ``Data.Enumerator.Internal`` module, and add some words discouraging their direct use. There would still be some API breaks (the >>== , $$, and >==> operators would go away) but at least clients wouldn't be subjected to a complete rewrite.

Since the API is being broken anyway, I'm also going to take the opportunity to change the Stream type so it can represent "EOF + some data". That should allow lots of interesting behaviors, such as arbitrary lookahead.

Gregory Collins

7:25 p.m.

On Sat, Mar 26, 2011 at 8:03 PM, John Millikin wrote:

...

On Mar 26, 10:46 am, Michael Snoyman wrote:

...
As far as the left-over data in a yield issue: does that require a breaking API change, or a change to the definition of >>= which would change semantics??

It requires a pretty serious API change, as the definition of 'Iteratee' itself is at fault. Unfortunately, Oleg's new definitions also have problems (they can yield extra on a continue step), so I'm at a bit of a loss as to what to do. Either way, underlying primitives allow users to create iteratees with invalid/undefined behavior. Not very Haskell-y.

You can also write an iteratee which doesn't move to the "Done" state when it gets EOF. This is equally "bad", i.e. not actually much of a problem in practice.

...

Since the API is being broken anyway, I'm also going to take the opportunity to change the Stream type so it can represent "EOF + some data". That should allow lots of interesting behaviors, such as arbitrary lookahead.

The thing which I find is missing the most from enumerator as it stands is not this -- it's the fact that Iteratees sometimes need to allocate resources which need explicit manual deallocation (i.e. sockets, file descriptors, mmaps, etc), but because Enumerators are running the show, there is no "local" way to ensure that the cleanup/bracket routines get run on error. This hurts composability, because you are forced to either allocate these resources outside the body of the enumerator (where you can bracket "run_") or play finalizer-on-mvar tricks with the garbage collector. This kind of sucks.

The iteratee package has an error constructor on the Stream type for this purpose; I think you could do that -- with the downside that you need to pattern-match against another constructor in mainline code, hurting performance -- or is there some other reasonable way to deal with it?

G
--
Gregory Collins
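The "allocate outside and bracket the run" workaround described above might look roughly like the following sketch. It uses only `bracket` and friends from base; `Resource`, `acquire`, `release`, and `runPipeline` are hypothetical stand-ins (the last one plays the role of a `run_` call that fails part-way through), not anything from the enumerator package.

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for a socket or file descriptor: the flag is True
-- while the resource is open.
newtype Resource = Resource (IORef Bool)

acquire :: IORef Bool -> IO Resource
acquire flag = writeIORef flag True >> return (Resource flag)

release :: Resource -> IO ()
release (Resource flag) = writeIORef flag False

-- Hypothetical stand-in for running an enumerator/iteratee pipeline that
-- fails part-way through the stream.
runPipeline :: Resource -> IO Int
runPipeline _ = throwIO (ErrorCall "stream failed")

-- Allocate outside the pipeline and bracket the whole run, so 'release'
-- runs whether the pipeline finishes or throws.
runWithResource :: IORef Bool -> IO (Either ErrorCall Int)
runWithResource flag = try (bracket (acquire flag) release runPipeline)

-- True if the resource is still open after a failed run.
leakedAfterFailure :: IO Bool
leakedAfterFailure = do
    flag <- newIORef False
    _ <- runWithResource flag
    readIORef flag
```

The cleanup is reliable here, but only because the resource's lifetime is tied to the whole run rather than to the iteratee that uses it, which is the loss of locality the message complains about.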

John Millikin

9:26 p.m.

On 2011-03-26, Gregory Collins wrote:

...

...
Since the API is being broken anyway, I'm also going to take the opportunity to change the Stream type so it can represent "EOF + some data". That should allow lots of interesting behaviors, such as arbitrary lookahead.

The thing which I find is missing the most from enumerator as it stands is not this -- it's the fact that Iteratees sometimes need to allocate resources which need explicit manual deallocation (i.e. sockets, file descriptors, mmaps, etc), but because Enumerators are running the show, there is no "local" way to ensure that the cleanup/bracket routines get run on error. This hurts composability, because you are forced to either allocate these resources outside the body of the enumerator (where you can bracket "run_") or play finalizer-on-mvar tricks with the garbage collector. This kind of sucks.

I agree that it sucks, but it's a tradeoff of the left-fold enumerator design. Potential solutions are welcome.

...

The iteratee package has an error constructor on the Stream type for this purpose; I think you could do that -- with the downside that you need to pattern-match against another constructor in mainline code, hurting performance -- or is there some other reasonable way to deal with it?

I don't think this would help. Remember that the iteratee has *no* control whatsoever over its lifetime. There is no guarantee that a higher-level enumerator or enumeratee will actually feed it data until it has enough; the computation can be interrupted at any level.

Looking at the iteratee package's Stream constructor, I think it doesn't do what you think it does. While it might help with resource management in a specific case, it won't help if (for example) an enumeratee above your iteratee decides to yield.

John A. De Goes

8:33 p.m.

I noticed this problem some time ago. Beyond just breaking monadic associativity, there are many other issues with standard definitions of iteratees:

1. It does not make sense in general to bind with an iteratee that has already consumed input, but there's no type-level difference between a "virgin" iteratee and one that has already consumed input;

2. Error recovery is ill-defined because errors do not describe what portion of the input they have already consumed;

3. Iteratees sometimes need to manage resources, but they're not designed to do so which leads to hideous workarounds;

4. Iteratees cannot incrementally produce output, it's all or nothing, which makes them terrible for many real world problems that require both incremental input and incremental output.

Overall, I regard iteratees as only a partial success. They're leaky and somewhat unsafe abstractions.

I'm experimenting with Mealy machines because I think they have more long-term promise to solve the problems of iteratees.

Regards,

John A. De Goes
Twitter: @jdegoes
LinkedIn: http://linkedin.com/in/jdegoes

On Mar 26, 2011, at 1:03 PM, John Millikin wrote:

...


wren ng thornton

9:12 p.m.

On 3/26/11 4:33 PM, John A. De Goes wrote:

...

4. Iteratees cannot incrementally produce output, it's all or nothing, which makes them terrible for many real world problems that require both incremental input and incremental output.

For this one, enumeratees are the proposed solution. But for some reason enumeratees are oft overlooked.

--
Live well,
~wren

John A. De Goes

27 Mar

3:38 p.m.

Enumeratees solve some use cases but not others. Let's say you want to incrementally compress a 2 GB file. If you use an enumeratee to do this, your "transformer" iteratee has to do IO. I'd prefer an abstraction to incrementally and purely produce the output from a stream of input.

Regards,

John A. De Goes
Twitter: @jdegoes
LinkedIn: http://linkedin.com/in/jdegoes

On Mar 26, 2011, at 3:12 PM, wren ng thornton wrote:

...


John Millikin

4:43 p.m.

On Sunday, March 27, 2011 8:38:38 AM UTC-7, John A. De Goes wrote:

...

Enumeratees solve some use cases but not others. Let's say you want to incrementally compress a 2 GB file. If you use an enumeratee to do this, your "transformer" iteratee has to do IO. I'd prefer an abstraction to incrementally and purely produce the output from a stream of input.

There's no reason the transformer has to do IO. Right now a lot of the interesting enumerator-based packages are actually bindings to C libraries, so they are forced to use IO, but there's nothing inherent in the enumeratee design to require it. For example, the text codec enumeratees "encode" and "decode" in Data.Enumerator.Text are pure.

I'm working on ideas for writing pure enumeratees to bound libraries, but they will likely only work if the underlying library fully exposes its state, like zlib. Libraries with private or very complex internal states, such as libxml or expat, will probably never be implementable in pure enumeratees.

wren ng thornton

9:22 p.m.

On 3/27/11 11:38 AM, John A. De Goes wrote:

...

Enumeratees solve some use cases but not others. Let's say you want to incrementally compress a 2 GB file. If you use an enumeratee to do this, your "transformer" iteratee has to do IO. I'd prefer an abstraction to incrementally and purely produce the output from a stream of input.

I don't see why? In pseudocode we could have,

    enumRead2GBFile :: FilePath -> Enumerator IO ByteString
    enumRead2GBFile file iter0 = do
        fd <- open file
        let loop iter = do
                mline <- read fd
                case mline of
                    Nothing -> return iter
                    Just line -> do
                        iter' <- feed iter line
                        if isDone iter'
                            then return iter'
                            else loop iter'
        iterF <- loop iter0
        close fd
        return iterF

    compress :: Monad m => Enumeratee m ByteString ByteString
    compress = go state0
        where
        go state = do
            chunk <- get
            let (state',hash) = compressify state chunk
            put hash
            go state'

    compressify :: Foo -> ByteString -> (Foo,ByteString)

it's just a pipeline like function composition or shell pipes. There's no reason intermediate points of the pipeline have to do anything impure.

--
Live well,
~wren
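The shape of the pure `compressify` step can be tried out without any stream library. The sketch below is an assumption-laden toy: plain lists of chunks stand in for the enumerator stream, and delta encoding (`deltaChunk`, a hypothetical name) stands in for real compression. It only shows that a stage which carries state from chunk to chunk needs no IO.

```haskell
import Data.List (mapAccumL)

-- Toy stand-in for 'compressify': delta-encode one chunk, carrying the last
-- value seen as the state. This is not real compression; it was chosen to
-- keep the sketch short.
deltaChunk :: Int -> [Int] -> (Int, [Int])
deltaChunk prev chunk = mapAccumL step prev chunk
  where
    -- New state is the current value; output is the difference.
    step p x = (x, x - p)

-- Thread the state through a stream of chunks, producing one output chunk
-- per input chunk. The initial state is 0.
deltaStream :: [[Int]] -> [[Int]]
deltaStream = snd . mapAccumL deltaChunk 0
```

For example, `deltaStream [[1,2],[4],[7,7]]` evaluates to `[[1,1],[2],[3,0]]`: the state left by one chunk is picked up by the next, and nothing in the stage is impure.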

John A. De Goes

10:44 p.m.

This formulation does not let me control the production of compressed chunks independently from the provision of input; a receiver may only be capable of consuming a tiny amount at a time, and I may have to resend some chunks.

Which is the whole point: iteratee & friends are lopsided. They provide excellent control of an input stream to the iteratee, but there is no structure permitting equivalent control of the output stream.

Regards,

John A. De Goes
Twitter: @jdegoes
LinkedIn: http://linkedin.com/in/jdegoes

On Mar 27, 2011, at 3:22 PM, wren ng thornton wrote:

...


John Millikin

28 Mar

1:58 a.m.

If the receiver can only accept very small chunks, you can put a rechunking stage in between the compression and iteratee:

-----------------------------------------------------------
verySmallChunks :: Monad m => Enumeratee ByteString ByteString m b
verySmallChunks = sequence (take 10)
-----------------------------------------------------------

Resending is slightly more complex -- if the other end can say "resend that last chunk", then it should be easy enough, but "resend the last 2 bytes of that chunk you sent 5 minutes ago" would be much harder. What is your use case?
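What a rechunking stage does to the data can be sketched on plain lists, independent of the enumerator types. The function below (`rechunk`, a hypothetical name) is an analogue written for illustration, not the `verySmallChunks` enumeratee itself: it regroups a stream of arbitrarily sized chunks so that no output chunk holds more than `n` elements.

```haskell
-- Regroup a stream of chunks so that every output chunk has at most n
-- elements. Only the final chunk may be shorter than n.
rechunk :: Int -> [[a]] -> [[a]]
rechunk n
    | n <= 0    = error "rechunk: chunk size must be positive"
    | otherwise = go . concat
  where
    go [] = []
    go xs = let (chunk, rest) = splitAt n xs in chunk : go rest
```

For example, `rechunk 3 ["hello", "wo", "rld"]` evaluates to `["hel", "low", "orl", "d"]`. The receiver's limit is enforced by the stage in front of it, so the compression stage does not need to know about it.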

wren ng thornton

4:54 a.m.

On 3/27/11 9:58 PM, John Millikin wrote:

...

Resending is slightly more complex -- if the other end can say "resend that last chunk", then it should be easy enough, but "resend the last 2 bytes of that chunk you sent 5 minutes ago" would be much harder. What is your use case?

This does highlight one of the restrictions I've lamented about the iteratee framework. Namely that the current versions I've seen place unnecessary limitations on the communication the iteratee is allowed to give to the enumerator/enumeratees above it. This is often conflated with the iteratee throwing an error/exception, which is wrong because we should distinguish between bad program states and argument passing. Moreover the type system doesn't capture the kinds of communication iteratees assume of their enumerators/enumeratees, nor the kinds of communication supported by the enumerators/enumeratees, which means that failure to hook them up in the right (non-typechecked) way /does/ constitute an error.

The one example that tends to be supported is the iteratee requesting that the enumerator/enumeratees seek to a given position in a file. Which is a good example, but it's not the only one. Requesting the resending of chunks is another good example. But there's no limit to the reasonable kinds of communication an iteratee could want.

In an ideal framework the producers, transformers, and consumers of stream data would have a type parameter indicating the up-stream communication they support or require (in addition to the type parameters for stream type, result type, and side-effect type). That way clients can just define an ADT for their communication protocol, and be done with it. There may still be issues with the Expression Problem, but at least those are pushed out of the stream processing framework itself which really shouldn't care about the types of communication used.

--
Live well,
~wren
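One way to picture the extra type parameter is the small sketch below. It is a hypothetical design written for illustration, not an API from any of the packages discussed: `Consumer`, `Control`, `feedList`, and `example` are invented names, and the side-effect type parameter is left out to keep it short. The consumer's type names the up-stream protocol (`req`), and a producer only accepts consumers whose protocol it understands.

```haskell
-- A consumer that can send typed requests up-stream (hypothetical design).
data Consumer req i r
    = Done r
    | Await (Maybe i -> Consumer req i r)  -- Nothing signals end of input
    | Request req (Consumer req i r)       -- ask the producer for something

-- A client-defined protocol: just an ordinary ADT.
data Control = SeekTo Int | ResendLast
    deriving (Eq, Show)

-- A producer over an in-memory list that understands 'Control' requests.
-- A consumer with a different request type would not type-check against it.
feedList :: [i] -> Consumer Control i r -> r
feedList xs = go 0
  where
    go _ (Done r) = r
    go pos (Await k) = case drop pos xs of
        []      -> go pos (k Nothing)
        (y : _) -> go (pos + 1) (k (Just y))
    go _ (Request (SeekTo p) next) = go (max 0 p) next
    go pos (Request ResendLast next) = go (max 0 (pos - 1)) next

-- Read an element, ask for it again, then seek ahead and read once more.
example :: Consumer Control Char [Maybe Char]
example =
    Await $ \a ->
    Request ResendLast $
    Await $ \b ->
    Request (SeekTo 3) $
    Await $ \c ->
    Done [a, b, c]
```

Here `feedList "abcde" example` evaluates to `[Just 'a', Just 'a', Just 'd']`. Seeking and resending are ordinary messages rather than exceptions, and a mismatch between what a consumer asks for and what a producer supports shows up as a type error.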

James Cook

4:21 p.m.

On Mar 28, 2011, at 12:54 AM, wren ng thornton wrote:

...

On 3/27/11 9:58 PM, John Millikin wrote:

...
Resending is slightly more complex -- if the other end can say "resend that last chunk", then it should be easy enough, but "resend the last 2 bytes of that chunk you sent 5 minutes ago" would be much harder. What is your use case?

This does highlight one of the restrictions I've lamented about the iteratee framework. Namely that the current versions I've seen place unnecessary limitations on the communication the iteratee is allowed to give to the enumerator/enumeratees above it. This is often conflated with the iteratee throwing an error/exception, which is wrong because we should distinguish between bad program states and argument passing. Moreover the type system doesn't capture the kinds of communication iteratees assume of their enumerators/enumeratees, nor the kinds of communication supported by the enumerators/enumeratees, which means that failure to hook them up in the right (non-typechecked) way /does/ constitute an error.

The one example that tends to be supported is the iteratee requesting that the enumerator/enumeratees seek to a given position in a file. Which is a good example, but it's not the only one. Requesting the resending of chunks is another good example. But there's no limit to the reasonable kinds of communication an iteratee could want.

In an ideal framework the producers, transformers, and consumers of stream data would have a type parameter indicating the up-stream communication they support or require (in addition to the type parameters for stream type, result type, and side-effect type). That way clients can just define an ADT for their communication protocol, and be done with it. There may still be issues with the Expression Problem, but at least those are pushed out of the stream processing framework itself which really shouldn't care about the types of communication used.

It's somewhat outdated and underdeveloped (I was writing for myself so I never really bothered finishing it), but I wrote an exploration of iteratee semantics[1] a while back in which I specified an iteratee as a monad-transformer stack involving, at its core, the "PromptT" or "ProgramT" monad transformers (as far as I know, the same could be done with the "Coroutine" monad). I personally found that construction far more lucid than the usual ad-hoc view, and it also makes it very clear how the model can be trivially extended to support additional operations such as these.

Based on what I learned while writing that (and on the similarity between coroutines and the concepts I used), I strongly agree with Mario Blažević's suggestion to look at his monad-coroutine library as a way of understanding where they fit in some larger design space. I would even go so far as to suggest that something like it could be considered as either a replacement for iteratees or as the underlying implementation of an iteratee library, because the concept not only subsumes iteratees and enumerators, but also delegates control to code that can be independent of both rather than simply reversing the "conventional" iterator concept. I believe it also subsumes iterators and whatever their corresponding parts are called.

As he mentions, his implementation does not come with all the "plumbing", but I think it would be worthwhile to create that plumbing, because either coroutines or "operational monads" may very well be the basis needed to develop a "grand unified theory" of composable stream processing. If nothing else, the isomorphisms in his coroutine-enumerator[2] and coroutine-iteratee[3] packages seem to give a much more direct and useful iteratee semantics than I've seen given anywhere else, and at the same time they are much more readily extended to cover additional operations.

-- James

1. https://github.com/mokus0/junkbox/tree/master/Papers/HighLevelIteratees
2. http://hackage.haskell.org/package/coroutine-enumerator
3. http://hackage.haskell.org/package/coroutine-iteratee

John A. De Goes

5:09 p.m.

Now THAT's what I'm talking about. Augment such a solution with interruptible & resumable data producers, and I'd have everything I need.

Regards,

John A. De Goes
Twitter: @jdegoes
LinkedIn: http://linkedin.com/in/jdegoes

On Mar 27, 2011, at 10:54 PM, wren ng thornton wrote:

...

In an ideal framework the producers, transformers, and consumers of stream data would have a type parameter indicating the up-stream communication they support or require (in addition to the type parameters for stream type, result type, and side-effect type). That way clients can just define an ADT for their communication protocol, and be done with it. There may still be issues with the Expression Problem, but at least those are pushed out of the stream processing framework itself which really shouldn't care about the types of communication used.

John A. De Goes

5:16 p.m.

This isn't quite what I'm after. I want to pull chunks on demand (i.e. have control over both the input and the output). Enumeratees don't allow me to do that.

Regards,

John A. De Goes
Twitter: @jdegoes
LinkedIn: http://linkedin.com/in/jdegoes

On Mar 27, 2011, at 7:58 PM, John Millikin wrote:

...

If the receiver can only accept very small chunks, you can put a rechunking stage in between the compression and iteratee:

-----------------------------------------------------------
verySmallChunks :: Monad m => Enumeratee ByteString ByteString m b
verySmallChunks = sequence (take 10)
-----------------------------------------------------------

Resending is slightly more complex -- if the other end can say "resend that last chunk", then it should be easy enough, but "resend the last 2 bytes of that chunk you sent 5 minutes ago" would be much harder. What is your use case?

Mario Blažević

26 Mar

10:24 p.m.

On 11-03-26 04:33 PM, John A. De Goes wrote:

...

I noticed this problem some time ago. Beyond just breaking monadic associativity, there are many other issues with standard definitions of iteratees:

1. It does not make sense in general to bind with an iteratee that has already consumed input, but there's no type-level difference between a "virgin" iteratee and one that has already consumed input;

2. Error recovery is ill-defined because errors do not describe what portion of the input they have already consumed;

3. Iteratees sometimes need to manage resources, but they're not designed to do so which leads to hideous workarounds;

4. Iteratees cannot incrementally produce output, it's all or nothing, which makes them terrible for many real world problems that require both incremental input and incremental output.

Overall, I regard iteratees as only a partial success. They're leaky and somewhat unsafe abstractions.

Out of curiosity, have you looked at the monad-coroutine library? It's a more generic and IMO much cleaner model, though I wouldn't recommend it as a replacement because the enumerator and iteratee libraries come with more predefined plumbing. I think your point #1 still stands, but others can all be made to disappear - as long as you define your suspension functors properly.

...

I'm experimenting with Mealy machines because I think they have more long-term promise to solve the problems of iteratees.

Do you mean a sort of a transducer monad transformer or an actual finite state machine? The latter would seem rather restrictive.

John A. De Goes

27 Mar

3:42 p.m.

On Mar 26, 2011, at 4:24 PM, Mario Blažević wrote:

...

On 11-03-26 04:33 PM, John A. De Goes wrote:

...

Out of curiosity, have you looked at the monad-coroutine library? It's a more generic and IMO much cleaner model, though I wouldn't recommend it as a replacement because the enumerator and iteratee libraries come with more predefined plumbing. I think your point #1 still stands, but others can all be made to disappear - as long as you define your suspension functors properly.

I haven't looked at it. I will take a look.

...

Do you mean a sort of a transducer monad transformer or an actual finite state machine? The latter would seem rather restrictive.

Yes, I mean transducer monad transformer, especially if you equate "mealy machine" with "finite state machine". I equate mealy machine with "two-taped transducer".

Regards,

John A. De Goes
Twitter: @jdegoes
LinkedIn: http://linkedin.com/in/jdegoes

Ertugrul Soeylemez

1:12 a.m.

Hello John,

great stuff! Many thanks. I can't wait for the 0.5 release of your library.

By the way, I believe that the 'enumHandleSession' and 'enumHandleTimeout' enumerators from the 'netlines' library really belong in this one. You will want to have timeout support when reading from a network handle.

Keep up the good work!

Greets,
Ertugrul

John Millikin wrote:

...


-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

John Millikin

3:50 a.m.

Hello Ertugrul Söylemez,

Good idea -- I've added an ``enumSocketTimed`` and ``iterSocketTimed`` to the network-enumerator package at <http://hackage.haskell.org/package/network-enumerator>.

``enumSocketTimed`` is equivalent to your ``enumHandleTimeout``, but instead of Handle uses the more efficient Socket type.

For setting a global timeout on an entire session, it's better to wrap the ``run_`` call with ``System.Timeout.timeout`` -- this is more efficient than testing the time on every chunk, and does not require a specialised enumerator.

The signatures/docs are:

--------------------------------------------------------
-- | Enumerate binary data from a 'Socket', using 'recv'. The socket must
-- be connected.
--
-- The buffer size should be a small power of 2, such as 4096.
--
-- If any call to 'recv' takes longer than the timeout, 'enumSocketTimed'
-- will throw an error. To add a timeout for the entire session, wrap the
-- call to 'E.run' in 'timeout'.
--
-- Since: 0.1.2
enumSocketTimed :: MonadIO m
                => Integer -- ^ Buffer size
                -> Integer -- ^ Timeout, in microseconds
                -> S.Socket
                -> E.Enumerator B.ByteString m b

-- | Write data to a 'S.Socket', using 'sendMany'. The socket must be connected.
--
-- If any call to 'sendMany' takes longer than the timeout, 'iterSocketTimed'
-- will throw an error. To add a timeout for the entire session, wrap the
-- call to 'E.run' in 'timeout'.
--
-- Since: 0.1.2
iterSocketTimed :: MonadIO m
                => Integer -- ^ Timeout, in microseconds
                -> S.Socket
                -> E.Iteratee B.ByteString m ()
--------------------------------------------------------
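The session-wide timeout described above can be sketched with nothing but base. This is only an illustration: the `session` action below is a hypothetical stand-in for a real `run_ (enum $$ iter)` call, and the delays and the returned value are made up for the example.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Hypothetical stand-in for a whole session such as @run_ (enum $$ iter)@:
-- it pretends to work for the given number of microseconds and then
-- returns a made-up byte count.
session :: Int -> IO Int
session micros = do
  threadDelay micros
  return 4096

main :: IO ()
main = do
  -- One deadline (in microseconds) around the entire session,
  -- rather than a check on every chunk.
  fast <- timeout 1000000 (session 10000)
  slow <- timeout 50000 (session 2000000)
  putStrLn ("fast session: " ++ show fast)
  putStrLn ("slow session: " ++ show slow)
```

The caller receives a `Maybe`: the first call should finish well inside its deadline and produce a `Just`, while the second should come back as `Nothing` once the 50 ms deadline passes.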

Ertugrul Soeylemez

28 Mar

4:45 a.m.

John Millikin wrote:

...

Good idea -- I've added an ``enumSocketTimed`` and ``iterSocketTimed`` to the network-enumerator package at <http://hackage.haskell.org/package/network-enumerator>. ``enumSocketTimed`` is equivalent to your ``enumHandleTimeout``, but instead of Handle uses the more efficient Socket type.

For simple applications, working with handles is much more convenient, so I decided to implement a timed handle enumerator instead of a socket enumerator. Perhaps it would be a good idea to add your 'enumSocketTimed' and 'iterSocketTimed' to my netlines package, too. I should also add 'iterHandleTimeout'.

...

For setting a global timeout on an entire session, it's better to wrap the ``run_`` call with ``System.Timeout.timeout`` -- this is more efficient than testing the time on every chunk, and does not require a specialised enumerator.

It may be more efficient, but I don't really like it. I like robust applications, and to me killing a thread is always a mistake, even if the thread is kill-safe.

Greets,
Ertugrul

-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

John Millikin

3:06 p.m.

On Sunday, March 27, 2011 9:45:23 PM UTC-7, Ertugrul Soeylemez wrote:

...

...
For setting a global timeout on an entire session, it's better to wrap the ``run_`` call with ``System.Timeout.timeout`` -- this is more efficient than testing the time on every chunk, and does not require a specialised enumerator.

It may be more efficient, but I don't really like it. I like robust applications, and to me killing a thread is always a mistake, even if the thread is kill-safe.

``timeout`` doesn't kill the thread, it just returns ``Nothing`` if the computation took longer than expected.

David Leimbach

3:43 p.m.

On Mon, Mar 28, 2011 at 8:06 AM, John Millikin wrote:

...

On Sunday, March 27, 2011 9:45:23 PM UTC-7, Ertugrul Soeylemez wrote:

...
...
For setting a global timeout on an entire session, it's better to wrap the ``run_`` call with ``System.Timeout.timeout`` -- this is more efficient than testing the time on every chunk, and does not require a specialised enumerator.

It may be more efficient, but I don't really like it. I like robust applications, and to me killing a thread is always a mistake, even if the thread is kill-safe.

``timeout`` doesn't kill the thread, it just returns ``Nothing`` if the computation took longer than expected.

Timeout does kill the thread that is used for timing out :-). The thread that measures the timeout throws an exception to the worker thread that's being monitored. Either way you're interrupting a thread. Kill it or toss an exception at it, I don't see the difference really.

Dave
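The two descriptions above can be checked with a small experiment using only base. This is a sketch, not part of either library: the worker is an arbitrary sleeping action, and the `IORef` flag is a hypothetical way of observing whether its cleanup handler ran. `timeout` interrupts the monitored action with an asynchronous exception, which it then catches itself, so the calling thread carries on and sees `Nothing`.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (finally)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.Timeout (timeout)

-- Run a slow worker under a short deadline and report
-- (result of 'timeout', whether the worker's cleanup handler ran).
demo :: IO (Maybe (), Bool)
demo = do
  cleanedUp <- newIORef False
  -- The worker sleeps for 2 s; 'finally' records that cleanup happened,
  -- whether the worker finishes normally or is interrupted.
  let worker = threadDelay 2000000 `finally` writeIORef cleanedUp True
  result <- timeout 50000 worker
  ran <- readIORef cleanedUp
  return (result, ran)

main :: IO ()
main = do
  (result, ran) <- demo
  putStrLn ("result: " ++ show result)
  putStrLn ("cleanup ran: " ++ show ran)
```

The worker is interrupted part-way, its `finally` handler still runs, and the caller continues normally. Whether that counts as "killing" is the point under debate, but code that relies on `bracket` or `finally` for cleanup keeps working under `timeout`.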


John Millikin

29 Mar

5:50 a.m.

Since the release, a couple people have sent in feature requests, so I'm going to put out 0.4.9 in a day or so.

New features will be:

- tryIO: runs an IO computation, and converts any exceptions into ``throwError`` calls (requested by Kazu Yamamoto)

- checkContinue: encapsulates a common pattern (loop (Continue k) = ...) when defining enumerators

- mapAccum and mapAccumM: sort of like map and mapM, except the step function is stateful (requested by Long Huynh Huu)

Anyone else out there sitting on a request? Please send them in -- I am always happy to receive them, even if they must be declined.

---

Also, I would like to do a quick poll regarding operators.

1. It has been requested that I add operator aliases for joinI and joinE.

2. There have been complaints that the library defines too many operators (currently, 5).

Do any existing enumerator users, or anyone for that matter, have an opinion either way?

The proposed operators are:

----------------------------------------------------------------------
infixr 0 =$
infixr 0 $=

(=$) :: Monad m => Enumeratee ao ai m b -> Iteratee ai m b -> Iteratee ao m b
enum =$ iter = joinI (enum $$ iter)

($=) :: Monad m => Enumerator ao m (Step ai m b) -> Enumeratee ao ai m b -> Enumerator ai m b
($=) = joinE
----------------------------------------------------------------------
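For readers unfamiliar with the "stateful map" idea behind the mapAccum item in the list above, the plain-list analogue in base is `Data.List.mapAccumL`. The sketch below uses that list function, not the enumerator one, and the `runningTotal` step function is a made-up example: it threads a running sum through the list as state and emits the sum at each position.

```haskell
import Data.List (mapAccumL)

-- Step function: the state is the total so far; each element is
-- replaced by the total including that element.
runningTotal :: Int -> Int -> (Int, Int)
runningTotal acc x = (acc + x, acc + x)

main :: IO ()
main = print (mapAccumL runningTotal 0 [1, 2, 3, 4])
-- prints (10,[1,3,6,10])
```

An enumeratee version would presumably apply the same kind of step function chunk by chunk over a stream instead of over a list held in memory.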

Michael Snoyman

5:55 a.m.

On Tue, Mar 29, 2011 at 7:50 AM, John Millikin wrote:

...


The operators sound good to me. My only request would be to put in a usage example in the documentation. I'd be happy to write one if you'd like. Personally, I think that =$ will *greatly* clean up my code.

Michael

Kazu Yamamoto

30 Mar

1:48 a.m.

Hello,

...


The operators sound good to me. My only request would be to put in a usage example in the documentation. I'd be happy to write one if you'd like. Personally, I think that =$ will *greatly* clean up my code.

I have written a tutorial in Japanese on how to use the enumerator library. Since it is popular among the Haskell community in Japan, I guessed it would be worth translating into English. So, I did.

http://www.mew.org/~kazu/proj/enumerator/

This tutorial explains how to use (=$) and ($=) as well as other operators (($$), (<==<), (>>=)).

Of course, my English is broken. If native English speakers would kindly correct the broken grammar, it would be appreciated. I'm reachable by e-mail or twitter (@kazu_yamamoto).

--Kazu

John Millikin

29 Mar

6:22 p.m.

0.4.9 has been uploaded to cabal, with the new operators. Changes are in the replied-to post (and also quoted below), plus the new operators proposed by Kazu Yamamoto.

Here's the corresponding docs (they have examples!)

------------------------------------------------------------------------------------------------------
-- | @enum =$ iter = 'joinI' (enum $$ iter)@
--
-- “Wraps” an iteratee /inner/ in an enumeratee /wrapper/.
-- The resulting iteratee will consume /wrapper/’s input type and
-- yield /inner/’s output type.
--
-- Note: if the inner iteratee yields leftover input when it finishes,
-- that extra will be discarded.
--
-- As an example, consider an iteratee that converts a stream of UTF8-encoded
-- bytes into a single 'TL.Text':
--
-- > consumeUTF8 :: Monad m => Iteratee ByteString m Text
--
-- It could be written with either 'joinI' or '(=$)':
--
-- > import Data.Enumerator.Text as ET
-- >
-- > consumeUTF8 = joinI (decode utf8 $$ ET.consume)
-- > consumeUTF8 = decode utf8 =$ ET.consume
--
-- Since: 0.4.9

-- | @enum $= enee = 'joinE' enum enee@
--
-- “Wraps” an enumerator /inner/ in an enumeratee /wrapper/.
-- The resulting enumerator will generate /wrapper/’s output type.
--
-- As an example, consider an enumerator that yields line character counts
-- for a text file (e.g. for source code readability checking):
--
-- > enumFileCounts :: FilePath -> Enumerator Int IO b
--
-- It could be written with either 'joinE' or '($=)':
--
-- > import Data.Text as T
-- > import Data.Enumerator.List as EL
-- > import Data.Enumerator.Text as ET
-- >
-- > enumFileCounts path = joinE (enumFile path) (EL.map T.length)
-- > enumFileCounts path = enumFile path $= EL.map T.length
--
-- Since: 0.4.9
------------------------------------------------------------------------------------------------------

Minor release note -- 0.4.9 and 0.4.9.1 are the exact same code; I just forgot a @ in one of the new docs and had to re-upload so Hackage would haddock properly. There is no difference in behavior.

On Monday, March 28, 2011 10:50:45 PM UTC-7, John Millikin wrote:

...

Since the release, a couple people have sent in feature requests, so I'm going to put out 0.4.9 in a day or so.

New features will be:

- tryIO: runs an IO computation, and converts any exceptions into ``throwError`` calls (requested by Kazu Yamamoto)

- checkContinue: encapsulates a common pattern (loop (Continue k) = ...) when defining enumerators

- mapAccum and mapAccumM: sort of like map and mapM, except the step function is stateful (requested by Long Huynh Huu)

Anyone else out there sitting on a request? Please send them in -- I am always happy to receive them, even if they must be declined.

---

Also, I would like to do a quick poll regarding operators.

1. It has been requested that I add operator aliases for joinI and joinE.

2. There have been complaints that the library defines too many operators (currently, 5).

Do any existing enumerator users, or anyone for that matter, have an opinion either way?

The proposed operators are:

---------------------------------------------------------------------- infixr 0 =$ infixr 0 $=

(=$) :: Monad m => Enumeratee ao ai m b -> Iteratee ai m b -> Iteratee ao m b enum =$ iter = joinI (enum $$ iter)

($=) :: Monad m => Enumerator ao m (Step ai m b) -> Enumeratee ao ai m b -> Enumerator ai m b ($=) = joinE ----------------------------------------------------------------------

Michael Snoyman

7 p.m.

Thanks, I look forward to being able to use these new operators!

On Tue, Mar 29, 2011 at 8:22 PM, John Millikin wrote:

...


Ertugrul Soeylemez

11:15 p.m.

Hello John,

Sorry that I'm late. And honestly, one day for request submissions is a bit short.

I have a request, too: Right now it is difficult to compose enumeratees. An equivalent of (.) for enumeratees would be great. So instead of:

joinI $ e1 $$ joinI $ e2 $$ iter

one could write

let e = e1 .= e2 in e =$ iter

I would appreciate a 0.4.10 with such a composition operator.

Greets,
Ertugrul

John Millikin wrote:

...


-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/
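As a point of comparison for the request above, the nested-joinI form can already be shortened by chaining (=$) from 0.4.9. The sketch below assumes the enumerator package (0.4.9 or later) is installed. The `source` enumerator and the two `EL.map` stages are made-up examples. It does not implement the proposed `.=` operator, which is only a suggestion in this thread.

```haskell
import qualified Data.Enumerator as E
import           Data.Enumerator (($$), (=$))
import qualified Data.Enumerator.List as EL

-- Made-up input: five numbers, delivered one element per chunk.
source :: Monad m => E.Enumerator Int m b
source = E.enumList 1 [1 .. 5]

-- Two stacked enumeratees written with explicit joinI calls.
viaJoinI :: IO [String]
viaJoinI = E.run_ (source $$ E.joinI (EL.map (+ 1) $$ E.joinI (EL.map show $$ EL.consume)))

-- The same pipeline with (=$); parentheses make the nesting explicit.
viaOperator :: IO [String]
viaOperator = E.run_ (source $$ (EL.map (+ 1) =$ (EL.map show =$ EL.consume)))

main :: IO ()
main = do
  a <- viaJoinI
  b <- viaOperator
  print a
  print (a == b)
```

This still attaches each enumeratee to an iteratee rather than producing a standalone composed enumeratee, so it does not give a reusable `e` as in the `let e = e1 .= e2` form.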

Antoine Latter

30 Mar

12:03 a.m.

On Tue, Mar 29, 2011 at 6:15 PM, Ertugrul Soeylemez wrote:

...

Hello John,

Sorry that I'm late. And honestly one day for request submissions is a bit narrow.

I have a request, too: Right now it is difficult to compose enumeratees. An equivalent of (.) for enumeratees would be great. So instead of:

joinI $ e1 $$ joinI $ e2 $$ iter

one could write

let e = e1 .= e2 in e =$ iter

I would appreciate a 0.4.10 with such a composition operator.

Greets, Ertugrul

It looks like we can't quite fit Enumeratee into the Category typeclass (without newtypes, at least). That's a shame.

Antoine

Ertugrul Soeylemez

2:03 a.m.

Antoine Latter wrote:

...

It looks like we can't quite fit Enumeratee into the Category typeclass (without newtypes, at least). That's a shame.

Yeah. Intuitively it looks like iteratees and enumeratees are excellent candidates for Category and even Arrow. Unfortunately they can either be monad transformers or arrows. You can't mix without, as you said, a newtype. That's very unfortunate.

Greets,
Ertugrul

-- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/
