Re: Left fold enumerator - a real pearl overlooked?

Hello Günther,
I think the largest reason Haskellers don't use left-fold enumerators is that there isn't a ready-to-use package on Hackage. Oleg's code is extremely well commented and easy to follow, but it's not cabalized.
In addition to Takusen, Johan Tibell's hyena application server uses enumerators for IO: http://github.com/tibbe/hyena/tree/master
There is a darcs repo of a cabalized iteratee package available at http://inmachina.net/~jwlato/haskell/iteratee/ This is essentially Oleg's code, slightly modified and reorganized. If anyone is interested in using left-fold enumerators for IO, please give it a look and let me know what you think. I'd like to put this on Hackage in about a week or so, if possible. I would especially appreciate build reports.
There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
Sincerely,
John Lato
Hi all,
in the last few months I was looking for a Haskell database library, eventually settling for HDBC (thanks John, btw).
Takusen also caught my eye, although I failed even to install it.
Nevertheless a particular feature of Takusen, the left-fold enumerator, sparked my interest and I tried to follow it up.
I had the impression this is something of relevance, yet something that is not mentioned or made use of often. There are actually only a few references on the net, and most of those by one person only, Oleg.
There was one post from John Goerzen about Haskell HTTP libraries in which he hinted at using left-fold enumerators.
Anyway, what I'm saying is that the whole topic seems to be somewhat off the radar of the larger Haskell community. How come? Or am I merely overestimating its relevance and usefulness?
Günther

jwlato:
Hello Günther,
I think the largest reason Haskellers don't use left-fold enumerators is that there isn't a ready-to-use package on Hackage. Oleg's code is extremely well commented and easy to follow, but it's not cabalized.
In addition to Takusen, Johan Tibell's hyena application server uses enumerators for IO: http://github.com/tibbe/hyena/tree/master
There is a darcs repo of a cabalized iteratee package available at http://inmachina.net/~jwlato/haskell/iteratee/ This is essentially Oleg's code, slightly modified and reorganized. If anyone is interested in using left-fold enumerators for IO, please give it a look and let me know what you think. I'd like to put this on hackage in about a week or so, if possible. I would especially appreciate build reports.
There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
I agree. There's no left-fold 'bytestring' equivalent. So it remains a special purpose technique. -- Don

Hi, thank you all for your responses. I see now that the subject did indeed register with some Haskellers. :-) I had hoped it would eventually become the tested and approved method for certain types of problems that do arise.
Maybe some of you have read my earlier posts about an application I was developing (which is now finished, btw). An initial design used HAppS-IxSet instead of an SQL database, but it hit performance problems: a stack overflow and, later, running out of RAM. I was eventually able to solve the stack overflow by following several different leads from this list, though all kinds of strictness primitives (!, seq and strict data constructors) would not help on their own. But finally the out-of-RAM problem occurred, which I can't really explain: memory consumption was far from linear in the input size. I may have caused that myself with my tricks for avoiding the stack overflow. Anyway, I eventually gave up and used SQLite.
So I was hopeful the above-mentioned left-fold enumerator was some sort of general-purpose resource-preserving technique. I'm looking forward to studying its implementation in the future release of Takusen and seeing whether I grasp it well enough to translate it to other problems as well.
And special thanks to you, Don, and your co-authors: your book saved my neck. With its help I was actually able to develop the application in a language entirely new to me, a mere 3 months past deadline.
Günther
Don Stewart wrote:
jwlato:
Hello Günther,
I think the largest reason Haskellers don't use left-fold enumerators is that there isn't a ready-to-use package on Hackage. Oleg's code is extremely well commented and easy to follow, but it's not cabalized.
In addition to Takusen, Johan Tibell's hyena application server uses enumerators for IO: http://github.com/tibbe/hyena/tree/master
There is a darcs repo of a cabalized iteratee package available at http://inmachina.net/~jwlato/haskell/iteratee/ This is essentially Oleg's code, slightly modified and reorganized. If anyone is interested in using left-fold enumerators for IO, please give it a look and let me know what you think. I'd like to put this on hackage in about a week or so, if possible. I would especially appreciate build reports.
There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
I agree. There's no left-fold 'bytestring' equivalent. So it remains a special purpose technique.
-- Don

Hello,
I'm not sure that I would call it a general-purpose resource-preserving technique. As I understand it, the general concept is a means to handle strict data processing in a functional manner. Any "resource preserving" that comes from this is actually from the use of strict IO rather than lazy. I actually think it's rather like foldl' compared to foldl.
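For illustration, a minimal sketch of that analogy (standard Data.List only):

import Data.List (foldl')

-- foldl builds a chain of unevaluated thunks; foldl' forces each
-- intermediate accumulator as it goes. Iteratee IO relates to lazy IO
-- in roughly the same way: work is done strictly, as the data streams past.
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0   -- may blow the stack on a large list
sumStrict = foldl' (+) 0   -- runs in constant stack space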
John
On Sat, Feb 28, 2009 at 11:16 PM, Günther Schmidt wrote:
So I was hopeful the above-mentioned left-fold enumerator was some sort of general-purpose resource-preserving technique. I'm looking forward to studying its implementation in the future release of Takusen and seeing whether I grasp it well enough to translate it to other problems as well.

Hi Don,
Would you please elaborate on what features or capabilities you think
are missing from left-fold that would elevate it out of the special
purpose category? I think that the conception is so completely
different from bytestrings that just saying it's not a bytestring
equivalent doesn't give me any ideas as to what would make it more
useful. Since the technique is being actively developed and
researched, IMO this is a good time to be making changes.
Incidentally, in my package I've made newtypes that read data into
strict bytestrings. It would be relatively simple to use
unsafeInterleaveIO in an enumerator to create lazy bytestrings using
this technique. I don't see why anyone would want to do so, however,
since it would have all the negatives of lazy IO and be less efficient
than simply using lazy bytestrings directly.
Cheers,
John
On Sat, Feb 28, 2009 at 10:54 PM, Don Stewart wrote:
There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
I agree. There's no left-fold 'bytestring' equivalent. So it remains a special purpose technique.
-- Don

Hi everyone, after reading all the responses I would like to ask someone, anyone, to kind of summarize the merits of the left-fold-enumerator approach.
From all that I read so far about it, all I was able to gather was that it has significance, but I'm still not even sure what for and what not for.
Apparently Oleg has done various CS work, this particular piece just being one part. But he also broaches the topic at a very high level; ok, too high for me, i.e. no CS or higher-math background.
Would one of the super geeks please summarize it? (In RWH kind of style if possible)
Günther
John Lato wrote:
Hi Don,
Would you please elaborate on what features or capabilities you think are missing from left-fold that would elevate it out of the special purpose category? I think that the conception is so completely different from bytestrings that just saying it's not a bytestring equivalent doesn't give me any ideas as to what would make it more useful. Since the technique is being actively developed and researched, IMO this is a good time to be making changes.
Incidentally, in my package I've made newtypes that read data into strict bytestrings. It would be relatively simple to use unsafeInterleaveIO in an enumerator to create lazy bytestrings using this technique. I don't see why anyone would want to do so, however, since it would have all the negatives of lazy IO and be less efficient than simply using lazy bytestrings directly.
Cheers, John
On Sat, Feb 28, 2009 at 10:54 PM, Don Stewart wrote:
There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
I agree. There's no left-fold 'bytestring' equivalent. So it remains a special purpose technique.
-- Don

OK, I'm far from being a supergeek, but anyways.
I'm not considering the lazy IO approach, as it doesn't involve any
form of control over resources.
With the traditional approach, you manually ask a stream to do something (read a block of bytes, seek to a position etc.), and your program is a mixture of stuff that asks the stream to do something and stuff that deals with the results.
With the iteratee approach, you split the program into two parts. An Iteratee is a thing that encapsulates the state of your work with the stream: either you're done and you don't need any new data, or you need another block of bytes to do more work (and you know *which* work), or you need to seek to a different position and you know what you're going to do after that. An Enumerator is a thing that 'runs' an iteratee: it repeatedly looks at what the iteratee is demanding, performs the corresponding action, gives the result to the iteratee, and sees what it wants next.
Simplifying all the monadic stuff, we end up with something like this:
data Iteratee a = Done a
                | NeedAnotherChunk (Maybe Chunk -> Iteratee a)
                  -- Will be given Nothing if we're at EOF
                | NeedToSeek Int (Maybe Chunk -> Iteratee a)

type Enumerator a = Iteratee a -> Iteratee a
The key point is that you, by construction, *always* know whether you need the stream or not.
Thus, since the data-processing loop is concentrated in one place, namely the particular enumerator, this loop *always* knows whether it's time to close the stream or not. It is time if the iteratee has become a Done, or if the stream was closed or encountered an error.
Another key point is that both the iteratees and enumerators are highly composable; and iteratees also form a monad, thus becoming suitable for easily writing parsers.
Also, there's no recursion in the iteratees, and they are fusable and thus extremely performant.
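To make the sketch above concrete, here is a toy version (dropping NeedToSeek for brevity and specializing Chunk to String; all names here are illustrative, not any actual library API):

type Chunk = String

data Iteratee a = Done a
                | NeedAnotherChunk (Maybe Chunk -> Iteratee a)

type Enumerator a = Iteratee a -> Iteratee a

-- A toy iteratee: count the chunks seen before EOF.
countChunks :: Iteratee Int
countChunks = go 0
  where
    go n = NeedAnotherChunk $ \mchunk -> case mchunk of
      Nothing -> Done n        -- EOF: we are finished
      Just _  -> go (n + 1)    -- consumed one chunk, ask for the next

-- A pure enumerator feeding a fixed list of chunks, then EOF.
enumList :: [Chunk] -> Enumerator a
enumList _      (Done a)             = Done a
enumList []     (NeedAnotherChunk k) = k Nothing
enumList (c:cs) (NeedAnotherChunk k) = enumList cs (k (Just c))

-- enumList ["ab","cd","ef"] countChunks evaluates to Done 3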
For further details, you'd better read Oleg's article and code.
2009/3/2 Günther Schmidt
Hi everyone,
after reading all the responses I would like to ask someone, anyone, to kind of summarize the merits of the left-fold-enumerator approach.
From all that I read so far about it, all I was able to gather was that it has significance, but I'm still not even sure what for and what not for.
Apparently Oleg has done various CS work, this particular piece just being one part. But he also broaches the topic at a very high level; ok, too high for me, i.e. no CS or higher-math background.
Would one of the super geeks please summarize it? (In RWH kind of style if possible)
Günther
John Lato wrote:
Hi Don,
Would you please elaborate on what features or capabilities you think are missing from left-fold that would elevate it out of the special purpose category? I think that the conception is so completely different from bytestrings that just saying it's not a bytestring equivalent doesn't give me any ideas as to what would make it more useful. Since the technique is being actively developed and researched, IMO this is a good time to be making changes.
Incidentally, in my package I've made newtypes that read data into strict bytestrings. It would be relatively simple to use unsafeInterleaveIO in an enumerator to create lazy bytestrings using this technique. I don't see why anyone would want to do so, however, since it would have all the negatives of lazy IO and be less efficient than simply using lazy bytestrings directly.
Cheers, John
On Sat, Feb 28, 2009 at 10:54 PM, Don Stewart wrote:
There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
I agree. There's no left-fold 'bytestring' equivalent. So it remains a special purpose technique.
-- Don
-- Eugene Kirpichov Web IR developer, market.yandex.ru

Eugene Kirpichov wrote:
OK, I'm far from being a supergeek, but anyways.
I'm not considering the lazy IO approach, as it doesn't involve any form of control over resources.
This is not always true. I'm using lazy IO, still having full control over the resources.

parse path = withFile path ReadMode parse'
  where
    parse' :: Handle -> IO (UArr Xxx)
    parse' handle = do
      contents <- L.hGetContents handle
      let v = toU $ xxx $ L.lines contents
      rnf v `seq` return v

All of the file is consumed before the result is returned.
[...]
Manlio Perillo

On 02 March 2009 11:01, Manlio Perillo wrote:
Eugene Kirpichov wrote:
I'm not considering the lazy IO approach, as it doesn't involve any form of control over resources.
This is not always true. I'm using lazy IO, still having full control over the resources.
parse path = withFile path ReadMode parse'
  where
    parse' :: Handle -> IO (UArr Xxx)
    parse' handle = do
      contents <- L.hGetContents handle
      let v = toU $ xxx $ L.lines contents
      rnf v `seq` return v

All of the file is consumed before the result is returned.
This only works if the entire file can reasonably fit into memory. If you want to process something really big, then you need some sort of streaming approach, where you only look at a small part of the file (a line, or a block) at a time. And this is where the enumerator-iteratee approach looks better, because the IO is strict, but you still take a stream-like approach to processing the contents.
BTW, does this discussion remind anyone (else) of Peter Simons' Block-IO proposal? http://cryp.to/blockio/fast-io.html
Alistair

On Mon, 2009-03-02 at 11:50 +0000, Bayley, Alistair wrote:
On 02 March 2009 11:01, Manlio Perillo wrote:
Eugene Kirpichov wrote:
I'm not considering the lazy IO approach, as it doesn't involve any form of control over resources.
This is not always true. I'm using lazy IO, still having full control over the resources.
parse path = withFile path ReadMode parse'
  where
    parse' :: Handle -> IO (UArr Xxx)
    parse' handle = do
      contents <- L.hGetContents handle
      let v = toU $ xxx $ L.lines contents
      rnf v `seq` return v

All of the file is consumed before the result is returned.
This only works if the entire file can reasonably fit into memory. If you want to process something really big, then you need some sort of streaming approach, where you only look at a small part of the file (a line, or a block) at a time. And this is where the enumerator-iteratee approach looks better, because the IO is strict, but you still take a stream-like approach to processing the contents.
This can still be done using withFile and hGetContents. You just have to put the consumer inside the scope of withFile. The consumer can work in a streaming fashion. With lazy bytestrings this can be both efficient, work in constant memory and guarantee the file is closed.
We guarantee the file is closed by using withFile. The only thing to watch out for is a consumer that doesn't consume as much as you were expecting before the file does get closed. You should notice that pretty quickly though since it should happen every time (whereas resource leaks are not so immediately visible).
Duncan
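A minimal sketch of what Duncan describes, assuming a hypothetical line-counting consumer (the result is forced with seq inside withFile, before the handle closes):

import System.IO (withFile, IOMode (ReadMode))
import qualified Data.ByteString.Lazy.Char8 as L

-- The consumer runs inside the scope of withFile and streams over the
-- lazy bytestring in constant memory; forcing the count before
-- returning guarantees all the IO happens while the handle is open.
countLines :: FilePath -> IO Int
countLines path = withFile path ReadMode $ \h -> do
  contents <- L.hGetContents h
  let n = length (L.lines contents)
  n `seq` return n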

Duncan Coutts wrote:
This can still be done using withFile and hGetContents. You just have to put the consumer inside the scope of withFile. The consumer can work in a streaming fashion. With lazy bytestrings this can be both efficient, work in constant memory and guarantee the file is closed.
We guarantee the file is closed by using withFile. The only thing to watch out for is a consumer that doesn't consume as much as you were expecting before the file does get closed. You should notice that pretty quickly though since it should happen every time (whereas resource leaks are not so immediately visible).
Sure. But this case is the one that typically causes problems for beginners, who have not yet had the educating experience of being bitten by lazy IO. The standard café response to "why do I get 'handle closed' errors here?" is "you're using hGetContents and you haven't forced/consumed all of your file".
Alistair

On Mon, 2009-03-02 at 12:50 +0000, Bayley, Alistair wrote:
Duncan Coutts wrote:
This can still be done using withFile and hGetContents. You just have to put the consumer inside the scope of withFile. The consumer can work in a streaming fashion. With lazy bytestrings this can be both efficient, work in constant memory and guarantee the file is closed.
We guarantee the file is closed by using withFile. The only thing to watch out for is a consumer that doesn't consume as much as you were expecting before the file does get closed. You should notice that pretty quickly though since it should happen every time (whereas resource leaks are not so immediately visible).
Sure. But this case is the one that typically causes problems for beginners, who have not yet had the educating experience of being bitten by lazy IO. The standard café response to "why do I get 'handle closed' errors here?" is "you're using hGetContents and you haven't forced/consumed all of your file".
This is quite true, but I expect that's easier to explain to beginners than iteratee IO, at least at the present time.
Duncan

Bayley, Alistair wrote:
On 02 March 2009 11:01, Manlio Perillo wrote:
Eugene Kirpichov wrote:
I'm not considering the lazy IO approach, as it doesn't involve any form of control over resources.
This is not always true. I'm using lazy IO, still having full control over the resources.
parse path = withFile path ReadMode parse'
  where
    parse' :: Handle -> IO (UArr Xxx)
    parse' handle = do
      contents <- L.hGetContents handle
      let v = toU $ xxx $ L.lines contents
      rnf v `seq` return v

All of the file is consumed before the result is returned.
This only works if the entire file can reasonably fit into memory.
It's not the entire file, but only the parsed data structure.
If you want to process something really big, then you need some sort of streaming approach,
Yes, this is a more general solution.
[...]
Manlio Perillo

On 2009 Mar 2, at 7:40, Manlio Perillo wrote:
Bayley, Alistair wrote:
All of the file is consumed before the result is returned.
This only works if the entire file can reasonably fit into memory.
It's not the entire file, but only the parsed data structure.
...which, depending on what you're doing with it, can be larger than the original file (consider lists).
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

Hello,
I am not a super-geek (at least, not compared to others on this list), but I'll take a try at this anyway. The benefits of iteratees mostly depend on differences between lazy and strict IO (see ch. 7 of Real World Haskell for more on this).
Lazy IO allows for some seriously cool tricks. You can write functions that process entire files at once, without explicitly writing loops or recursion, and the runtime system will process data in chunks without attempting to load the whole file into memory. This allows for a very functional style of code, making it easy to abstract and compose operations.
This benefit comes with a high cost. Lazy IO can make it much more difficult to manage resources. For example, suppose you open a Handle and use hGetContents to read from that handle into a data structure. Then you close the Handle. Later on, when you try to use the data, you get an error message about the handle being closed. Because the data structure wasn't fully evaluated before you closed the handle, some IO necessary to fill it was never performed, and now that the handle is closed it's too late. You don't have to explicitly close handles (when the result is garbage collected the handle should be closed automatically), but if you're doing a lot of IO this means that you'll have open handles accumulating until they can be GC'd, using up a finite resource.
Lazy IO often makes it difficult to reason about the behavior of your code. One problem is unintentionally holding onto values, preventing them from being GC'd. Here's an example of a different sort of problem. Suppose you're writing a file, and the format specifies that you first write the size of the data that follows, then the actual data. In order to obtain this value, you do something like the following:

process = do
  contents <- readFile "Infile.txt"
  writeFile "Outfile.txt" $ show $ length contents

You've now read the entire contents of Infile.txt into memory. If you have any code following that uses the string in "contents", it won't get GC'd until after "contents" is processed. If Infile.txt is large, this is a problem. You've lost one of the main benefits of lazy IO, that data is read only as necessary. Since the entire file is necessary to calculate the length of the string, the whole thing is forced into memory. In general, lazy IO doesn't have a good solution if you run into this issue.
Strict IO avoids these problems, but only because the programmer has to manually manage the resources. Now the programmer is responsible for explicit recursion (typically in the IO monad), meaning the programmer needs to handle exceptions in the recursion. Buffering must also be managed manually. You've avoided the pitfalls of lazy IO, but now have a lot more work. Most of what you write will also be much less composable, leading to a lot of functions that do slight variations on the same thing.
If you do strict IO long enough, you will soon desire a more generalized solution. First you'll make a function which takes a function parameter and recursively applies it to data read from a file (the enumerator). You'll want to add exception handling as well. Next, you'll find that you frequently don't need to read all of a file, only part of it. You'll want to modify your recursive function so that the function which is being repeatedly applied (the iteratee) can signal when it's finished, so the enumerator can stop processing the file at that point. After several more enhancements, you'll end up with what is essentially Oleg's iteratee code.
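A rough sketch of where that evolution ends up (illustrative names only, not the actual package API): strict chunked reads, early termination when the iteratee is done, and bracket to close the handle even on exceptions.

{-# LANGUAGE BangPatterns #-}
import Control.Exception (bracket)
import System.IO (IOMode (ReadMode), openFile, hClose)
import qualified Data.ByteString as B

data Iteratee a = Done a
                | NeedChunk (Maybe B.ByteString -> Iteratee a)

-- The enumerator: read 4k chunks strictly, feed them to the iteratee,
-- stop as soon as it is Done, and always close the handle.
enumFile :: FilePath -> Iteratee a -> IO (Iteratee a)
enumFile path iter = bracket (openFile path ReadMode) hClose (go iter)
  where
    go i@(Done _)    _ = return i
    go (NeedChunk k) h = do
      chunk <- B.hGet h 4096
      if B.null chunk
        then return (k Nothing)     -- EOF: tell the iteratee
        else go (k (Just chunk)) h

-- An example iteratee: the total byte count of the stream.
byteCount :: Iteratee Int
byteCount = go 0
  where
    go !n = NeedChunk $ \mc -> case mc of
      Nothing -> Done n
      Just c  -> go (n + B.length c)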
At this point you have achieved the following:

1. Composability - both iteratees and enumerators can be composed. A number of basic iteratees are provided which can be combined by the user as necessary.
2. Layering - stream processing can be separated, e.g. one layer that creates lines, another to create words, etc. It is possible to make iteratees that operate on words, lines, or any other element that makes sense for the problem domain.
3. Efficiency - data is processed in chunks, and only as much IO is performed as necessary to evaluate the result.
4. Safety - IO exceptions don't leak outside the enumerator, and the file handle is closed after IO exceptions.
5. Generality - many different types of resources can be read (handles, files, sockets, etc.). In addition, they can easily be combined. One of Oleg's examples muxes input from two sockets into one stream in a process which is transparent to the iteratee. Furthermore, this works with any monad, not just IO (although IO actions must be performed in IO).
6. Seeking - seeking is allowed for seekable resources.

So you've managed to achieve many of the benefits of lazy IO (composability, generality, and efficiency) without the drawbacks. Furthermore resources are managed strictly and released as soon as possible.
I hope this is helpful.
Cheers,
John Lato
On Mon, 02 Mar 2009 01:31:02 +0100, Günther Schmidt wrote:
Hi everyone,
after reading all the responses I would like to ask someone, anyone, to kind of summarize the merits of the left-fold-enumerator approach.
From all that I read so far about it, all I was able to gather was that it has significance, but I'm still not even sure what for and what not for.
Apparently Oleg has done various CS work, this particular piece just being one part. But he also broaches the topic at a very high level; ok, too high for me, i.e. no CS or higher-math background.
Would one of the super geeks please summarize it? (In RWH kind of style if possible)
Günther
John Lato wrote:
Hi Don,
Would you please elaborate on what features or capabilities you think are missing from left-fold that would elevate it out of the special purpose category? I think that the conception is so completely different from bytestrings that just saying it's not a bytestring equivalent doesn't give me any ideas as to what would make it more useful. Since the technique is being actively developed and researched, IMO this is a good time to be making changes.
Incidentally, in my package I've made newtypes that read data into strict bytestrings. It would be relatively simple to use unsafeInterleaveIO in an enumerator to create lazy bytestrings using this technique. I don't see why anyone would want to do so, however, since it would have all the negatives of lazy IO and be less efficient than simply using lazy bytestrings directly.
Cheers, John
On Sat, Feb 28, 2009 at 10:54 PM, Don Stewart wrote:
There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
I agree. There's no left-fold 'bytestring' equivalent. So it remains a special purpose technique.
-- Don

On Mon, 2 Mar 2009, John Lato wrote:
Hello,
I am not a super-geek (at least, not compared to others on this list), but I'll take a try at this anyway. The benefits of iteratees mostly depend on differences between lazy and strict IO (see ch. 7 of Real World Haskell for more on this).
Maybe a good text for http://www.haskell.org/haskellwiki/Enumerator_and_iteratee ?
While I think that the Iteratee pattern has benefits, I suspect that it can't be combined with regular lazy functions, e.g. of type [a] -> [a]. Say I have a chain of functions: read a file, parse it into a tag soup, parse that into an XML tree, transform that tree, format that into a string, write that to a file. If all of these functions are written in a lazy way, which is currently considered good style, I can't use them in conjunction with iteratees. This means almost all Haskell libraries would have to be rewritten or extended from lazy style to iteratee style. The question for me is then: why have laziness in Haskell at all? Or at least, why have laziness by default; why not have a laziness annotation instead of a strictness annotation?

Lazy IO is a complete disaster for "interactive IO", such as network and process IO. Moreover, it's somewhat of a failure even for non-interactive IO such as the use case you described, because it's very easy for partial evaluation to lead to unclosed files and lazy evaluation to lead to delayed resource acquisition. I can imagine a few use cases that might benefit from it, but the evidence suggests that most developers trying to solve "real world" problems work extra hard to get their programs working properly with lazy IO.
Elsewhere, laziness can be a real boon, so I don't understand your question, "Why have laziness in Haskell at all?"
Regards,
John A. De Goes
N-BRAIN, Inc.
The Evolution of Collaboration
http://www.n-brain.net | 877-376-2724 x 101
On Mar 2, 2009, at 6:03 PM, Henning Thielemann wrote:
On Mon, 2 Mar 2009, John Lato wrote:
Hello,
I am not a super-geek (at least, not compared to others on this list), but I'll take a try at this anyway. The benefits of iteratees mostly depend on differences between lazy and strict IO (see ch. 7 of Real World Haskell for more on this).
Maybe a good text for http://www.haskell.org/haskellwiki/Enumerator_and_iteratee ?
While I think that the Iteratee pattern has benefits, I suspect that it can't be combined with regular lazy functions, e.g. of type [a] -> [a]. Say I have a chain of functions: read a file, parse it into a tag soup, parse that into an XML tree, transform that tree, format that into a string, write that to a file. If all of these functions are written in a lazy way, which is currently considered good style, I can't use them in conjunction with iteratees. This means almost all Haskell libraries would have to be rewritten or extended from lazy style to iteratee style. The question for me is then: why have laziness in Haskell at all? Or at least, why have laziness by default; why not have a laziness annotation instead of a strictness annotation?

John A. De Goes wrote:
Elsewhere, laziness can be a real boon, so I don't understand your question, "Why have laziness in Haskell at all?"
As I have written, many libraries process their data lazily (or could be changed to do so without altering their interface), but their interface can forbid application to data that is fetched from the outside world. Say you are used to 'map', 'filter', 'foldr': you cannot use them on data fetched by the iteratee/enumerator approach.
Actually, lazy I/O and exceptions can work together if you drop the exceptions that are baked into the IO monad and use explicit exceptions (synchronous and asynchronous ones) as I develop them in the explicit-exception package. I'm however still searching for a good set of combinators.

On Tue, Mar 3, 2009 at 1:03 AM, Henning Thielemann wrote:
On Mon, 2 Mar 2009, John Lato wrote:
While I think that the Iteratee pattern has benefits, I suspect that it can't be combined with regular lazy functions, e.g. of type [a] -> [a]. Say I have a chain of functions: read a file, parse it into a tag soup, parse that into an XML tree, transform that tree, format that into a string, write that to a file. If all of these functions are written in a lazy way, which is currently considered good style, I can't use them in conjunction with iteratees. This means almost all Haskell libraries would have to be rewritten or extended from lazy style to iteratee style. The question for me is then: why have laziness in Haskell at all? Or at least, why have laziness by default; why not have a laziness annotation instead of a strictness annotation?
I'm not sure that this is a problem, at least not for all cases. When reading seekable streams, it is possible to have IO on demand provided that all processing take place within the context of the Iteratee (see Oleg's Tiff reader, http://okmij.org/ftp/Haskell/Iteratee/Tiff.hs, and my wave reader, http://inmachina.net/~jwlato/haskell/iteratee/src/Data/Iteratee/Codecs/Wave....).
Also, since the inner monad can be any monad, not just IO, you should be able to lift processing and computations into an iteratee in a fairly straightforward manner. File enumerators are only provided for IO, but it's fairly easy to create versions for other monads as necessary. I've got one for StateT s IO, for example.
Now I do agree that this probably won't work in every case. I would suspect that parsers may have to be rewritten to use iteratees (although I don't know to what extent because I don't work with generic parsers). I'm not sure in what other cases this would also be true. The best way to figure it out would be to have more people using iteratees and reporting their findings.
Cheers,
John
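To illustrate the point above about lifting pure processing into the stream, a toy sketch in the spirit of the simplified Iteratee type from earlier in the thread (illustrative only; it lifts a pure per-chunk function over the stream an iteratee sees, which works for chunk-wise functions but not for whole-stream ones):

data Iteratee a = Done a
                | NeedChunk (Maybe String -> Iteratee a)

-- Transform the stream an iteratee consumes with a pure function,
-- chunk by chunk. E.g. mapStream (map toUpper) upper-cases the input
-- as it flows past, without the iteratee knowing.
mapStream :: (String -> String) -> Iteratee a -> Iteratee a
mapStream _ (Done a)      = Done a
mapStream f (NeedChunk k) = NeedChunk (mapStream f . k . fmap f)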

jwlato:
On Tue, Mar 3, 2009 at 1:03 AM, Henning Thielemann wrote:
On Mon, 2 Mar 2009, John Lato wrote:
While I think that the Iteratee pattern has benefits, I suspect that it can't be combined with regular lazy functions, e.g. of type [a] -> [a]. Say I have a chain of functions: read a file, parse it into a tag soup, parse that into an XML tree, transform that tree, format that into a string, write that to a file. If all of these functions are written in a lazy way, which is currently considered good style, I can't use them in conjunction with iteratees. This means almost all Haskell libraries would have to be rewritten or extended from lazy style to iteratee style. The question for me is then: why have laziness in Haskell at all? Or at least, why have laziness by default; why not have a laziness annotation instead of a strictness annotation?
I'm not sure that this is a problem, at least not for all cases. When reading seekable streams, it is possible to have IO on demand provided that all processing take place within the context of the Iteratee (see Oleg's Tiff reader, http://okmij.org/ftp/Haskell/Iteratee/Tiff.hs, and my wave reader, http://inmachina.net/~jwlato/haskell/iteratee/src/Data/Iteratee/Codecs/Wave....).
BTW, I've started (with his blessing) packaging up Oleg's Haskell code: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/liboleg-0.1.0.2
So you can use, e.g., the left-fold based TIFF parser: http://hackage.haskell.org/packages/archive/liboleg/0.1.0.2/doc/html/Codec-I...
I'm walking backwards over his released modules, adding a few each day.
Enjoy.
-- Don
participants (12)
- Bayley, Alistair
- Brandon S. Allbery KF8NH
- Don Stewart
- Duncan Coutts
- Eugene Kirpichov
- Günther Schmidt
- Günther Schmidt
- Henning Thielemann
- Henning Thielemann
- John A. De Goes
- John Lato
- Manlio Perillo