
Hello,

I am not a super-geek (at least, not compared to others on this list), but I'll take a try at this anyway.

The benefits of iteratees mostly depend on the differences between lazy and strict IO (see ch. 7 of Real World Haskell for more on this). Lazy IO allows for some seriously cool tricks. You can write functions that process entire files at once, without explicitly writing loops or recursion, and the runtime system will process the data in chunks without attempting to load the whole file into memory. This allows for a very functional style of code, making it easy to abstract and compose operations.

This benefit comes at a high cost, though. Lazy IO can make it much more difficult to manage resources. For example, suppose you open a Handle and use hGetContents to read from that handle into a data structure, then close the Handle. Later on, when you try to use the data, you get an error message about the handle being closed. Because the data structure wasn't fully evaluated before you closed the handle, some of the IO necessary to fill it was never performed, and now that the handle is closed it's too late. You don't have to close handles explicitly (when the result is garbage collected the handle should be closed automatically), but if you're doing a lot of IO this means that open handles accumulate until they can be GC'd, using up a finite resource.

Lazy IO also often makes it difficult to reason about the behavior of your code. One problem is unintentionally holding onto values, preventing them from being GC'd. Here's an example of a different sort of problem. Suppose you're writing a file, and the format specifies that you first write the size of the data that follows, then the actual data. To obtain the size, you do something like the following:

process = do
    contents <- readFile "Infile.txt"
    writeFile "Outfile.txt" $ show $ length contents

You've now read the entire contents of Infile.txt into memory. If any code that follows uses the string in "contents", it won't be GC'd until after that code has run. If Infile.txt is large, this is a problem: you've lost one of the main benefits of lazy IO, namely that data is read only as necessary. Since the entire file is needed to calculate the length of the string, the whole thing is forced into memory. In general, lazy IO has no good solution if you run into this issue.

Strict IO avoids these problems, but only because the programmer has to manage the resources manually. Now the programmer is responsible for explicit recursion (typically in the IO monad), which means the programmer also needs to handle exceptions within that recursion. Buffering must be managed by hand as well. You've avoided the pitfalls of lazy IO, but at the cost of a lot more work. Most of what you write will also be much less composable, leading to a lot of functions that are slight variations on the same thing.

If you do strict IO long enough, you will soon want a more general solution. First you'll write a function that takes a function parameter and recursively applies it to data read from a file (the enumerator). You'll want to add exception handling as well. Next, you'll find that you frequently don't need to read all of a file, only part of it, so you'll modify your recursive function so that the function being repeatedly applied (the iteratee) can signal when it's finished, letting the enumerator stop processing the file at that point. After several more enhancements, you'll end up with what is essentially Oleg's iteratee code. At this point you have achieved the following:

1. Composability - both iteratees and enumerators can be composed. A number of basic iteratees are provided which can be combined by the user as necessary.

2. Layering - stream processing can be separated into layers, e.g. one layer that creates lines, another to create words, etc.
It is possible to make iteratees that operate on words, lines, or any other element that makes sense for the problem domain.

3. Efficiency - data is processed in chunks, and only as much IO is performed as is necessary to evaluate the result.

4. Safety - IO exceptions don't leak outside the enumerator, and the file handle is closed after IO exceptions.

5. Generality - many different types of resources can be read (handles, files, sockets, etc.), and they can easily be combined. One of Oleg's examples muxes input from two sockets into one stream, in a process that is transparent to the iteratee. Furthermore, this works with any monad, not just IO (although IO actions must be performed in IO).

6. Seeking - seeking is allowed for seekable resources.

So you've managed to achieve many of the benefits of lazy IO (composability, generality, and efficiency) without the drawbacks. Furthermore, resources are managed strictly and released as soon as possible.

I hope this is helpful.

Cheers,
John Lato
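P.S. To make the shape of the enumerator/iteratee idea concrete, here is a minimal sketch. The names (Iteratee, enumFile, countLines, firstLine) are illustrative only and do not match the actual API of Oleg's iteratee library; chunks here are whole lines for simplicity, where a real implementation would use sized buffers and carry errors in the stream.

```haskell
import System.IO
import Control.Exception (bracket)

-- An iteratee either asks for the next chunk (Nothing signals EOF)
-- or has finished with a result.
data Iteratee a = Continue (Maybe String -> Iteratee a)
                | Done a

-- Extract the final result, if the iteratee has finished.
run :: Iteratee a -> Maybe a
run (Done x) = Just x
run _        = Nothing

-- Feed one chunk; a finished iteratee ignores further input.
feed :: Iteratee a -> Maybe String -> Iteratee a
feed (Continue k) chunk = k chunk
feed done         _     = done

-- Count the lines fed in; yields the count when it sees EOF.
countLines :: Int -> Iteratee Int
countLines n = Continue step
  where
    step Nothing  = Done n
    step (Just _) = countLines (n + 1)

-- Keep only the first line, then signal Done so the enumerator
-- can stop reading immediately (early termination).
firstLine :: Iteratee String
firstLine = Continue step
  where
    step Nothing  = Done ""
    step (Just l) = Done l

-- The enumerator: opens the file, feeds it to the iteratee one line
-- at a time, and stops as soon as the iteratee is Done. bracket
-- guarantees the handle is closed promptly, even on an exception.
enumFile :: FilePath -> Iteratee a -> IO (Iteratee a)
enumFile path iter0 = bracket (openFile path ReadMode) hClose (go iter0)
  where
    go i@(Done _) _ = return i                 -- early termination
    go i h = do
      eof <- hIsEOF h
      if eof
        then return (feed i Nothing)           -- signal end of stream
        else do line <- hGetLine h
                go (feed i (Just line)) h
```

Running enumFile over firstLine reads only the first line of the file before bracket closes the handle, while countLines consumes the whole stream; the iteratee, not the enumerator, decides how much input is demanded, which is the early-termination and prompt-release behavior described above.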
Message: 12
Date: Mon, 02 Mar 2009 01:31:02 +0100
From: Günther Schmidt
Subject: [Haskell-cafe] Re: Left fold enumerator - a real pearl overlooked?
To: haskell-cafe@haskell.org

Hi everyone,
after reading all the responses I would like to ask someone, anyone, to kind of summarize the merits of the left-fold-enumerator approach.
From all that I have read about it so far, all I was able to gather is that it is significant, but I'm still not even sure what it is for and what it is not for.
Apparently Oleg has done a lot of CS work, this particular piece being just one part of it. But he broaches the topic at a very high level - ok, too high for me, i.e. someone with no CS or higher-math background.
Would one of the super geeks please summarize it? (In RWH style, if possible.)
Günther
John Lato schrieb:
Hi Don,
Would you please elaborate on which features or capabilities you think are missing from left-fold, and would elevate it out of the special-purpose category? The conception is so completely different from bytestrings that just saying it's not a bytestring equivalent doesn't give me any ideas as to what would make it more useful. Since the technique is still being actively developed and researched, IMO this is a good time to be making changes.
Incidentally, in my package I've made newtypes that read data into strict bytestrings. It would be relatively simple to use unsafeInterleaveIO in an enumerator to create lazy bytestrings using this technique. I don't see why anyone would want to do so, however, since it would have all the negatives of lazy IO and be less efficient than simply using lazy bytestrings directly.
Cheers, John
On Sat, Feb 28, 2009 at 10:54 PM, Don Stewart
wrote: There are a few iteratee/enumerator design questions that remain, which Oleg and others would like to explore more fully. The results of that research will likely find their way into this library.
I agree. There's no left-fold 'bytestring' equivalent. So it remains a special purpose technique.
-- Don