Fwd: Semantics of iteratees, enumerators, enumeratees?

For anyone interested in iteratees (etc) and not yet on the iteratees mailing list. I'm asking about what iteratees *mean* (denote), independent of the various implementations. My original note (also at the end below): With the encouragement & help of Conrad Parker, I've been looking at
iteratees, enumerators, enumeratees. I can find plenty written about them, but only about benefits and implementation. In sifting through chunks, error/control messages, and continuations, I find myself longing for a precise semantic basis. I keep wondering: what simpler & precise semantic notions do these mechanisms implement? Has anyone worked out a denotational semantics for iteratees, enumerators, enumeratees -- something that simplifies away the performance advantages & complexities? I've worked out something tentative, but perhaps I'm covering old ground.
- Conal
---------- Forwarded message ----------
From: Conal Elliott
Hi Conal,
I've always regarded your work in essentially the same category as Edward Kmett's (and most of Oleg's): stuff that's incredible powerful and concise, but I can't understand at all what it means. I've admired a lot of your work, particularly on Pan, FRP, and automatic differentiation, but most of the rest I couldn't understand at all.
I'll take a look at your *Denotational Design* paper again; maybe now that I have a lot more experience I'll be able to make sense of it.
John
On Sun, Aug 22, 2010 at 8:18 AM, Conal Elliott
wrote: Hi John,
Thanks for the reply. A denotational semantics would be independent of any implementation, so it would apply to any of them, as long as they have the same programming interface. The purpose is to simply & precisely say what the types and their building blocks (API) mean by providing a precise, implementation-independent, and simple-as-possible math model. Such a semantics can be used to prove properties and to define correctness of any implementation. It also gives clear feedback on how elegant or inelegant a library design is.
For instance, given a type, Map k v, of finite maps, we might say the meaning is the type of partial functions from k to v, either k -> v (where absent is _|_) or k -> Maybe v (where absent is Nothing). Then we'd give the meaning of each Map operation as a function of the meanings of its arguments. This example and several others are given in the paper *Denotational design with type class morphismshttp://conal.net/papers/type-class-morphisms/ *.
Regards, - Conal
On Sun, Aug 22, 2010 at 8:31 PM, John Lato
wrote: Hi Conal,
To my knowledge, nobody has attempted this. Oleg may have some ideas, but I don't think he's written about it. I really don't know anything about denotational semantics, so I couldn't do this myself. For some time I've thought it would be good if somebody were able to put together a formal semantics for iteratees, so I'd be very interested if you'd share what you have so far.
Would a denotational semantics apply equally to multiple implementations, or would it be tied to a specific implementation?
John
On Sun, Aug 22, 2010 at 3:47 AM, Conal Elliott
wrote: With the encouragement & help of Conrad Parker, I've been looking at iteratees, enumerators, enumeratees. I can find plenty written about them, but only about benefits and implementation. In sifting through chunks, error/control messages, and continuations, I find myself longing for a precise semantic basis. I keep wondering: what simpler & precise semantic notions do these mechanisms implement? Has anyone worked out a denotational semantics for iteratees, enumerators, enumeratees -- something that simplifies away the performance advantages & complexities? I've worked out something tentative, but perhaps I'm covering old ground.
- Conal
_______________________________________________ Iteratee mailing list Iteratee@projects.haskell.org http://projects.haskell.org/cgi-bin/mailman/listinfo/iteratee

Conal Elliott wrote:
For anyone interested in iteratees (etc) and not yet on the iteratees mailing list.
I'm asking about what iteratees *mean* (denote), independent of the various implementations. My original note (also at the end below):
In my world view, iteratees are just a monad M with a single operation symbol :: M Char that reads the next symbol from an input stream. In other words, they're a very simple parser monad. The emphasis is not on parsing, but on the fact that one and the same monadic value can be run on different streams runHandle :: M a -> Handle -> IO a runString :: M a -> String -> a runByteString :: M a -> ByteString -> a The monad M may also include convenience like exceptions and liftIO . I have omitted the chunking [Char] because I don't like it; invariance with respect to the chunk sizes is something that should be left to the iteratee abstraction. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Mon, Aug 23, 2010 at 1:06 AM, Heinrich Apfelmus
Conal Elliott wrote:
For anyone interested in iteratees (etc) and not yet on the iteratees mailing list.
I'm asking about what iteratees *mean* (denote), independent of the various implementations. My original note (also at the end below):
In my world view, iteratees are just a monad M with a single operation
symbol :: M Char
that reads the next symbol from an input stream.
So perhaps this could be a reasonable semantics? Iteratee a = [Char] -> Maybe (a, [Char]) = MaybeT (State [Char]) a symbol [] = Nothing symbol (c:cs) = Just (c, cs) I'm not experienced with iteratees. Does this miss something? Luke

Luke Palmer wrote:
Heinrich Apfelmus wrote:
Conal Elliott wrote:
For anyone interested in iteratees (etc) and not yet on the iteratees mailing list.
I'm asking about what iteratees *mean* (denote), independent of the various implementations.
In my world view, iteratees are just a monad M with a single operation
symbol :: M Char
that reads the next symbol from an input stream.
So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char]) = MaybeT (State [Char]) a
symbol [] = Nothing symbol (c:cs) = Just (c, cs)
I'm not experienced with iteratees. Does this miss something?
From a purely denotational point of view, that's a reasonable semantics. However, and that's the main point, with this particular semantics, it is impossible to implement runHandle :: M a -> Handle -> IO a without using unsafeInterleaveIO . Typical implementations of iteratees do make that possible, by being able to suspend the iteratee after feeding it a character. There are also enumerators and enumeratees. I think that purpose of enumerator = run an iteratee on multiple sources (i.e. first part of the input from a Handle , second part from a String ) purpose of enumeratee = iteratee as a stream transformer, i.e. as a map [x] -> [y] I am not sure whether this elaborate reinvention of the standard lists functions is worth the trouble. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Mon, 23 Aug 2010 14:38:29 +0200, Heinrich Apfelmus
Luke Palmer wrote:
Heinrich Apfelmus wrote:
Conal Elliott wrote:
For anyone interested in iteratees (etc) and not yet on the iteratees mailing list.
I'm asking about what iteratees *mean* (denote), independent of the various implementations.
In my world view, iteratees are just a monad M with a single operation
symbol :: M Char
that reads the next symbol from an input stream.
So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char]) = MaybeT (State [Char]) a
symbol [] = Nothing symbol (c:cs) = Just (c, cs)
I'm not experienced with iteratees. Does this miss something?
From a purely denotational point of view, that's a reasonable semantics.
However, and that's the main point, with this particular semantics, it is impossible to implement
runHandle :: M a -> Handle -> IO a
without using unsafeInterleaveIO . Typical implementations of iteratees do make that possible, by being able to suspend the iteratee after feeding it a character.
There are also enumerators and enumeratees. I think that
purpose of enumerator = run an iteratee on multiple sources (i.e. first part of the input from a Handle , second part from a String )
I would say more simply that an enumerator is a data-producer (or source). Although it is a producer defined as a consummer (or sink) feeder. An iteratee is thus the consummer. It is defined as an action asking either for more food or producing a value and a food left over -- ignoring errors and over-simplifing Stream as Maybe data Step a b = Continue (Maybe a -> IO (Step a b)) | Yield b (Maybe a) type Iteratee a b = IO (Step a b) -- the most important case is when getting Continue as input: -- type Enumerator a b = (Maybe a -> IO (Step a b)) -> IO (Step a b) type Enumerator a b = Step a b -> IO (Step a b) Note that I'm far from an expert on Iteratee but I start to get some intuitions out of it. -- Nicolas Pouillard http://nicolaspouillard.fr

Nicolas Pouillard wrote:
Heinrich Apfelmus wrote:
There are also enumerators and enumeratees. I think that
purpose of enumerator = run an iteratee on multiple sources (i.e. first part of the input from a Handle , second part from a String )
I would say more simply that an enumerator is a data-producer (or source). Although it is a producer defined as a consumer (or sink) feeder.
Sure, but then why not define them as type Enumerator a b = Iteratee a b -> IO b ? After all, I imagine a data producer to feed an Iteratee with tokens until it has run to completion. The reason for the definition type Enumerator a b = Iteratee a b -> IO (Iteratee a b) is that you can now concatenate different input sources. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char])
I've been tinkering with this model as well. However, it doesn't really correspond to the iteratee interfaces I've seen, since those interfaces allow an iteratee to notice size and number of chunks. I suspect this ability is an accidental abstraction leak, which raises the question of how to patch the leak. What about enumerators? The definition given in Oleg's presentation ( http://okmij.org/ftp/Streams.html#iteratee, slide 21) is
type Enumerator a = Iteratee a -> Iteratee a
Since we have a semantics for Iteratee, we could take this Enumerator definition as is, and we'd have a semantics, i.e.,
[[Enumerator a]] = [[Iteratee a]] -> [[Iteratee a]]
I don't trust this choice, however. It could be that, like the Iteratee representation, the Enumerator representation (as a function) is more an *implementation* than a semantics. That is, like Iteratee, * there might be a simpler and more natural semantic model; and * the representation may be junky, i.e., having many representations that we wouldn't want to be denotable. Is there a simpler model of Enumerator? My intuition is that it's simply a stream:
[[Enumerator a]] = String
Oddly, 'a' doesn't show up on the RHS. Maybe the representation ought to be
type Enumerator = forall a. Iteratee a -> Iteratee a
so
[[Enumerator]] = String
Are there any enumerator definitions that couldn't use this more restrictive
representation type? Glancing through the slides, the only Enumerator types
I see are indeed polymorphic over a (the iteratee's result type.)
Again, there's a terrible abstraction leak here, i.e., many ways to write
down enumerators that type-check but are not meaningful within the model.
Can this leak be fixed?
Comments?
- Conal
On Mon, Aug 23, 2010 at 8:13 PM, Luke Palmer
On Mon, Aug 23, 2010 at 1:06 AM, Heinrich Apfelmus
wrote: Conal Elliott wrote:
For anyone interested in iteratees (etc) and not yet on the iteratees mailing list.
I'm asking about what iteratees *mean* (denote), independent of the various implementations. My original note (also at the end below):
In my world view, iteratees are just a monad M with a single operation
symbol :: M Char
that reads the next symbol from an input stream.
So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char]) = MaybeT (State [Char]) a
symbol [] = Nothing symbol (c:cs) = Just (c, cs)
I'm not experienced with iteratees. Does this miss something?
Luke _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I'm not a semanticist, so I apologize right now if I say something stupid or
incorrect.
On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott
So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char])
I've been tinkering with this model as well.
However, it doesn't really correspond to the iteratee interfaces I've seen, since those interfaces allow an iteratee to notice size and number of chunks. I suspect this ability is an accidental abstraction leak, which raises the question of how to patch the leak.
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because then it would just be an inelegant way to do things. I won't comment further in this email because I think I lack the formal training to follow the rest of your discussion. And that is unfortunate for me. Thanks, Jason

On 24 August 2010 14:14, Jason Dagit
I'm not a semanticist, so I apologize right now if I say something stupid or incorrect.
On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott
wrote: So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char])
I've been tinkering with this model as well.
However, it doesn't really correspond to the iteratee interfaces I've seen, since those interfaces allow an iteratee to notice size and number of chunks. I suspect this ability is an accidental abstraction leak, which raises the question of how to patch the leak.
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I think that's the point, ie. to specify what the invariants should be. For example (to paraphrase, very poorly, something Conal wrote on the whiteboard behind me): run [concat [chunk]] == run [chunk] ie. the (a, [Char]) you maybe get from running an iteratee over any partitioning of chunks should be the same, ie. the same as from running it over the concatenation of all chunks, which is the whole input [Char].
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because then it would just be an inelegant way to do things.
Then I suppose the model for Enumerators is different than that for Iteratees; part of the point of an Enumerator is to control the size of the chunks, so that needs to be part of the model. An Iteratee, on the other hand, should not have to know the size of its chunks. So you don't want to be able to know the length of a chunk (ie. a part of the stream), but you do want to be able to, say, fold over it, and to be able to stop the computation at any time (these being the main point of iteratees ...). Conrad.

On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker
On 24 August 2010 14:14, Jason Dagit
wrote: I'm not a semanticist, so I apologize right now if I say something stupid or incorrect.
On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott
wrote: So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char])
I've been tinkering with this model as well.
However, it doesn't really correspond to the iteratee interfaces I've seen, since those interfaces allow an iteratee to notice size and number of chunks. I suspect this ability is an accidental abstraction leak, which raises the question of how to patch the leak.
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I think that's the point, ie. to specify what the invariants should be. For example (to paraphrase, very poorly, something Conal wrote on the whiteboard behind me):
run [concat [chunk]] == run [chunk]
ie. the (a, [Char]) you maybe get from running an iteratee over any partitioning of chunks should be the same, ie. the same as from running it over the concatenation of all chunks, which is the whole input [Char].
I find this notation foreign. I get [Char], that's the Haskell String type, but what is [chunk]? I doubt you mean a list of one element.
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because then it would just be an inelegant way to do things.
Then I suppose the model for Enumerators is different than that for Iteratees; part of the point of an Enumerator is to control the size of the chunks, so that needs to be part of the model. An Iteratee, on the other hand, should not have to know the size of its chunks. So you don't want to be able to know the length of a chunk (ie. a part of the stream), but you do want to be able to, say, fold over it, and to be able to stop the computation at any time (these being the main point of iteratees ...).
I think I agree with that. Jason

On 24 August 2010 14:47, Jason Dagit
On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker
wrote: On 24 August 2010 14:14, Jason Dagit
wrote: I'm not a semanticist, so I apologize right now if I say something stupid or incorrect.
On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott
wrote: So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char])
I've been tinkering with this model as well.
However, it doesn't really correspond to the iteratee interfaces I've seen, since those interfaces allow an iteratee to notice size and number of chunks. I suspect this ability is an accidental abstraction leak, which raises the question of how to patch the leak.
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I think that's the point, ie. to specify what the invariants should be. For example (to paraphrase, very poorly, something Conal wrote on the whiteboard behind me):
run [concat [chunk]] == run [chunk]
ie. the (a, [Char]) you maybe get from running an iteratee over any partitioning of chunks should be the same, ie. the same as from running it over the concatenation of all chunks, which is the whole input [Char].
I find this notation foreign. I get [Char], that's the Haskell String type, but what is [chunk]? I doubt you mean a list of one element.
sorry, that was just my way of writing "the list of chunks" or perhaps "the stream of chunks that represents the input". Conrad.
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because then it would just be an inelegant way to do things.
Then I suppose the model for Enumerators is different than that for Iteratees; part of the point of an Enumerator is to control the size of the chunks, so that needs to be part of the model. An Iteratee, on the other hand, should not have to know the size of its chunks. So you don't want to be able to know the length of a chunk (ie. a part of the stream), but you do want to be able to, say, fold over it, and to be able to stop the computation at any time (these being the main point of iteratees ...).
I think I agree with that. Jason

Here's a way I've been tinkering with to think about iteratees clearly. For simplicity, I'll stick with pure, error-free iteratees for now, and take chunks to be strings. Define a function that runs the iteratee:
runIter :: Iteratee a -> [String] -> (a, [String])
Note that chunking is explicit here. Next, a relation that an iteratee implements a given specification, defined by a state transformer:
sat :: Iteratee a -> State String a -> Bool
Define sat in terms of concatenating chunks:
sat it st = second concat . runIter it == runState st . second concat
where the RHS equality is between functions (pointwise/extensionally), and runState uses the representation of State directly
runState :: State s a -> s -> (a,s)
(I think this sat definition is what Conrad was alluding to.)
Now use sat to specify and verify operations on iteratees and to
*synthesize* those operations from their specifications. Some iteratees
might not satisfy *any* (State-based) specification. For instance, an
iteratee could look at the lengths or number of its chunks and produce
results accordingly. I think of such iteratees as abstraction leaks. Can
the iteratee vocabulary be honed to make only well-behaved (specifiable)
iteratees possible to express? If so, can we preserve performance benefits?
If indeed the abstraction leaks can be fixed, I expect there will be a
simpler & more conventional semantics than sat above.
- Conal
On Tue, Aug 24, 2010 at 2:55 PM, Conrad Parker
On 24 August 2010 14:47, Jason Dagit
wrote: On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker
wrote: On 24 August 2010 14:14, Jason Dagit
wrote: I'm not a semanticist, so I apologize right now if I say something stupid or incorrect.
On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott
wrote:
So perhaps this could be a reasonable semantics?
Iteratee a = [Char] -> Maybe (a, [Char])
I've been tinkering with this model as well.
However, it doesn't really correspond to the iteratee interfaces I've seen, since those interfaces allow an iteratee to notice size and number of chunks. I suspect this ability is an accidental abstraction leak, which raises the question of how to patch the leak.
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I think that's the point, ie. to specify what the invariants should be. For example (to paraphrase, very poorly, something Conal wrote on the whiteboard behind me):
run [concat [chunk]] == run [chunk]
ie. the (a, [Char]) you maybe get from running an iteratee over any partitioning of chunks should be the same, ie. the same as from running it over the concatenation of all chunks, which is the whole input [Char].
I find this notation foreign. I get [Char], that's the Haskell String type, but what is [chunk]? I doubt you mean a list of one element.
sorry, that was just my way of writing "the list of chunks" or perhaps "the stream of chunks that represents the input".
Conrad.
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because then
it
would just be an inelegant way to do things.
Then I suppose the model for Enumerators is different than that for Iteratees; part of the point of an Enumerator is to control the size of the chunks, so that needs to be part of the model. An Iteratee, on the other hand, should not have to know the size of its chunks. So you don't want to be able to know the length of a chunk (ie. a part of the stream), but you do want to be able to, say, fold over it, and to be able to stop the computation at any time (these being the main point of iteratees ...).
I think I agree with that. Jason

Jason Dagit wrote:
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because then it would just be an inelegant way to do things.
I'm curious, can you give an example where you want to be explicit about chunking? I have a hard time imagining an example where chunking is beneficial compared to getting each character in sequence. Chunking seems to be common in C for reasons of performance, but how does that apply to Haskell? On the matter of leaking resources outside the stream processing, Iteratee does not give you any guarantees, it's only a stylistic aid (which can be powerful, of course). For instance, the following Iteratee returns the whole stream as a list: getStream :: Iteratee e a m [a] getStream = Iteratee . return . Continue $ go [] where go xs EOF = Yield xs EOF go xs (Chunk ys) = Continue $ go (xs++ys) (using the API from http://ianen.org/articles/understanding-iteratees/ ) Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Tue, Aug 24, 2010 at 12:49 AM, Heinrich Apfelmus < apfelmus@quantentunnel.de> wrote:
Jason Dagit wrote:
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because then it would just be an inelegant way to do things.
I'm curious, can you give an example where you want to be explicit about chunking? I have a hard time imagining an example where chunking is beneficial compared to getting each character in sequence. Chunking seems to be common in C for reasons of performance, but how does that apply to Haskell?
It applies to Haskell for the same reasons, as far as I can tell. You want it to manage performance characteristics. See my example below. If you wrote it using chunking you wouldn't need lazy io (which is argued quite well in other place to be bad and I assume you've read the arguments and more or less agree). Furthermore, wouldn't iteratees force you to implement something equivalent to either option #1 or #2, but #3 wouldn't be possible? I think it basically comes down to this: We replace lazy io with explicit chunking because lazy io is unsafe, but explicit chunking can be safe. So, if you had a lazy pure generator you wouldn't need chunking, although perhaps the iteratee style would help avoid accidental space leaks that happen from referencing the stream elements outside of the fold (like #3 below).
On the matter of leaking resources outside the stream processing, Iteratee does not give you any guarantees, it's only a stylistic aid (which can be powerful, of course). For instance, the following Iteratee returns the whole stream as a list:
I think your example is fine. I consider it a misbehaving iteratee, in the same way that returning any large structure would be misbehaving in this context. I think, if the iteratee returns something large that's different than letting things "leak out". It's like a difference of scope. A well-behaved iteratee will reduce the input to reasonable return value. What would be bad, is if other bits of code could reference parts of the stream, while the iteratee is looking at it, and hold on to it. That would cause a space leak. An example of this bad behavior, would be to use readFile to read a file. Then compute two things: a) sum of the bytes in the file as Int32, b) length (in number of characters) of the file. Supposing we use lazy io (Prelude.readFile): 1) read the file, compute (a), close the file, read the file, compute (b), and finally close the file. You can do so in constant space. 2) read the file, use one pass to calculate both (a) and (b) at the same time, then close the file. You can do so in constant space. 3) read the file, use one pass to compute (a) followed by a pass to compute (b), then close the file. The space used will be O(filesize). I consider option #3 to be letting the elements of the stream "leak out". The computation in (b) references them and thus the garbage collector doesn't free them between (a) and (b), and the optimizer cannot fuse (a) and (b) in all cases. There is a fourth option, and that is to use strict io but then each of the above takes space O(filesize). I hope that makes sense. It's getting late here and I could be talking non-sense, but I have tried the above 3 cases in the past and as best as I can recall those were my findings. Thanks, Jason

Jason Dagit wrote:
Heinrich Apfelmus wrote:
I'm curious, can you give an example where you want to be explicit about chunking? I have a hard time imagining an example where chunking is beneficial compared to getting each character in sequence. Chunking seems to be common in C for reasons of performance, but how does that apply to Haskell?
[...] I think it basically comes down to this: We replace lazy io with explicit chunking because lazy io is unsafe, but explicit chunking can be safe.
Ah, I mean to compare Iteratees with chunking to Iteratees with single character access, not to lazy IO. In C, this would be a comparison between read and getchar . If I remember correctly, the former is faster for copying a file simply because copying one character at a time with getchar is too granular (you have to make an expensive system call every time). Of course, this reasoning only applies to C and not necessarily to Haskell. Do you have an example where you want chunking instead of single character access?
Supposing we use lazy io (Prelude.readFile): 1) read the file, compute (a), close the file, read the file, compute (b), and finally close the file. You can do so in constant space. 2) read the file, use one pass to calculate both (a) and (b) at the same time, then close the file. You can do so in constant space. 3) read the file, use one pass to compute (a) followed by a pass to compute (b), then close the file. The space used will be O(filesize).
I consider option #3 to be letting the elements of the stream "leak out". The computation in (b) references them and thus the garbage collector doesn't free them between (a) and (b), and the optimizer cannot fuse (a) and (b) in all cases.
Indeed, Iteratees make it difficult to express option #3, hence discouraging this particular space leak. Compared to lazy IO, they also make sure that the file handle is closed properly and does not leak. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

Conal Elliott wrote:
Is there a simpler model of Enumerator? My intuition is that it's simply a stream:
[[Enumerator a]] = String
Oddly, 'a' doesn't show up on the RHS. Maybe the representation ought to be
type Enumerator = forall a. Iteratee a -> Iteratee a
so
[[Enumerator]] = String
I concur, that seems to be all there is to it. There is a small nuance in the Iteratee implementation, namely: if an Enumerator is something that provides a complete input stream to an Iteratee, why isn't it simply defined as type Enumerator = forall a. Iteratee a -> a i.e. as a function that runs an Iteratee on an input stream and extracts the result? I think the purpose of the implementation type Enumerator = forall a. Iteratee a -> Iteratee a is that it allows us to concatenate different input streams. In other words fromString (xs ++ ys) = fromString ys . fromString xs assuming a function fromString :: String -> Enumerator To get an actual result from an Iteratee, we only need a way to run it on the empty stream. runOnEmptyString :: Iteratee a -> Maybe a Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

I have omitted the chunking [Char] because I don't like it; invariance with respect to the chunk sizes is something that should be left to the iteratee abstraction.
I have this same reservation about iteratees. And related one for enumerators and enumeratees. Assuming my sense of their intended meanings is on track, they allow lots of well-typed but bogus values. Defining and adhering to a precise denotational model would eliminate all of these abstraction leaks, as Luke Palmer alludes to in http://lukepalmer.wordpress.com/2008/07/18/semantic-design/ . - Conal On Mon, Aug 23, 2010 at 4:06 PM, Heinrich Apfelmus < apfelmus@quantentunnel.de> wrote:
Conal Elliott wrote:
For anyone interested in iteratees (etc) and not yet on the iteratees mailing list.
I'm asking about what iteratees *mean* (denote), independent of the various implementations. My original note (also at the end below):
In my world view, iteratees are just a monad M with a single operation
symbol :: M Char
that reads the next symbol from an input stream. In other words, they're a very simple parser monad. The emphasis is not on parsing, but on the fact that one and the same monadic value can be run on different streams
runHandle :: M a -> Handle -> IO a runString :: M a -> String -> a runByteString :: M a -> ByteString -> a
The monad M may also include convenience like exceptions and liftIO .
I have omitted the chunking [Char] because I don't like it; invariance with respect to the chunk sizes is something that should be left to the iteratee abstraction.
Regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Conal Elliott wrote:
For anyone interested in iteratees (etc) and not yet on the iteratees mailing list.
I'm asking about what iteratees *mean* (denote), independent of the various implementations. My original note (also at the end below):
With the encouragement & help of Conrad Parker, I've been looking at iteratees, enumerators, enumeratees. I can find plenty written about them, but only about benefits and implementation. In sifting through chunks, error/control messages, and continuations, I find myself longing for a precise semantic basis. I keep wondering: what simpler & precise semantic notions do these mechanisms implement? Has anyone worked out a denotational semantics for iteratees, enumerators, enumeratees -- something that simplifies away the performance advantages & complexities? I've worked out something tentative, but perhaps I'm covering old ground.
I believe the denotation of an iteratee is the transition function for an automaton (or rather a transducer). I hesitate to speculate on the specific kind of automaton without thinking about it, so maybe finite, maybe deterministic, but then again maybe not. The core idea of iteratees vs conventional parsing strikes me as the same as the build/foldr vs unfoldr/destroy dichotomy. That is, ultimately we have a non-recursive producer, a non-recursive consumer, and a recursive driver. In build/foldr the producer is "flat" and we factor the recursion into the consumer; whereas in unfoldr/destroy we factor the recursion into the producer and the consumer is flat. Thus, I think iteratees are just the (non-recursive) transition function. The recursion for applying the transition function is done elsewhere, namely in the data/driver. Whereas in conventional parsing, the parser contains both the transition function and the recursion for driving the automaton until it hits an accepting/error state, and the data is just a flat stream. This is why conventional parsers don't have a Partial/More constructor: they don't expose the intermediate states of the automaton. Since iteratees only take a single step before returning, they do expose those intermediate states and so they need to have a constructor for them. -- Live well, ~wren

On Mon, Aug 23, 2010 at 11:41 PM, wren ng thornton
I believe the denotation of an iteratee is the transition function for an automaton (or rather a transducer). I hesitate to speculate on the specific kind of automaton without thinking about it, so maybe finite, maybe deterministic, but then again maybe not.
An iteratee is indeed an automaton, specifically one in an unknown (but non-terminal) state. Consider the types in the "iteratee" package: newtype IterateeG c el m a = IterateeG (StreamG c el -> m (IterGV c el m a)) data IterGV c el m a = Done a (StreamG c el) | Cont (IterateeG c el m a) (Maybe ErrMsg) data StreamG c el = EOF (Maybe ErrMsg) | Chunk (c el) Abbreviating a bit and inlining the auxiliary data types: Iteratee c e m a = Iteratee ( Either (Maybe ErrMsg) (c e) -> m ( Either (a, Either (Maybe ErrMsg) (c e)) (Iteratee c e m a, Maybe ErrMsg) ) ) Although the "stream" bits--which actually represent a single chunk of input--are self-contained, so it might clarify things to parameterize the iteratee over the entire chunk type, subsuming the "c" and "e" parameters: Iteratee s m a = Iteratee (s -> m (Either (a, s) (Iteratee s m a, Maybe ErrMsg))) In practice you wouldn't want to do that because you want the "c" and "e" parameters to be readily available. Perhaps a type family would make more sense here for the type now called "s"? There's also the matter of errors in the input stream, but that doesn't really impact the underlying structure in a significant way. We have a type parameter "m :: * -> *", which sounds suspiciously like an intended monad. It's wrapping an Either value, which amounts to just EitherT, like the one in category-extras. Iteratee s m a = Iteratee (s -> EitherT (a, s) m (Iteratee s m a, Maybe ErrMsg)) A function to a monadic value is just a Kleisli arrow. Iteratee s m a = Iteratee (Kleisli (EitherT (a, s) m)) s (Iteratee s m a, Maybe ErrMsg) Which sets things up to use the Automaton transformer in the "arrows" package. Iteratee s m a = Automaton (Kleisli (EitherT (a, s) m)) s (Maybe ErrMsg) The Automaton type describes a Mealy-style stream transducer where the underlying arrow combines the transition function and state, and the input and output to the arrow are the per-step input and output of the automaton. The iteratee automaton here produces only a stream of (Maybe ErrMsg) as output, so it really isn't much of a transducer. EitherT describes a computation that can be cut short, which in this case essentially augments the automaton with an explicit "halt" state. So, we have: - An Iteratee describes a running state machine, paused at an outgoing transition, awaiting another chunk of input. - After receiving input, the Iteratee does one of two things: - Halt, returning unused input and a final result value. - Return an action in the underlying monad, containing the post-transition state machine and an optional error message. - The Iteratee type is parameterized by three types: a single chunk of input, an underlying monad, and a final result value; generic iteratee functions are thus independent of any of those. The "Enumerator" types are the other half of the system: an arbitrary data source that sits there and turns the crank, feeding in chunks of data, until it decides to stop. The types in the "enumerator" package follow almost the same scheme, but with things rotated around a little bit: What it calls Iteratee is a monadic action, representing a state machine paused at an ingoing transition, which will yield either an outgoing transition function, a halting state with a final result, or an error. What sets an iteratee-style design apart from something conventional based on a State monad is that the iteratee conceals its internal state completely (in fact, there's no reason an iteratee even has to be the "same" function step-to-step, or have a single consistent "state" type--almost has an existential flavor, really), but is at another function's mercy when it comes to actually doing anything. All of which doesn't really shed too much light on the denotation of these things, I suppose, as there's barely anything there to talk about; the iteratee automaton itself is a terribly simple construct, relying on an underlying monad to perform actions, on an external "push" data source to recurse, and being given only bite-size chunks of data at each step. It's little more than foldl with a "pause" button attached. - C.

On 8/24/10 3:54 AM, C. McCann wrote:
What sets an iteratee-style design apart from something conventional based on a State monad is that the iteratee conceals its internal state completely (in fact, there's no reason an iteratee even has to be the "same" function step-to-step, or have a single consistent "state" type--almost has an existential flavor, really), but is at another function's mercy when it comes to actually doing anything.
All of which doesn't really shed too much light on the denotation of these things, I suppose, as there's barely anything there to talk about; the iteratee automaton itself is a terribly simple construct, relying on an underlying monad to perform actions, on an external "push" data source to recurse, and being given only bite-size chunks of data at each step. It's little more than foldl with a "pause" button attached.
Which was exactly my point about comparing conventional parsers and iteratees to build/foldr and unfoldr/destroy fusion. Except, I do think that this sheds some light on what the denotation of iteratees is. -- Live well, ~wren

Here's my (uneducated, half-baked) two cents: There's really no need for an "Iteratee" type at all, aside from the utility of defining Functor/Monad/etc instances for it. The core type is "step", which one can define (ignoring errors) as: data Step a b = Continue (a -> Step a b) | Yield b [a] Input chunking is simply an implementation detail, but it's important that the "yield" case be allowed to contain (>= 0) inputs. This allows steps to consume multiple values before deciding what to generate. In this representation, enumerators are functions from a Continue to a Step. type Enumerator a b = (a -> Step a b) -> Step a b I'll leave off discussion of enumeratees, since they're just a specialised type of enumerator. ------------- Things become a bit more complicated when error handling is added. Specifically, steps must have some response to EOF: data Step a b = Continue (a -> Step a b) (Result a b) | Result a b data Result a b = Yield b [a] | Error String In this representation, "Continue" has two branches. One for receiving more data, and another to be returned if there is no more input. This avoids the "divergent iteratee" problem, since it's not possible for Continue to be returned in response to EOF. Enumerators are similarly modified, except they are allowed to return "Continue" when their inner data source runs out. Therefore, both the "continue" and "eof" parameters are Step. type Enumerator a b = (a -> Step a b) -> Step a b -> Step a b ------------- Finally, support for monads is added. I don't know if denotational semantics typically considers monads, but I feel they're important when discussing enumerators/iteratees. After all, the entire point of the iteratee abstraction is to serve an alternative to lazy IO. data Step a m b = Continue (a -> m (Step a m b)) (m (Result a b)) | Result a b data Result a b = Yield b [a] | Error String type Enumerator m a b = (a -> m (Step a m b)) -> m (Step a m b) -> m (Step a m b) This is mostly the same as the second representation, except that it makes obvious at which point each value is calculated from the underlying monad.
From here, it's trivial to define the "Iteratee" type, if desired:
type Iteratee a m b = m (Step a m b) data Step a m b = Continue (a -> Iteratee a m b) (m (Result a b)) | Result a b data Result a b = Yield b [a] | Error String type Enumerator m a b = (a -> Iteratee a m b) -> Iteratee a m b -> Iteratee a m b ------------- Note: the data types I've arrived at here are significantly different from those defined by Oleg. Given my relative level of competency with logical programming in general and Haskell in particular, I suspect his definitions are superiour in some way I do not understand.
participants (9)
-
C. McCann
-
Conal Elliott
-
Conrad Parker
-
Heinrich Apfelmus
-
Jason Dagit
-
John Millikin
-
Luke Palmer
-
Nicolas Pouillard
-
wren ng thornton