Re: [Haskell-cafe] Fwd: Semantics of iteratees, enumerators, enumeratees?

From: John Millikin
Here's my (uneducated, half-baked) two cents:
There's really no need for an "Iteratee" type at all, aside from the utility of defining Functor/Monad/etc instances for it. The core type is "step", which one can define (ignoring errors) as:
data Step a b = Continue (a -> Step a b) | Yield b [a]
Input chunking is simply an implementation detail, but it's important that the "yield" case be allowed to contain (>= 0) inputs. This allows steps to consume multiple values before deciding what to generate.
In this representation, enumerators are functions from a Continue to a Step.
type Enumerator a b = (a -> Step a b) -> Step a b
I'll leave off discussion of enumeratees, since they're just a specialised type of enumerator.
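To make the simplified representation concrete, here is a minimal sketch built on the two types above. The names sumTwo, peek, enumOne, feedList and finished are hypothetical helpers invented for illustration; they are not taken from any iteratee library.

```haskell
-- The simplified step type from the message above (errors ignored).
data Step a b
  = Continue (a -> Step a b)  -- needs one more input
  | Yield b [a]               -- finished: a result plus unconsumed inputs

-- The enumerator type as given above.
type Enumerator a b = (a -> Step a b) -> Step a b

-- Hypothetical step: consume two inputs, then yield their sum.
sumTwo :: Step Int Int
sumTwo = Continue (\x -> Continue (\y -> Yield (x + y) []))

-- Hypothetical step: inspect one input and hand it back as a leftover.
-- This is why Yield carries a list of inputs.
peek :: Step a a
peek = Continue (\x -> Yield x [x])

-- Hypothetical enumerator: feed one fixed value to the continuation.
enumOne :: a -> Enumerator a b
enumOne x k = k x

-- Hypothetical driver: feed a list to a step. Once the step has
-- yielded, the remaining inputs are appended to its leftovers.
feedList :: [a] -> Step a b -> Step a b
feedList xs     (Yield b rest) = Yield b (rest ++ xs)
feedList []     step           = step
feedList (x:xs) (Continue k)   = feedList xs (k x)

-- Hypothetical helper: extract the result and leftovers, if finished.
finished :: Step a b -> Maybe (b, [a])
finished (Yield b rest) = Just (b, rest)
finished (Continue _)   = Nothing
```

With these definitions, feeding [1, 2, 3] to sumTwo finishes after two inputs and leaves the third as a leftover, while feeding only [1] leaves the step in the Continue state.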
-------------
Things become a bit more complicated when error handling is added. Specifically, steps must have some response to EOF:
data Step a b = Continue (a -> Step a b) (Result a b) | Result a b
data Result a b = Yield b [a] | Error String
In this representation, "Continue" has two branches. One for receiving more data, and another to be returned if there is no more input. This avoids the "divergent iteratee" problem, since it's not possible for Continue to be returned in response to EOF.
Is this really true? Consider iteratees that don't have a sensible default value (e.g. head) and an empty stream. You could argue that they should really return a Maybe, but then they wouldn't be divergent in other formulations either. Although I do find it interesting that EOF is no longer part of the stream at all. That may open up some possibilities.

Also, I found this confusing because you're using Result as a data constructor for the Step type, but also as a separate type constructor. I expect this could lead to very confusing error messages ("What do you mean 'Result b a' doesn't have type 'Result'?")
Enumerators are similarly modified, except they are allowed to return "Continue" when their inner data source runs out. Therefore, both the "continue" and "eof" parameters are Step.
type Enumerator a b = (a -> Step a b) -> Step a b -> Step a b
I find this unclear as well, because you've unpacked the continue parameter but not the eof. I would prefer to see this as:
type Enumerator a b = (a -> Step a b) -> Result a b -> Step a b
However, is it useful to do so? That is, would there ever be a case where you would want to use branches from separate iteratees? If not, then why bother unpacking instead of just using
type Enumerator a b = Step a b -> Step a b
John

On Wed, Aug 25, 2010 at 01:33, John Lato
Is this really true? Consider iteratees that don't have a sensible default value (e.g. head) and an empty stream. You could argue that they should really return a Maybe, but then they wouldn't be divergent in other formulations either. Although I do find it interesting that EOF is no longer part of the stream at all. That may open up some possibilities.
Divergent iteratees, using the current libraries, will simply throw an exception like "enumEOF: divergent iteratee". There's no way to get useful values out of them. Disallowing returning Continue when given an EOF prevents this invalid state.
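For readers unfamiliar with the problem, the following is a rough sketch of how a divergent iteratee arises when EOF is part of the stream. The types are a simplified, pure approximation of the formulation used by existing libraries (an assumption, not their actual definitions), and divergent, count, feed and finish are hypothetical names.

```haskell
-- Simplified formulation in which EOF travels inside the stream.
data Stream a = Chunk [a] | EOF

data Step a b
  = Continue (Stream a -> Step a b)
  | Yield b (Stream a)

-- A divergent step: it keeps asking for input even after seeing EOF.
divergent :: Step a b
divergent = Continue (\_ -> divergent)

-- A well-behaved step: counts inputs and yields when it sees EOF.
count :: Int -> Step a Int
count n = Continue step
  where
    step (Chunk xs) = count (n + length xs)
    step EOF        = Yield n EOF

-- Feed one chunk to a step that is still waiting for input.
feed :: [a] -> Step a b -> Step a b
feed xs (Continue k) = k (Chunk xs)
feed _  done         = done

-- Send EOF and extract the result. A step that still returns
-- Continue has no usable value, so all we can do is report failure.
finish :: Step a b -> Either String b
finish (Yield b _)  = Right b
finish (Continue k) = case k EOF of
  Yield b _  -> Right b
  Continue _ -> Left "divergent iteratee"
```

In this sketch nothing in the types stops divergent from being written, so finish needs a failure case. The two-branch Continue rules that state out by construction.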
Also, I found this confusing because you're using Result as a data constructor for the Step type, but also as a separate type constructor. I expect this could lead to very confusing error messages ("What do you mean 'Result b a' doesn't have type 'Result'?")
Oh, sorry, those constructors should be something like this (the system on which I wrote that email has no Haskell compiler, so I couldn't verify types before sending):
data Step a b = Continue (a -> Step a b) (Result a b) | GotResult (Result a b)
The goal is to let the iteratee signal three states:
* Can accept more input, but terminating the stream now is acceptable
* Requires more input, and terminating the stream now is an error
* Cannot accept more input
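The three states can be sketched as follows, using the corrected Step type and the Result type from earlier in the thread. The example steps (sumAll, headStep, done) and the driver (run, addLeftover) are hypothetical names chosen for illustration, and the driver is only one plausible reading of how EOF would be delivered.

```haskell
data Result a b = Yield b [a] | Error String
  deriving (Eq, Show)

data Step a b
  = Continue (a -> Step a b) (Result a b)  -- on input / on end of input
  | GotResult (Result a b)

-- State 1: can accept more input, but ending the stream now is fine.
sumAll :: Int -> Step Int Int
sumAll acc = Continue (\x -> sumAll (acc + x)) (Yield acc [])

-- State 2: requires more input; ending the stream now is an error.
headStep :: Step a a
headStep = Continue (\x -> GotResult (Yield x []))
                    (Error "head: empty stream")

-- State 3: cannot accept more input.
done :: b -> Step a b
done b = GotResult (Yield b [])

-- Hypothetical driver: feed a list, then signal end of input by
-- taking the second branch of Continue. It always ends in a Result.
run :: [a] -> Step a b -> Result a b
run xs     (GotResult r)  = addLeftover xs r
run []     (Continue _ r) = r
run (x:xs) (Continue k _) = run xs (k x)

-- Unfed inputs become leftovers of a successful result.
addLeftover :: [a] -> Result a b -> Result a b
addLeftover xs (Yield b rest) = Yield b (rest ++ xs)
addLeftover _  (Error e)      = Error e
```

Under these assumptions, head on an empty stream produces an Error result rather than a divergent step, which is one way to answer the question about iteratees with no sensible default value.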
I find this unclear as well, because you've unpacked the continue parameter but not the eof. I would prefer to see this as:
type Enumerator a b = (a -> Step a b) -> Result a b -> Step a b
However, is it useful to do so? That is, would there ever be a case where you would want to use branches from separate iteratees? If not, then why bother unpacking instead of just using
type Enumerator a b = Step a b -> Step a b
When an enumerator terminates, it needs to pass control to the next enumerator (the final enumerator is enumEOF). Thus, the second "step" parameter is actually the next enumerator to run in the chain (aka the calling enumerator).
participants (2): John Lato, John Millikin