Re: [Haskell-cafe] Re: Fwd: Semantics of iteratees, enumerators, enumeratees?

I think the big problem with chunking is that many useful iteratees need to be able to inspect the length of the chunk. The most obvious is "drop", but there are many others. Or if not inspect the length, have a new function on the stream "dropReport :: Int -> s -> (s, Int)" which reports how much was dropped. Either way, chunking adds a deal of implementation burden. I suspect that the proper vocabulary for iteratees wouldn't include chunks at all, only single elements. This discussion has prompted me to consider the implications of such an implementation, as it would be much simpler. I have one idea that I think will at least maintain performance for many operations, although there will be performance hits too. If the drawbacks are in areas that aren't particularly useful, though, it may be acceptable. John
From: Conal Elliott
Here's a way I've been tinkering with to think about iteratees clearly.
For simplicity, I'll stick with pure, error-free iteratees for now, and take chunks to be strings. Define a function that runs the iteratee:
runIter :: Iteratee a -> [String] -> (a, [String])
Note that chunking is explicit here.
Next, a relation that an iteratee implements a given specification, defined by a state transformer:
sat :: Iteratee a -> State String a -> Bool
Define sat in terms of concatenating chunks:
sat it st = second concat . runIter it == runState st . second concat
where the RHS equality is between functions (pointwise/extensionally), and runState uses the representation of State directly
runState :: State s a -> s -> (a,s)
(I think this sat definition is what Conrad was alluding to.)
Now use sat to specify and verify operations on iteratees and to *synthesize* those operations from their specifications. Some iteratees might not satisfy *any* (State-based) specification. For instance, an iteratee could look at the lengths or number of its chunks and produce results accordingly. I think of such iteratees as abstraction leaks. Can the iteratee vocabulary be honed to make only well-behaved (specifiable) iteratees possible to express? If so, can we preserve performance benefits?
If indeed the abstraction leaks can be fixed, I expect there will be a simpler & more conventional semantics than sat above.
- Conal
On Tue, Aug 24, 2010 at 2:55 PM, Conrad Parker
wrote: On 24 August 2010 14:47, Jason Dagit
wrote: On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker
wrote: On 24 August 2010 14:14, Jason Dagit
wrote: I'm not a semanticist, so I apologize right now if I say something stupid or incorrect.
On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott
wrote:
> > So perhaps this could be a reasonable semantics? > > Iteratee a = [Char] -> Maybe (a, [Char])
I've been tinkering with this model as well.
However, it doesn't really correspond to the iteratee interfaces I've seen, since those interfaces allow an iteratee to notice size and number of chunks. I suspect this ability is an accidental abstraction leak, which raises the question of how to patch the leak.
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I think that's the point, ie. to specify what the invariants should be. For example (to paraphrase, very poorly, something Conal wrote on the whiteboard behind me):
run [concat [chunk]] == run [chunk]
ie. the (a, [Char]) you maybe get from running an iteratee over any partitioning of chunks should be the same, ie. the same as from running it over the concatenation of all chunks, which is the whole input [Char].
I find this notation foreign. I get [Char], that's the Haskell String type, but what is [chunk]? I doubt you mean a list of one element.
sorry, that was just my way of writing "the list of chunks" or perhaps "the stream of chunks that represents the input".
Conrad.
I use iteratees when I need to be explicit about chunking and when I don't want the resources to "leak outside" of the stream processing. If
took those properties away, I wouldn't want to use it anymore because
you then it
would just be an inelegant way to do things.
Then I suppose the model for Enumerators is different than that for Iteratees; part of the point of an Enumerator is to control the size of the chunks, so that needs to be part of the model. An Iteratee, on the other hand, should not have to know the size of its chunks. So you don't want to be able to know the length of a chunk (ie. a part of the stream), but you do want to be able to, say, fold over it, and to be able to stop the computation at any time (these being the main point of iteratees ...).
I think I agree with that. Jason

Hi John,
Please note that I'm suggesting eliminating chunks from the semantics only
-- not from the implementation.
For precise & simple chunk-less semantics, it's only important that the
iteratees map equivalent input streams to equivalent output streams, where
"equivalent" means equal after concatenating all of the chunks. In other
words, the chunk lists denote their concatenations, so semantically equal
inputs must lead to semantically equal outputs. (Assuming I understand the
intention of chunking as being an implementation issue only, i.e., present
only for optimization.) We could call this property "semantic referential
transparency". IIUC, 'drop' is semantically RT, since it's *specified* in
terms of elements (and only *implemented* in terms of chunks).
Do you know of any iteratees in use that map (semantically) equal inputs to
(semantically) unequal outputs, i.e.,that violate semantic RT as I've
defined it? In the current APIs, one can easily define such iteratees, but
I'm hoping that the programming interfaces can be repaired to eliminate that
problem (the "abstraction leaks" I've been mentioning).
- Conal
On Tue, Aug 24, 2010 at 9:32 PM, John Lato
I think the big problem with chunking is that many useful iteratees need to be able to inspect the length of the chunk. The most obvious is "drop", but there are many others. Or if not inspect the length, have a new function on the stream "dropReport :: Int -> s -> (s, Int)" which reports how much was dropped. Either way, chunking adds a deal of implementation burden.
I suspect that the proper vocabulary for iteratees wouldn't include chunks at all, only single elements. This discussion has prompted me to consider the implications of such an implementation, as it would be much simpler. I have one idea that I think will at least maintain performance for many operations, although there will be performance hits too. If the drawbacks are in areas that aren't particularly useful, though, it may be acceptable.
John
From: Conal Elliott
Here's a way I've been tinkering with to think about iteratees clearly.
For simplicity, I'll stick with pure, error-free iteratees for now, and take chunks to be strings. Define a function that runs the iteratee:
runIter :: Iteratee a -> [String] -> (a, [String])
Note that chunking is explicit here.
Next, a relation that an iteratee implements a given specification, defined by a state transformer:
sat :: Iteratee a -> State String a -> Bool
Define sat in terms of concatenating chunks:
sat it st = second concat . runIter it == runState st . second concat
where the RHS equality is between functions (pointwise/extensionally), and runState uses the representation of State directly
runState :: State s a -> s -> (a,s)
(I think this sat definition is what Conrad was alluding to.)
Now use sat to specify and verify operations on iteratees and to *synthesize* those operations from their specifications. Some iteratees might not satisfy *any* (State-based) specification. For instance, an iteratee could look at the lengths or number of its chunks and produce results accordingly. I think of such iteratees as abstraction leaks. Can the iteratee vocabulary be honed to make only well-behaved (specifiable) iteratees possible to express? If so, can we preserve performance benefits?
If indeed the abstraction leaks can be fixed, I expect there will be a simpler & more conventional semantics than sat above.
- Conal
On Tue, Aug 24, 2010 at 2:55 PM, Conrad Parker
wrote: On 24 August 2010 14:47, Jason Dagit
wrote: On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker
wrote:
On 24 August 2010 14:14, Jason Dagit
wrote: I'm not a semanticist, so I apologize right now if I say something stupid or incorrect.
On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott
wrote:
>> >> So perhaps this could be a reasonable semantics? >> >> Iteratee a = [Char] -> Maybe (a, [Char]) > > I've been tinkering with this model as well. > > However, it doesn't really correspond to the iteratee interfaces I've > seen, since those interfaces allow an iteratee to notice size and > number of > chunks. I suspect this ability is an accidental abstraction leak, > which > raises the question of how to patch the leak.
From a purely practical viewpoint I feel that treating the chunking as an abstraction leak might be missing the point. If you said, you wanted the semantics to acknowledge the chunking but be invariant under the size or number of the chunks then I would be happier.
I think that's the point, ie. to specify what the invariants should be. For example (to paraphrase, very poorly, something Conal wrote on the whiteboard behind me):
run [concat [chunk]] == run [chunk]
ie. the (a, [Char]) you maybe get from running an iteratee over any partitioning of chunks should be the same, ie. the same as from running it over the concatenation of all chunks, which is the whole input [Char].
I find this notation foreign. I get [Char], that's the Haskell String type, but what is [chunk]? I doubt you mean a list of one element.
sorry, that was just my way of writing "the list of chunks" or perhaps "the stream of chunks that represents the input".
Conrad.
I use iteratees when I need to be explicit about chunking and when
don't want the resources to "leak outside" of the stream processing. If you took those properties away, I wouldn't want to use it anymore because
would just be an inelegant way to do things.
Then I suppose the model for Enumerators is different than that for Iteratees; part of the point of an Enumerator is to control the size of the chunks, so that needs to be part of the model. An Iteratee, on the other hand, should not have to know the size of its chunks. So you don't want to be able to know the length of a chunk (ie. a part of
I then it the
stream), but you do want to be able to, say, fold over it, and to be able to stop the computation at any time (these being the main point of iteratees ...).
I think I agree with that. Jason

Hi Conal,
I'm aware of one case that violates semantic referential transparency, but
it's a bug. Which pretty much proves your point as I understand it.
John
On Tue, Aug 24, 2010 at 2:01 PM, Conal Elliott
Hi John,
Please note that I'm suggesting eliminating chunks from the semantics only -- not from the implementation.
For precise & simple chunk-less semantics, it's only important that the iteratees map equivalent input streams to equivalent output streams, where "equivalent" means equal after concatenating all of the chunks. In other words, the chunk lists denote their concatenations, so semantically equal inputs must lead to semantically equal outputs. (Assuming I understand the intention of chunking as being an implementation issue only, i.e., present only for optimization.) We could call this property "semantic referential transparency". IIUC, 'drop' is semantically RT, since it's *specified* in terms of elements (and only *implemented* in terms of chunks).
Do you know of any iteratees in use that map (semantically) equal inputs to (semantically) unequal outputs, i.e.,that violate semantic RT as I've defined it? In the current APIs, one can easily define such iteratees, but I'm hoping that the programming interfaces can be repaired to eliminate that problem (the "abstraction leaks" I've been mentioning).
- Conal
On Tue, Aug 24, 2010 at 9:32 PM, John Lato
wrote: I think the big problem with chunking is that many useful iteratees need to be able to inspect the length of the chunk. The most obvious is "drop", but there are many others. Or if not inspect the length, have a new function on the stream "dropReport :: Int -> s -> (s, Int)" which reports how much was dropped. Either way, chunking adds a deal of implementation burden.
I suspect that the proper vocabulary for iteratees wouldn't include chunks at all, only single elements. This discussion has prompted me to consider the implications of such an implementation, as it would be much simpler. I have one idea that I think will at least maintain performance for many operations, although there will be performance hits too. If the drawbacks are in areas that aren't particularly useful, though, it may be acceptable.
John
From: Conal Elliott
Here's a way I've been tinkering with to think about iteratees clearly.
For simplicity, I'll stick with pure, error-free iteratees for now, and take chunks to be strings. Define a function that runs the iteratee:
runIter :: Iteratee a -> [String] -> (a, [String])
Note that chunking is explicit here.
Next, a relation that an iteratee implements a given specification, defined by a state transformer:
sat :: Iteratee a -> State String a -> Bool
Define sat in terms of concatenating chunks:
sat it st = second concat . runIter it == runState st . second concat
where the RHS equality is between functions (pointwise/extensionally), and runState uses the representation of State directly
runState :: State s a -> s -> (a,s)
(I think this sat definition is what Conrad was alluding to.)
Now use sat to specify and verify operations on iteratees and to *synthesize* those operations from their specifications. Some iteratees might not satisfy *any* (State-based) specification. For instance, an iteratee could look at the lengths or number of its chunks and produce results accordingly. I think of such iteratees as abstraction leaks. Can the iteratee vocabulary be honed to make only well-behaved (specifiable) iteratees possible to express? If so, can we preserve performance benefits?
If indeed the abstraction leaks can be fixed, I expect there will be a simpler & more conventional semantics than sat above.
- Conal
On Tue, Aug 24, 2010 at 2:55 PM, Conrad Parker
wrote: On 24 August 2010 14:47, Jason Dagit
wrote: On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker <
conrad@metadecks.org>
wrote:
On 24 August 2010 14:14, Jason Dagit
wrote: > I'm not a semanticist, so I apologize right now if I say something > stupid or > incorrect. > > On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott wrote:
>>> >>> So perhaps this could be a reasonable semantics? >>> >>> Iteratee a = [Char] -> Maybe (a, [Char]) >> >> I've been tinkering with this model as well. >> >> However, it doesn't really correspond to the iteratee interfaces I've >> seen, since those interfaces allow an iteratee to notice size and >> number of >> chunks. I suspect this ability is an accidental abstraction leak, >> which >> raises the question of how to patch the leak. > > From a purely practical viewpoint I feel that treating the chunking as > an > abstraction leak might be missing the point. If you said, you wanted > the > semantics to acknowledge the chunking but be invariant under the size or > number of the chunks then I would be happier.
I think that's the point, ie. to specify what the invariants should be. For example (to paraphrase, very poorly, something Conal wrote on the whiteboard behind me):
run [concat [chunk]] == run [chunk]
ie. the (a, [Char]) you maybe get from running an iteratee over any partitioning of chunks should be the same, ie. the same as from running it over the concatenation of all chunks, which is the whole input [Char].
I find this notation foreign. I get [Char], that's the Haskell String type, but what is [chunk]? I doubt you mean a list of one element.
sorry, that was just my way of writing "the list of chunks" or perhaps "the stream of chunks that represents the input".
Conrad.
> I use iteratees when I need to be explicit about chunking and when
> don't > want the resources to "leak outside" of the stream processing. If you > took > those properties away, I wouldn't want to use it anymore because
> would just be an inelegant way to do things.
Then I suppose the model for Enumerators is different than that for Iteratees; part of the point of an Enumerator is to control the size of the chunks, so that needs to be part of the model. An Iteratee, on the other hand, should not have to know the size of its chunks. So you don't want to be able to know the length of a chunk (ie. a part of
I then it the
stream), but you do want to be able to, say, fold over it, and to be able to stop the computation at any time (these being the main point of iteratees ...).
I think I agree with that. Jason
participants (2)
-
Conal Elliott
-
John Lato