Hi John,
Please note that I'm suggesting eliminating chunks from the semantics only -- not from the implementation.
For precise & simple chunk-less semantics, it's only important that the iteratees map equivalent input streams to equivalent output streams, where "equivalent" means equal after concatenating all of the chunks. In other words, the chunk lists denote their concatenations, so semantically equal inputs must lead to semantically equal outputs. (Assuming I understand the intention of chunking as being an implementation issue only, i.e., present only for optimization.) We could call this property "semantic referential transparency". IIUC, 'drop' is semantically RT, since it's *specified* in terms of elements (and only *implemented* in terms of chunks).
Do you know of any iteratees in use that map (semantically) equal inputs to (semantically) unequal outputs, i.e.,that violate semantic RT as I've defined it? In the current APIs, one can easily define such iteratees, but I'm hoping that the programming interfaces can be repaired to eliminate that problem (the "abstraction leaks" I've been mentioning).
- Conal
I think the big problem with chunking is that many useful iteratees need to be able to inspect the length of the chunk. The most obvious is "drop", but there are many others. Or if not inspect the length, have a new function on the stream "dropReport :: Int -> s -> (s, Int)" which reports how much was dropped. Either way, chunking adds a deal of implementation burden.I suspect that the proper vocabulary for iteratees wouldn't include chunks at all, only single elements. This discussion has prompted me to consider the implications of such an implementation, as it would be much simpler. I have one idea that I think will at least maintain performance for many operations, although there will be performance hits too. If the drawbacks are in areas that aren't particularly useful, though, it may be acceptable.JohnFrom: Conal Elliott <conal@conal.net>
Here's a way I've been tinkering with to think about iteratees clearly.
For simplicity, I'll stick with pure, error-free iteratees for now, and take
chunks to be strings. Define a function that runs the iteratee:
> runIter :: Iteratee a -> [String] -> (a, [String])
Note that chunking is explicit here.
Next, a relation that an iteratee implements a given specification, defined
by a state transformer:
> sat :: Iteratee a -> State String a -> Bool
Define sat in terms of concatenating chunks:
> sat it st =
> second concat . runIter it == runState st . second concat
where the RHS equality is between functions (pointwise/extensionally), and
runState uses the representation of State directly
> runState :: State s a -> s -> (a,s)
(I think this sat definition is what Conrad was alluding to.)
Now use sat to specify and verify operations on iteratees and to
*synthesize* those operations from their specifications. Some iteratees
might not satisfy *any* (State-based) specification. For instance, an
iteratee could look at the lengths or number of its chunks and produce
results accordingly. I think of such iteratees as abstraction leaks. Can
the iteratee vocabulary be honed to make only well-behaved (specifiable)
iteratees possible to express? If so, can we preserve performance benefits?
If indeed the abstraction leaks can be fixed, I expect there will be a
simpler & more conventional semantics than sat above.
- Conal
On Tue, Aug 24, 2010 at 2:55 PM, Conrad Parker <conrad@metadecks.org> wrote:
> On 24 August 2010 14:47, Jason Dagit <dagit@codersbase.com> wrote:
> >
> >
> > On Mon, Aug 23, 2010 at 10:37 PM, Conrad Parker <conrad@metadecks.org>
> > wrote:
> >>
> >> On 24 August 2010 14:14, Jason Dagit <dagit@codersbase.com> wrote:
> >> > I'm not a semanticist, so I apologize right now if I say something
> >> > stupid or
> >> > incorrect.
> >> >
> >> > On Mon, Aug 23, 2010 at 9:57 PM, Conal Elliott <conal@conal.net>
> wrote:
> >> >>>
> >> >>> So perhaps this could be a reasonable semantics?
> >> >>>
> >> >>> Iteratee a = [Char] -> Maybe (a, [Char])
> >> >>
> >> >> I've been tinkering with this model as well.
> >> >>
> >> >> However, it doesn't really correspond to the iteratee interfaces I've
> >> >> seen, since those interfaces allow an iteratee to notice size and
> >> >> number of
> >> >> chunks. I suspect this ability is an accidental abstraction leak,
> >> >> which
> >> >> raises the question of how to patch the leak.
> >> >> >> > From a purely practical viewpoint I feel that treating the chunking as
> >> > an
> >> > abstraction leak might be missing the point. If you said, you wanted
> >> > the
> >> > semantics to acknowledge the chunking but be invariant under the size
> or
> >> > number of the chunks then I would be happier.
> >>> >> I think that's the point, ie. to specify what the invariants should
> >> be. For example (to paraphrase, very poorly, something Conal wrote on
> >> the whiteboard behind me):
> >>
> >> run [concat [chunk]] == run [chunk]
> >>
> >> ie. the (a, [Char]) you maybe get from running an iteratee over any
> >> partitioning of chunks should be the same, ie. the same as from
> >> running it over the concatenation of all chunks, which is the whole
> >> input [Char].
> >
> > I find this notation foreign. I get [Char], that's the Haskell String
> > type, but what is [chunk]? I doubt you mean a list of one element.
>
> sorry, that was just my way of writing "the list of chunks" or perhaps
> "the stream of chunks that represents the input".
>
> Conrad.
>
> >
> >>> >> > I use iteratees when I need to be explicit about chunking and when I
> >> > don't
> >> > want the resources to "leak outside" of the stream processing. If you
> >> > took
> >> > those properties away, I wouldn't want to use it anymore because then
> it
> >> > would just be an inelegant way to do things.
> >>> >> Then I suppose the model for Enumerators is different than that for
> >> Iteratees; part of the point of an Enumerator is to control the size
> >> of the chunks, so that needs to be part of the model. An Iteratee, on
> >> the other hand, should not have to know the size of its chunks. So you
> >> don't want to be able to know the length of a chunk (ie. a part of the
> >> stream), but you do want to be able to, say, fold over it, and to be
> >> able to stop the computation at any time (these being the main point
> >> of iteratees ...).
> >
> > I think I agree with that.
> > Jason
>