
From: Felipe Almeida Lessa
On Thu, Apr 28, 2011 at 1:10 PM, Felipe Almeida Lessa wrote:
On Thu, Apr 28, 2011 at 12:09 PM, Felipe Almeida Lessa wrote:
I foresee one problem: what is the leftover of 'manyToOne xs' if each x in xs needs different lengths of input?
One possible untested-but-compiling solution: [snip]
Like I said, that manyToOne implementation isn't very predictable about leftovers. But I guess that if all your iteratees consume the same input OR if you don't care about leftovers, then it should be okay.
Sorry for replying to myself again. =)
I think you can actually give predictable semantics to manyToOne: namely, the leftovers from the last iteratee are returned. This new implementation should be better:

If you do this, the user needs to take care to order the iteratees so that the last iteratee has small leftovers. Consider:
manyToOne [consumeALot, return ()]
In this case, the entire stream consumed by the first iteratee will need to be retained and passed on by manyToOne. In many cases, the user may not know how much each iteratee will consume, which can make these semantics problematic.
Iteratee has 'enumPair', (renamed 'zip' in HEAD) which returns the leftovers from whichever iteratee consumes more. This avoids the problem of retaining extra data, and seems simpler to reason about. Although if you really need to consume a predictable amount of data, the safest is probably to run the whole thing in a 'take'.
John Lato
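The two leftover policies discussed here can be compared in a small toy model. This is only a sketch under loud assumptions: a "consumer" sees the whole input at once instead of chunk by chunk, and the names Consumer, returnUnit, manyToOneLast and manyToOneMost are hypothetical, not part of the enumerator or iteratee packages.

```haskell
-- Toy model (an assumption, not a library API): a consumer sees the
-- whole input and returns its result plus the unconsumed tail.
type Consumer a b = [a] -> (b, [a])

-- Hypothetical stand-in for an iteratee that eats everything.
consumeALot :: Consumer a ()
consumeALot _ = ((), [])

-- Hypothetical stand-in for 'return ()': consumes nothing.
returnUnit :: Consumer a ()
returnUnit xs = ((), xs)

-- Policy 1: the leftovers are those of the last consumer in the list.
manyToOneLast :: [Consumer a b] -> Consumer a [b]
manyToOneLast cs xs = (map fst rs, leftover)
  where
    rs = map ($ xs) cs
    leftover = if null rs then xs else snd (last rs)

-- Policy 2: the leftovers are those of whichever consumer consumed the
-- most, i.e. the shortest remaining tail.
manyToOneMost :: [Consumer a b] -> Consumer a [b]
manyToOneMost cs xs = (map fst rs, leftover)
  where
    rs = map ($ xs) cs
    leftover = foldr shorter xs (map snd rs)
    shorter a b = if length a <= length b then a else b
```

With [consumeALot, returnUnit], the first policy hands back the entire input as leftovers, so everything the first consumer read must be retained, while the second policy hands back nothing.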

On Fri, Apr 29, 2011 at 6:32 AM, John Lato wrote:
If you do this, the user needs to take care to order the iteratees so that the last iteratee has small leftovers. Consider:
manyToOne [consumeALot, return ()]
In this case, the entire stream consumed by the first iteratee will need to be retained and passed on by manyToOne. In many cases, the user may not know how much each iteratee will consume, which can make these semantics problematic.
Iteratee has 'enumPair', (renamed 'zip' in HEAD) which returns the leftovers from whichever iteratee consumes more. This avoids the problem of retaining extra data, and seems simpler to reason about. Although if you really need to consume a predictable amount of data, the safest is probably to run the whole thing in a 'take'.
My motivation is: in general it is difficult (impossible?) to choose the iteratee that consumed more data because you don't know what the data is. For example, if you give 'Chunks [a,b]' to two iteratees and one of them returns 'Chunks [c]' and the other one returns 'Chunks [d]', which one consumed more data? The answer is that it depends on the types. If they are Ints, both consumed the same, if they are ByteStrings, you would need to check if one is prefix of the other. What if one returns 'Chunks [c]' and the other one returns 'Chunks [d,e]'? If they are ByteStrings, should we compare 'c' against 'd ++ e'?
So I thought it would be easier to program with an API that is predictable and immune to changes in block sizes. If you don't want leftovers, just use 'manyToOne [..., dropWhile (const True)]', which guarantees that you won't leak.
Cheers,
-- Felipe.

On Fri, Apr 29, 2011 at 12:20 PM, Felipe Almeida Lessa <felipe.lessa@gmail.com> wrote:
On Fri, Apr 29, 2011 at 6:32 AM, John Lato wrote:
If you do this, the user needs to take care to order the iteratees so that the last iteratee has small leftovers. Consider:
manyToOne [consumeALot, return ()]
In this case, the entire stream consumed by the first iteratee will need to be retained and passed on by manyToOne. In many cases, the user may not know how much each iteratee will consume, which can make these semantics problematic.
Iteratee has 'enumPair', (renamed 'zip' in HEAD) which returns the leftovers from whichever iteratee consumes more. This avoids the problem of retaining extra data, and seems simpler to reason about. Although if you really need to consume a predictable amount of data, the safest is probably to run the whole thing in a 'take'.
My motivation is: in general it is difficult (impossible?) to choose the iteratee that consumed more data because you don't know what the data is. For example, if you give 'Chunks [a,b]' to two iteratees and one of them returns 'Chunks [c]' and the other one returns 'Chunks [d]', which one consumed more data? The answer is that it depends on the types. If they are Ints, both consumed the same, if they are ByteStrings, you would need to check if one is prefix of the other. What if one returns 'Chunks [c]' and the other one returns 'Chunks [d,e]'? If they are ByteStrings, should we compare 'c' against 'd ++ e'?
This situation results from the implementation in the enumerator package. In iteratee it doesn't arise with well-behaved* iteratees, because only one chunk is ever processed at a time. It's only necessary to check the length of the returned chunks to see which consumed more data. By well-behaved, I mean that the chunk returned by an iteratee must be a tail of the provided input. In other words, it returns only unconsumed data from the stream and doesn't alter the stream. At least in the iteratee package, an iteratee which violates this rule is likely to result in undefined behavior (in general, not just this function).
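As a rough illustration of the well-behavedness rule, here is a sketch on plain lists. The function names are made up for this example, and a chunk is modelled as an ordinary list rather than either package's stream type.

```haskell
import Data.List (isSuffixOf)

-- A leftover is "well-behaved" in the sense above if it is a tail of
-- the chunk that was provided.
wellBehaved :: Eq a => [a] -> [a] -> Bool
wellBehaved input leftover = leftover `isSuffixOf` input

-- For a well-behaved leftover, the amount consumed is a plain length
-- difference, whatever the element type is.
consumed :: [a] -> [a] -> Int
consumed input leftover = length input - length leftover

-- Given two well-behaved leftovers of the same chunk, pick the one
-- from the iteratee that consumed more (the shorter tail).
leftoverOfGreedier :: [a] -> [a] -> [a]
leftoverOfGreedier l1 l2 = if length l1 <= length l2 then l1 else l2
```

Under this assumption no comparison of element contents is needed; only the lengths of the two tails matter.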
So I thought it would be easier to program with an API that is predictable and immune to changes in block sizes. If you don't want leftovers, just use 'manyToOne [..., dropWhile (const True)]', which guarantees that you won't leak.
Iteratees should be immune to changes in block sizes anyway, although it's been a while since I looked at the enumerator implementation so it could be different.
If you use 'manyToOne [..., dropWhile (const True)]', when does it terminate?
John

In my case the leftovers are not important.
But in the general case... Just an idea...
What if we provide
iterWhile :: Iteratee a m () -> Iteratee a m b -> Iteratee a m b
The first iteratee only controls when the result should be yielded, and feeds the input to the second iteratee.
Then we can change manyToOne to
manyToOne' :: Iteratee a m () -> [Iteratee a m b] -> Iteratee a m [b]
manyToOne' iw = iterWhile iw . manyToOne
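One possible reading of iterWhile, sketched in a pure toy model. Everything here is an assumption made for illustration: the Stream and Step types, the helpers feed, finish and consumedOf, and the sample iteratees dropN and sumAll are not the enumerator API, the monad m is dropped, and the controller is assumed to return a tail of its input as leftovers.

```haskell
-- Toy, pure stand-ins for the real types (assumptions, not a library API).
data Stream a = Chunks [a] | EOF

data Step a b
  = Continue (Stream a -> Step a b) -- wants more input
  | Yield b (Stream a)              -- result plus unconsumed input

-- Feed one piece of input to a step; a finished step ignores it.
feed :: Step a b -> Stream a -> Step a b
feed (Continue k) s = k s
feed done _ = done

-- Signal EOF and extract the result. A worker that keeps asking for
-- input after EOF is treated as a bug in this toy model.
finish :: Step a b -> b
finish s = case feed s EOF of
  Yield b _ -> b
  Continue _ -> error "iterWhile: worker did not finish on EOF"

-- The part of chunk xs that the controller consumed, assuming its
-- leftover is a tail of xs.
consumedOf :: [a] -> Stream a -> [a]
consumedOf xs (Chunks rest) = take (length xs - length rest) xs
consumedOf xs EOF = xs

-- The controller decides when to stop; the worker sees exactly the
-- input the controller consumed, and the controller's leftovers are
-- the leftovers of the whole computation.
iterWhile :: Step a () -> Step a b -> Step a b
iterWhile (Yield () rest) worker = Yield (finish worker) rest
iterWhile (Continue k) worker = Continue go
  where
    go EOF = iterWhile (k EOF) worker
    go (Chunks xs) = case k (Chunks xs) of
      Yield () rest ->
        Yield (finish (feed worker (Chunks (consumedOf xs rest)))) rest
      next -> iterWhile next (feed worker (Chunks xs))

-- Sample controller: consume n elements, then stop.
dropN :: Int -> Step a ()
dropN n
  | n <= 0 = Yield () (Chunks [])
  | otherwise = Continue step
  where
    step EOF = Yield () EOF
    step (Chunks xs)
      | length xs >= n = Yield () (Chunks (drop n xs))
      | otherwise = dropN (n - length xs)

-- Sample worker: sum everything it is given until EOF.
sumAll :: Int -> Step Int Int
sumAll acc = Continue step
  where
    step EOF = Yield acc EOF
    step (Chunks xs) = sumAll (acc + sum xs)
```

In this sketch, iterWhile (dropN 2) (sumAll 0) sums only the first two elements of the stream, and the leftovers are whatever dropN did not consume, regardless of how the input is split into chunks.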
participants (3)
- Dmitry Olshansky
- Felipe Almeida Lessa
- John Lato