[Haskell-cafe] Re: Iteratee question

26 Nov 2009

Valery V. Vorotyntsev wrote:
> ...
> The following pattern appears quite often in my code:
> ...
> results <- map someConversion `liftM` replicateM nbytes Iter.head
> The meaning is: take `nbytes' from stream, apply `someConversion' to
> every byte and return the list of `results'.
> But there's more than one way to do it:
> ...
> i1, i2, i3 :: Monad m => Int -> IterateeG [] Word8 m [String]
> i1 n = map conv `liftM` replicateM n Iter.head
> i2 n = map conv `liftM` joinI (Iter.take n stream2list)
> i3 n = joinI $ Iter.take n $ joinI $ mapStream conv stream2list
> ...
> Of those i1, i2, i3 which one is "better" and why? Or is there another -
> preferable - way of applying iteratees to this task?
> ...
> My naïve guess is that i1 will have worse performance with big n's. It
> looks like `i1' is reading bytes one by one, while `i2' takes whole
> chunks of data... I'm not sure though.
You are correct: i2 and i3 can process a chunk of elements at a time,
if an enumerator supplies one. That means an iteratee like i2 or i3 can
do more work per invocation, which is always good. Since you need the
results as a list, you pretty much have to use stream2list. Note,
however, that stream2list isn't very efficient: it returns the
accumulated list only when it is done, which happens only when the
stream is terminated, normally or abnormally. So stream2list has
terrible latency and is useful only at the last stage of processing. I
have found it most useful for testing (to see the resulting stream) and
for writing unit tests (to compare the produced results with the
expected ones). For incremental processing, it is better to stay within
Iteratees.
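
The difference between i1 and i2 can be seen in a deliberately tiny, base-only model of iteratees. This is a sketch under stated assumptions, not the iteratee package's API: Stream, Iter, headI, takeE, dropI, feedChunks and runChunks are hypothetical names that only loosely mimic Iter.head, Iter.take, joinI and stream2list, and conv = show is a hypothetical stand-in for someConversion. headI does one step per element, so i1 goes through one bind per byte, whereas takeE hands stream2list whole (sub)chunks. The last line of main illustrates the latency point: after every chunk has been fed, stream2list still has no answer, because it produces one only at EOF.

```haskell
import Control.Monad (replicateM)
import Data.Maybe (catMaybes)
import Data.Word (Word8)

-- Toy iteratee model (an assumption for illustration, not the iteratee
-- package's real types): Done = result + unconsumed input, Cont = waiting.
data Stream el = Chunk [el] | EOF
data Iter el a = Done a (Stream el) | Cont (Stream el -> Iter el a)

instance Functor (Iter el) where
  fmap f (Done a s) = Done (f a) s
  fmap f (Cont k)   = Cont (fmap f . k)
instance Applicative (Iter el) where
  pure a    = Done a (Chunk [])
  mf <*> mx = mf >>= \f -> fmap f mx
instance Monad (Iter el) where
  Done a s >>= f = case f a of
    Done b _ -> Done b s        -- f consumed nothing: keep the leftover
    Cont k   -> case s of
      Chunk [] -> Cont k        -- no leftover: wait for the next chunk
      _        -> k s           -- hand the leftover to the continuation
  Cont k >>= f = Cont (\s -> k s >>= f)

-- Like Iter.head: one element per step (Nothing at EOF, a simplification).
headI :: Iter el (Maybe el)
headI = Cont step
  where
    step (Chunk [])     = headI
    step (Chunk (x:xs)) = Done (Just x) (Chunk xs)
    step EOF            = Done Nothing EOF

-- Like stream2list: takes whole chunks, but answers only at EOF.
stream2list :: Iter el [el]
stream2list = go []
  where
    go acc = Cont step
      where
        step (Chunk xs) = go (xs : acc)
        step EOF        = Done (concat (reverse acc)) EOF

dropI :: Int -> Iter el ()
dropI n
  | n <= 0    = Done () (Chunk [])
  | otherwise = Cont step
  where
    step (Chunk xs)
      | length xs < n = dropI (n - length xs)
      | otherwise     = Done () (Chunk (drop n xs))
    step EOF = Done () EOF

-- Like Iter.take: give the inner iteratee at most n elements, by (sub)chunks.
takeE :: Int -> Iter el a -> Iter el (Iter el a)
takeE n inner@(Done _ _) = fmap (const inner) (dropI n)
takeE n inner@(Cont k)
  | n <= 0    = Done inner (Chunk [])
  | otherwise = Cont step
  where
    step (Chunk xs)
      | length xs <= n = takeE (n - length xs) (k (Chunk xs))
      | otherwise = let (a, b) = splitAt n xs in Done (k (Chunk a)) (Chunk b)
    step EOF = Done inner EOF

-- Like joinI: send EOF to the inner iteratee and return its result.
joinI :: Iter elo (Iter eli a) -> Iter elo a
joinI outer = outer >>= \inner -> case inner of
  Done a _ -> return a
  Cont k   -> case k EOF of
    Done a _ -> return a
    Cont _   -> error "joinI: inner iteratee did not finish at EOF"

feedChunks :: [[el]] -> Iter el a -> Iter el a
feedChunks cs i0 = foldl feed i0 cs
  where
    feed (Cont k) c = k (Chunk c)
    feed done     _ = done

runChunks :: [[el]] -> Iter el a -> Maybe a
runChunks cs i0 = case feedChunks cs i0 of
  Done a _ -> Just a
  Cont k   -> case k EOF of
    Done a _ -> Just a
    Cont _   -> Nothing

conv :: Word8 -> String
conv = show -- hypothetical stand-in for someConversion

i1, i2 :: Int -> Iter Word8 [String]
i1 n = map conv . catMaybes <$> replicateM n headI -- one step per byte
i2 n = map conv <$> joinI (takeE n stream2list)    -- one step per chunk

main :: IO ()
main = do
  let chunks = [[1, 2, 3], [4, 5, 6, 7]] :: [[Word8]]
  print (runChunks chunks (i1 5))
  print (runChunks chunks (i2 5))
  putStrLn $ case feedChunks chunks stream2list of
    Cont _   -> "stream2list: all chunks fed, still waiting for EOF"
    Done _ _ -> "stream2list: done"
```

Both i1 5 and i2 5 yield Just ["1","2","3","4","5"] on the chunks [1,2,3] and [4,5,6,7]; in this model only the amount of per-element work differs.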

Although I think i2 and i3 should be close in performance (only
benchmarking can tell for sure, of course), i3 is more extensible
because stream2list is at the end of the chain. If further processing
is required later on (or the latency imposed by stream2list becomes
noticeable), the chain can easily be extended. Another advantage of
the i3 arrangement is that if some Iteratee further down the chain
decides that it has had enough elements, Iter.take can quickly skip
the remaining elements without converting them.
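
The early-exit behaviour of i3 can be sketched in a small, base-only toy model of iteratees. This is a hypothetical illustration, not the iteratee package's API: Stream, Iter, takeE, dropI, joinI and runChunks are invented names, mapE is a hypothetical enumeratee modelled loosely on mapStream, and conv = show stands in for someConversion. firstK is a hypothetical consumer further down the chain that has had enough after k converted elements; once it is Done, takeE stops feeding mapE and skips the rest of its n-element window with dropI.

```haskell
import Data.Word (Word8)

-- Toy iteratee model (an assumption for illustration, not the iteratee
-- package's real types), kept minimal so the sketch is self-contained.
data Stream el = Chunk [el] | EOF
data Iter el a = Done a (Stream el) | Cont (Stream el -> Iter el a)

instance Functor (Iter el) where
  fmap f (Done a s) = Done (f a) s
  fmap f (Cont k)   = Cont (fmap f . k)
instance Applicative (Iter el) where
  pure a    = Done a (Chunk [])
  mf <*> mx = mf >>= \f -> fmap f mx
instance Monad (Iter el) where
  Done a s >>= f = case f a of
    Done b _ -> Done b s        -- f consumed nothing: keep the leftover
    Cont k   -> case s of
      Chunk [] -> Cont k
      _        -> k s           -- hand the leftover to the continuation
  Cont k >>= f = Cont (\s -> k s >>= f)

-- Like stream2list: accumulates chunks, answers only at EOF.
stream2list :: Iter el [el]
stream2list = go []
  where
    go acc = Cont step
      where
        step (Chunk xs) = go (xs : acc)
        step EOF        = Done (concat (reverse acc)) EOF

dropI :: Int -> Iter el ()
dropI n
  | n <= 0    = Done () (Chunk [])
  | otherwise = Cont step
  where
    step (Chunk xs)
      | length xs < n = dropI (n - length xs)
      | otherwise     = Done () (Chunk (drop n xs))
    step EOF = Done () EOF

-- Like Iter.take; once the inner iteratee is Done, dropI skips the rest.
takeE :: Int -> Iter el a -> Iter el (Iter el a)
takeE n inner@(Done _ _) = fmap (const inner) (dropI n)
takeE n inner@(Cont k)
  | n <= 0    = Done inner (Chunk [])
  | otherwise = Cont step
  where
    step (Chunk xs)
      | length xs <= n = takeE (n - length xs) (k (Chunk xs))
      | otherwise = let (a, b) = splitAt n xs in Done (k (Chunk a)) (Chunk b)
    step EOF = Done inner EOF

-- Like mapStream: convert each chunk and pass it on until the inner is Done.
mapE :: (a -> b) -> Iter b r -> Iter a (Iter b r)
mapE _ inner@(Done _ _) = Done inner (Chunk [])
mapE f inner@(Cont k)   = Cont step
  where
    step (Chunk xs) = mapE f (k (Chunk (map f xs)))
    step EOF        = Done inner EOF

joinI :: Iter elo (Iter eli a) -> Iter elo a
joinI outer = outer >>= \inner -> case inner of
  Done a _ -> return a
  Cont k   -> case k EOF of
    Done a _ -> return a
    Cont _   -> error "joinI: inner iteratee did not finish at EOF"

runChunks :: [[el]] -> Iter el a -> Maybe a
runChunks cs i0 = case foldl feed i0 cs of
    Done a _ -> Just a
    Cont k   -> case k EOF of
      Done a _ -> Just a
      Cont _   -> Nothing
  where
    feed (Cont k) c = k (Chunk c)
    feed done     _ = done

conv :: Word8 -> String
conv = show -- hypothetical stand-in for someConversion

i3 :: Int -> Iter Word8 [String]
i3 n = joinI $ takeE n $ joinI $ mapE conv stream2list

-- Hypothetical consumer further down the chain that stops after k elements.
firstK :: Int -> Int -> Iter Word8 [String]
firstK k n = joinI $ takeE n $ joinI $ mapE conv $ joinI (takeE k stream2list)

demo :: Iter Word8 ([String], [Word8])
demo = do
  xs   <- firstK 2 5   -- converted elements kept by the consumer
  rest <- stream2list  -- whatever follows the 5-element window
  return (xs, rest)

main :: IO ()
main = do
  print (runChunks [[1, 2, 3], [4, 5, 6, 7]] (i3 5))
  print (runChunks [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]] demo)
```

Running main should print Just ["1","2","3","4","5"] and then Just (["1","2"],[6,7,8,9,10]). In the second run, elements 4 and 5 arrive in a later chunk than the consumer needed, so they are dropped inside takeE and never reach mapE or conv.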

oleg＠okmij.org