Re: [Haskell-cafe] Iteratee performance

19 Mar 2010

I think this is a bit easier to write with iteratee-HEAD.  There are
some significant changes from the 0.3 version, and not all the old
functions are implemented yet, but the "cnt" iteratee can be
written as:

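-- Assumed imports: S must be the Char8 module, since count takes a Char.
import qualified Data.ByteString.Char8 as S
import qualified Data.Iteratee as I

cnt :: Monad m => I.Iteratee S.ByteString m Int
cnt = I.liftI (step 0)
  where
    -- Skip empty chunks without touching the accumulator.
    step acc (I.Chunk bs) | S.null bs = I.icont (step acc) Nothing
    -- Force the running total before continuing, so no thunks build up.
    step acc (I.Chunk bs) =
      let acc' = acc + S.count '\n' bs
      in acc' `seq` I.icont (step acc') Nothing
    -- Any other stream state (e.g. EOF) finishes with the current count.
    step acc str = I.idone acc str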

One significant change is that the kind of the first parameter to the
Iteratee type has changed, so now it should be a fully-applied type
instead of a type function.  This means that ByteString can be used
directly.  "idone", "icont", and "liftI" simplify creation of
iteratees, and it's usually not necessary to pattern match on the EOF
constructor any more.  The first "step" definition above can be left
out for a small performance penalty.
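
For readers who don't have iteratee-HEAD handy, here is a rough,
self-contained sketch of the same step-function idea.  The Stream and
Iter types and the names cntToy and runToy are made up for
illustration; they are toy stand-ins, not the package's actual types
or API, and the real Iteratee is monadic where this one is pure.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as S

-- Toy stand-ins for illustration only, NOT the iteratee package's types.
data Stream = Chunk S.ByteString | EOF

data Iter a
  = Done a Stream            -- finished, with the leftover stream
  | Cont (Stream -> Iter a)  -- waiting for the next piece of input

-- Count newlines; the bang forces the accumulator at every step,
-- so at most one pending addition exists at any time.
cntToy :: Iter Int
cntToy = Cont (step 0)
  where
    step !acc (Chunk bs) = Cont (step (acc + S.count '\n' bs))
    step !acc EOF        = Done acc EOF

-- Feed a list of chunks, then EOF, and extract the result if there is one.
runToy :: Iter a -> [S.ByteString] -> Maybe a
runToy (Done a _) _        = Just a
runToy (Cont k)   (c : cs) = runToy (k (Chunk c)) cs
runToy (Cont k)   []       = case k EOF of
                               Done a _ -> Just a
                               Cont _   -> Nothing
```

For example, runToy cntToy (map S.pack ["a\nb", "\n", "c\n"]) should
give Just 3.  Note that the toy version has no special case for empty
chunks; it just adds zero for them.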

I wouldn't recommend basing any production code on iteratee-HEAD, as
the interface isn't quite finalized yet.

With this version of "cnt" and iteratee-HEAD, the iteratee version
runs in about 2.2 seconds for the same tests as I used below.
Changing the enumFd buffer size to 32K gives the following results:

a 460MB input file (generated by cp Tiff.hs long.txt; cat long.txt >> long.txt):

MusDept-MacBook-1:Examples johnlato$ time wc -l long.txt
 12024249 long.txt

real	0m0.997s

MusDept-MacBook-1:Examples johnlato$ time ./test_bs < long.txt
12024249

real	0m1.161s

MusDept-MacBook-1:Examples johnlato$ time ./test_iter long.txt
12024249

real	0m1.154s

All time values are averages of 3 runs.  The first run for the
bytestring version was a bit long, otherwise run times were very
consistent, within 0.004s for each executable.

I don't see any reason the buffer size needs to be fixed at compile
time.  I'll make this change in the next major release.
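
To illustrate what a run-time buffer size looks like at the reading
end, here is a small sketch using plain strict ByteString I/O rather
than iteratee.  This is not how enumFd is implemented; countLinesH and
countLinesFile are hypothetical names, and bufSize is assumed to be
positive.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as S
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Count newlines on a handle, reading at most bufSize bytes at a time.
-- bufSize is an ordinary run-time argument (assumed > 0).
countLinesH :: Int -> Handle -> IO Int
countLinesH bufSize h = go 0
  where
    go !acc = do
      bs <- S.hGet h bufSize          -- returns an empty string at EOF
      if S.null bs
        then return acc
        else go (acc + S.count '\n' bs)

-- Convenience wrapper: open the file in binary mode and count its lines.
countLinesFile :: Int -> FilePath -> IO Int
countLinesFile bufSize path =
  withBinaryFile path ReadMode (countLinesH bufSize)
```

Something like countLinesFile (32 * 1024) "long.txt" would then read
in 32K chunks, and the size could just as easily come from a
command-line argument.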

John
...
From: Vasyl Pasternak 
Subject: Re: [Haskell-cafe] Iteratee performance
To: Gregory Collins 

Gregory,

Thank you, your code helps; now it runs at the speed of the lazy
bytestring test but uses less memory.

I've only added more strictness in the recursion to your code; my
version is below.

BTW, I think it is more useful to let the user set the chunk size for
reading, so I'd like to see this possibility in the iteratee package.

John Lato