
I think this is a bit easier to write with iteratee-HEAD. There are some significant changes from the 0.3 version, and not all the old functions are implemented yet, however the "cnt" iteratee can be written as:

    cnt :: Monad m => I.Iteratee S.ByteString m Int
    cnt = I.liftI (step 0)
      where
        step acc (I.Chunk bs)
          | S.null bs = I.icont (step acc) Nothing
        step acc (I.Chunk bs) =
          let acc' = acc + S.count '\n' bs
          in acc' `seq` I.icont (step acc') Nothing
        step acc str = I.idone acc str

One significant change is that the kind of the first parameter to the Iteratee type has changed, so now it should be a fully-applied type instead of a type function. This means that ByteString can be used directly. "idone", "icont", and "liftI" simplify creation of iteratees, and it's usually not necessary to pattern match on the EOF constructor any more. The first "step" definition above can be left out for a small performance penalty.

I wouldn't recommend basing any production code on iteratee-HEAD, as the interface isn't quite finalized yet.

With this version of "cnt" and iteratee-HEAD, the iteratee version runs in about 2.2 seconds for the same tests as I used below. Changing the enumFd buffer size to 32K gives the following results:

a 460MB input file (generated by cp Tiff.hs long.txt; cat long.txt >> long.txt):

    MusDept-MacBook-1:Examples johnlato$ time wc -l long.txt
    12024249 long.txt
    real 0m0.997s

    MusDept-MacBook-1:Examples johnlato$ time ./test_bs < long.txt
    12024249
    real 0m1.161s

    MusDept-MacBook-1:Examples johnlato$ time ./test_iter long.txt
    12024249
    real 0m1.154s

All time values are averages of 3 runs. The first run for the bytestring version was a bit long, otherwise run times were very consistent, within 0.004s for each executable.

I don't see any reason the buffer size needs to be fixed at compile time. I'll make this change in the next major release.

John

From: Vasyl Pasternak
Subject: Re: [Haskell-cafe] Iteratee performance
To: Gregory Collins

Gregory,

Thank you, your code helps: it now runs at the speed of the lazy bytestring test but uses less memory.
I've only added more strictness in the recursion to your code; my version is below.
BTW, I think it would be more useful to let the user set the chunk size for reading, so I'd like to see this option in the iteratee package.
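
To illustrate the two points above (a strict accumulator in the recursion and a chunk size chosen by the caller), here is a minimal sketch that uses only the bytestring package. It is not the iteratee API; the names countLinesH and countLinesFile are hypothetical and exist only for this example. It assumes bufSize is positive.

```haskell
import qualified Data.ByteString.Char8 as S
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- | Count newline characters read from a handle, reading at most
-- 'bufSize' bytes per chunk. The chunk size is an ordinary run-time
-- argument rather than a compile-time constant.
-- Assumption: 'bufSize' is positive.
countLinesH :: Int -> Handle -> IO Int
countLinesH bufSize h = go 0
  where
    go acc = do
      bs <- S.hGetSome h bufSize
      if S.null bs
        then return acc            -- an empty read signals end of input
        else do
          let acc' = acc + S.count '\n' bs
          acc' `seq` go acc'       -- force the running count before recursing

-- | Convenience wrapper (hypothetical) that opens a file in binary mode
-- and counts its lines with the given chunk size.
countLinesFile :: Int -> FilePath -> IO Int
countLinesFile bufSize path =
  withBinaryFile path ReadMode (countLinesH bufSize)
```

For example, countLinesFile (32 * 1024) "long.txt" would read the file in 32K chunks. Forcing acc' with seq keeps the accumulator from building up a chain of unevaluated additions across chunks.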