
Hi Cafe, Yesterday I played with iteratee package, and wanted to check its performance. I tried to count lines in a file, as Oleg in his famous lazy_vs_correct[1] article. The results somewhat disappointed me. The statistics and code follows, but shortly: lazy bytestring is the fastest, iteratee with bytestrings 10 times slower than lazy bytestring. When comparing lazy string and iteratee with [Char], than their results were close, but lazy string reading uses less memory and was a bit faster (20%). I performed test on 250Mb file with 5 millions lines. Now I am figuring out, is these tests are correct and this is ordinary behavior, so iteratee not so fast as I thought, or there is some mistake in my code. [1] http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt --------------------- TIMING RESULTS ---------------------- $ time wc -l 5000000.txt 5000000 5000000.txt real 0m0.314s user 0m0.184s sys 0m0.124s $ time ./bytestring_test 5000000.txt 5000000 real 0m0.817s user 0m0.616s sys 0m0.200s $ time ./bytestring_iteratee_test 5000000.txt real 0m7.801s user 0m7.552s sys 0m0.252s $ time ./string_test 5000000.txt 5000000 real 0m47.427s user 0m46.675s sys 0m0.648s $ time ./string_iteratee_test 5000000.txt 5000000 real 0m59.225s user 0m57.680s sys 0m0.840s -------------------------- RTS INFO -------------------------------- ./bytestring_test 5000000.txt +RTS -sbs.out 807,225,096 bytes allocated in the heap 122,240 bytes copied during GC 59,496 bytes maximum residency (1 sample(s)) 22,424 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 1540 collections, 0 parallel, 0.03s, 0.02s elapsed Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 0.59s ( 0.79s elapsed) GC time 0.03s ( 0.02s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.62s ( 0.82s elapsed) %GC time 4.5% (2.8% elapsed) Alloc rate 1,372,743,081 bytes per MUT second Productivity 95.5% of total user, 72.1% of total elapsed ----- ./bytestring_iteratee_test 5000000.txt +RTS -siter.out 11,024,100,312 bytes allocated in the heap 893,436,512 bytes copied during GC 95,456 bytes maximum residency (1 sample(s)) 23,216 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 21030 collections, 0 parallel, 2.51s, 2.45s elapsed Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.02s elapsed) MUT time 6.37s ( 6.66s elapsed) GC time 2.52s ( 2.45s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 8.88s ( 9.12s elapsed) %GC time 28.3% (26.9% elapsed) Alloc rate 1,731,061,437 bytes per MUT second Productivity 71.7% of total user, 69.8% of total elapsed ----- ./string_test 5000000.txt +RTS -sstr.out 38,561,155,264 bytes allocated in the heap 9,862,623,816 bytes copied during GC 223,080 bytes maximum residency (5026 sample(s)) 47,264 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 68525 collections, 0 parallel, 22.50s, 22.51s elapsed Generation 1: 5026 collections, 0 parallel, 1.38s, 1.36s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 22.80s ( 23.55s elapsed) GC time 23.87s ( 23.87s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 46.67s ( 47.42s elapsed) %GC time 51.1% (50.3% elapsed) Alloc rate 1,691,170,222 bytes per MUT second Productivity 48.9% of total user, 48.1% of total elapsed ----- ./string_iteratee_test 5000000.txt +RTS -sstriter.out 40,164,683,672 bytes allocated in the heap 7,108,638,256 bytes copied during GC 212,624 bytes maximum residency (821 sample(s)) 50,264 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 75791 collections, 0 parallel, 33.14s, 33.75s elapsed Generation 1: 821 collections, 0 parallel, 0.56s, 0.63s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 23.99s ( 24.84s elapsed) GC time 33.69s ( 34.38s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 57.68s ( 59.22s elapsed) %GC time 58.4% (58.1% elapsed) Alloc rate 1,674,540,397 bytes per MUT second Productivity 41.6% of total user, 40.5% of total elapsed ------------------ SOURCECODE ----------------------------- $ cat bytestring_test.hs import System.Environment import Control.Monad import qualified Data.ByteString.Lazy.Char8 as B count s = liftM (length . B.lines) (B.readFile s) main = do [f] <- getArgs print =<< count f --------------- $ cat bytestring_iteratee_test.hs import qualified Data.Iteratee.IO as I import qualified Data.Iteratee as I import qualified Data.Iteratee.Char as I import qualified Data.Iteratee.WrappedByteString as I import System.Environment import Control.Monad count s = I.fileDriverRandom (cnt) s cnt :: (Monad m, Functor m) => I.IterateeG I.WrappedByteString Char m Int cnt = I.joinI $ I.enumLines I.length main = do [f] <- getArgs print =<< count f ------------------ $ cat string_test.hs import System.Environment import Control.Monad count s = liftM (length . lines) (readFile s) main = do [f] <- getArgs print =<< count f ---------------------- $ cat string_iteratee_test.hs import qualified Data.Iteratee.IO as I import qualified Data.Iteratee as I import qualified Data.Iteratee.Char as I import qualified Data.Iteratee.WrappedByteString as I import System.Environment import Control.Monad count s = I.fileDriverRandom (cnt) s cnt :: (Monad m, Functor m) => I.IterateeG [] Char m Int cnt = I.joinI $ I.enumLines I.length main = do [f] <- getArgs print =<< count f Best regards, Vasyl

From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Vasyl Pasternak
Yesterday I played with iteratee package, and wanted to check its performance. I tried to count lines in a file, as Oleg in his famous lazy_vs_correct[1] article. The results somewhat disappointed me.
The statistics and code follows, but shortly: lazy bytestring is the fastest, iteratee with bytestrings 10 times slower than lazy bytestring. When comparing lazy string and iteratee with [Char], than their results were close, but lazy string reading uses less memory and was a bit faster (20%).
I performed test on 250Mb file with 5 millions lines.
Not tuning advice, but still... If you have the time, could you also try Peter Simon's old block-io proposal? http://cryp.to/blockio/fast-io.html I'm curious to see how it compares to bytestring. I can't see why the iteratee approach can't have similar performance. Alistair ***************************************************************** Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. *****************************************************************

Vasyl Pasternak
Hi Cafe,
Yesterday I played with iteratee package, and wanted to check its performance. I tried to count lines in a file, as Oleg in his famous lazy_vs_correct[1] article. The results somewhat disappointed me.
eris:benchmark greg$ time ./IterateeTest /usr/share/dict/words
234936
real 0m0.027s
user 0m0.010s
sys 0m0.015s
eris:benchmark greg$ time ./ByteStringTest /usr/share/dict/words
234936
real 0m0.024s
user 0m0.015s
sys 0m0.007s
Note also that the Bytestring I/O functions use a 32KB buffer and the
iteratee "enumFd" function uses a 4KB buffer; if the buffers were the
same the performance would be comparable. Here is my code:
------------------------------------------------------------------------
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
import qualified Data.Iteratee.IO.Fd as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.WrappedByteString as I
import qualified Data.ByteString.Char8 as S
import System.Environment
import System.IO
count :: FilePath -> IO Int
count s = I.fileDriverFd cnt s
cnt :: (Monad m) => I.IterateeG I.WrappedByteString Char m Int
cnt = go 0
where
go n = I.IterateeG $ \ch ->
case ch of
(I.EOF Nothing) -> return $ I.Done n ch
(I.EOF (Just e)) -> return $ I.Cont cnt (Just e)
(I.Chunk (I.WrapBS s)) -> do
let n' = n + S.count '\n' s
return $ I.Cont (go n') Nothing
main :: IO ()
main = do
[f] <- getArgs
print =<< count f
--
Gregory Collins

Am Mittwoch 17 März 2010 17:01:06 schrieb Gregory Collins:
Vasyl Pasternak
writes: Hi Cafe,
Yesterday I played with iteratee package, and wanted to check its performance. I tried to count lines in a file, as Oleg in his famous lazy_vs_correct[1] article. The results somewhat disappointed me.
eris:benchmark greg$ time ./IterateeTest /usr/share/dict/words 234936
real 0m0.027s user 0m0.010s sys 0m0.015s eris:benchmark greg$ time ./ByteStringTest /usr/share/dict/words 234936
real 0m0.024s user 0m0.015s sys 0m0.007s
Note also that the Bytestring I/O functions use a 32KB buffer and the iteratee "enumFd" function uses a 4KB buffer; if the buffers were the same the performance would be comparable. Here is my code:
Which gave not so good performance on a large file, it ate ~530MB memory for a 260MB file with 20915399 lines (after the first run, when the other apps' memory had been swapped out, it wasn't awful anymore, just mediocre). It was, however, a good memory citizen - and pretty fast - after putting a BangPattern on go's argument. Times: wc -l: 0.90s your code with the BangPattern: 1.10s Vasyl's Iteratee code: 7.61s Vasyl's ByteString code: 1.40s Code from http://www.haskell.org/haskellwiki/Wc [1]: 0.76s (all Haskell code compiled with -O2 by ghc-6.12.1). I used iteratee-0.3.4 from hackage. So I'd say, if you know how, Iteratees can already be fast, but it seems to be easier to write slow code than with ByteString. [1] The winner: import qualified Data.ByteString.Lazy.Char8 as L main :: IO () main = L.getContents >>= print . L.count '\n'
------------------------------------------------------------------------
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
import qualified Data.Iteratee.IO.Fd as I import qualified Data.Iteratee as I import qualified Data.Iteratee.WrappedByteString as I import qualified Data.ByteString.Char8 as S
import System.Environment import System.IO
count :: FilePath -> IO Int count s = I.fileDriverFd cnt s
cnt :: (Monad m) => I.IterateeG I.WrappedByteString Char m Int cnt = go 0 where go n = I.IterateeG $ \ch -> case ch of (I.EOF Nothing) -> return $ I.Done n ch (I.EOF (Just e)) -> return $ I.Cont cnt (Just e) (I.Chunk (I.WrapBS s)) -> do let n' = n + S.count '\n' s return $ I.Cont (go n') Nothing
main :: IO () main = do [f] <- getArgs print =<< count f

Gregory,
Thank you, your code helps, now my it runs in the speed of lazy
bytestring test but uses less memory with it.
I've only added to your code more strictness in the recursion, my
version is below.
BTW, I think it is more useful to let user set the chunk size for
reading, so I'd like to see this possibility in the iteratee package.
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
import qualified Data.Iteratee.IO.Fd as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.WrappedByteString as I
import qualified Data.ByteString.Char8 as S
import System.Environment
import System.IO
count :: FilePath -> IO Int
count s = I.fileDriverFd cnt s
cnt :: (Monad m) => I.IterateeG I.WrappedByteString Char m Int
cnt = go 0
where
go n = I.IterateeG $ \ch ->
case ch of
(I.EOF Nothing) -> return $ I.Done n ch
(I.EOF (Just e)) -> return $ I.Cont cnt (Just e)
(I.Chunk (I.WrapBS s)) -> do
let n' = n + S.count '\n' s
return $ n' `seq` I.Cont (go n') Nothing
main :: IO ()
main = do
[f] <- getArgs
print =<< count f
2010/3/17 Gregory Collins
Vasyl Pasternak
writes: Hi Cafe,
Yesterday I played with iteratee package, and wanted to check its performance. I tried to count lines in a file, as Oleg in his famous lazy_vs_correct[1] article. The results somewhat disappointed me.
eris:benchmark greg$ time ./IterateeTest /usr/share/dict/words 234936
real 0m0.027s user 0m0.010s sys 0m0.015s eris:benchmark greg$ time ./ByteStringTest /usr/share/dict/words 234936
real 0m0.024s user 0m0.015s sys 0m0.007s
Note also that the Bytestring I/O functions use a 32KB buffer and the iteratee "enumFd" function uses a 4KB buffer; if the buffers were the same the performance would be comparable. Here is my code:
------------------------------------------------------------------------
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
import qualified Data.Iteratee.IO.Fd as I import qualified Data.Iteratee as I import qualified Data.Iteratee.WrappedByteString as I import qualified Data.ByteString.Char8 as S
import System.Environment import System.IO
count :: FilePath -> IO Int count s = I.fileDriverFd cnt s
cnt :: (Monad m) => I.IterateeG I.WrappedByteString Char m Int cnt = go 0 where go n = I.IterateeG $ \ch -> case ch of (I.EOF Nothing) -> return $ I.Done n ch (I.EOF (Just e)) -> return $ I.Cont cnt (Just e) (I.Chunk (I.WrapBS s)) -> do let n' = n + S.count '\n' s return $ I.Cont (go n') Nothing
main :: IO () main = do [f] <- getArgs print =<< count f
-- Gregory Collins

On Wed, Mar 17, 2010 at 5:55 PM, Vasyl Pasternak
BTW, I think it is more useful to let user set the chunk size for reading, so I'd like to see this possibility in the iteratee package.
Indeed, this is also the way I designed my enumerator for usb bulk and interrupt reads: http://code.haskell.org/~basvandijk/code/usb/System/USB/IO/Synchronous/Enume... regards, Bas

As a general style tip, since you're using language extensions anyway
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
you might want to add "BangPatterns", so you can put the strictness annotation directly on the argument. Either by doing this: go !n = ... or later on: let !n' = ... return I.Cont ...
cnt = go 0 where go n = I.IterateeG $ \ch -> case ch of (I.EOF Nothing) -> return $ I.Done n ch (I.EOF (Just e)) -> return $ I.Cont cnt (Just e) (I.Chunk (I.WrapBS s)) -> do let n' = n + S.count '\n' s return $ n' `seq` I.Cont (go n') Nothing
participants (6)
-
Bas van Dijk
-
Bayley, Alistair
-
Daniel Fischer
-
Gregory Collins
-
Thomas Schilling
-
Vasyl Pasternak