Iteratee performance

17 Mar 2010

Hi Cafe,

Yesterday I played with the iteratee package and wanted to check its
performance. I tried counting the lines in a file, as Oleg did in his
well-known lazy_vs_correct[1] article. The results were somewhat
disappointing.

The statistics and code follow, but in short: lazy ByteString is the
fastest, and iteratee over ByteStrings is about 10 times slower than
lazy ByteString. Lazy String and iteratee over [Char] gave much closer
results, but the lazy String version used less memory and was a bit
faster (about 20%).

I ran the tests on a 250 MB file with 5 million lines.
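
For anyone who wants to reproduce the numbers, an input of the same size could be generated roughly as follows. This is only a sketch: the contents of the real file are not described here, so the fixed 50-byte lines filled with 'x' and the name mkFile are assumptions made for illustration.

```haskell
-- Hypothetical generator for a comparable input file: n lines of
-- `width` bytes each (counting the newline). The layout of the real
-- test file is not given in this message; this one is an assumption.
import qualified Data.ByteString.Lazy.Char8 as B
import System.Environment (getArgs)

-- Build n identical lines, each `width` bytes long including '\n'.
mkFile :: Int -> Int -> B.ByteString
mkFile n width = B.concat (replicate n line)
  where
    line = B.pack (replicate (width - 1) 'x' ++ "\n")

main :: IO ()
main = do
  [path] <- getArgs
  -- 5,000,000 lines * 50 bytes = 250,000,000 bytes
  B.writeFile path (mkFile 5000000 50)
```

The lazy ByteString is built from a shared 50-byte chunk, so the file is written out chunk by chunk rather than being held in memory as a whole.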

Now I am trying to figure out whether these tests are correct and this
is the normal behavior, so that iteratee is simply not as fast as I
thought, or whether there is some mistake in my code.

[1] http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt

--------------------- TIMING RESULTS ----------------------

$ time wc -l 5000000.txt
5000000 5000000.txt

real    0m0.314s
user    0m0.184s
sys     0m0.124s

$ time ./bytestring_test 5000000.txt
5000000

real    0m0.817s
user    0m0.616s
sys     0m0.200s

$ time ./bytestring_iteratee_test 5000000.txt

real    0m7.801s
user    0m7.552s
sys     0m0.252s

$ time ./string_test 5000000.txt
5000000

real    0m47.427s
user    0m46.675s
sys     0m0.648s

$ time ./string_iteratee_test 5000000.txt
5000000

real    0m59.225s
user    0m57.680s
sys     0m0.840s
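
As a cross-check of the ratios quoted in the summary, here is a small sketch that recomputes them from the "real" times above. The helper names slowdown and savedPercent are made up for this illustration.

```haskell
-- Sanity check of the summary ratios, using the "real" wall-clock
-- times reported above. The helper names are illustrative only.
import Text.Printf (printf)

-- How many times slower `slow` is than `fast`.
slowdown :: Double -> Double -> Double
slowdown fast slow = slow / fast

-- Percentage of wall-clock time saved by `fast` relative to `slow`.
savedPercent :: Double -> Double -> Double
savedPercent fast slow = 100 * (1 - fast / slow)

main :: IO ()
main = do
  -- lazy ByteString (0.817s) vs. iteratee over WrappedByteString (7.801s)
  printf "bytestring iteratee slowdown: %.1fx\n" (slowdown 0.817 7.801)
  -- lazy String (47.427s) vs. iteratee over [Char] (59.225s)
  printf "lazy String time saved: %.1f%%\n" (savedPercent 47.427 59.225)
```

With these inputs the slowdown comes out at roughly 9.5x and the saving at roughly 19.9%, which matches the "10 times" and "20%" figures.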

-------------------------- RTS INFO --------------------------------

./bytestring_test 5000000.txt +RTS -sbs.out
     807,225,096 bytes allocated in the heap
         122,240 bytes copied during GC
          59,496 bytes maximum residency (1 sample(s))
          22,424 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  1540 collections,     0 parallel,  0.03s,  0.02s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.59s  (  0.79s elapsed)
  GC    time    0.03s  (  0.02s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.62s  (  0.82s elapsed)

  %GC time       4.5%  (2.8% elapsed)

  Alloc rate    1,372,743,081 bytes per MUT second

  Productivity  95.5% of total user, 72.1% of total elapsed

-----

./bytestring_iteratee_test 5000000.txt +RTS -siter.out
  11,024,100,312 bytes allocated in the heap
     893,436,512 bytes copied during GC
          95,456 bytes maximum residency (1 sample(s))
          23,216 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 21030 collections,     0 parallel,  2.51s,  2.45s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.02s elapsed)
  MUT   time    6.37s  (  6.66s elapsed)
  GC    time    2.52s  (  2.45s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.88s  (  9.12s elapsed)

  %GC time      28.3%  (26.9% elapsed)

  Alloc rate    1,731,061,437 bytes per MUT second

  Productivity  71.7% of total user, 69.8% of total elapsed

-----

./string_test 5000000.txt +RTS -sstr.out
  38,561,155,264 bytes allocated in the heap
   9,862,623,816 bytes copied during GC
         223,080 bytes maximum residency (5026 sample(s))
          47,264 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 68525 collections,     0 parallel, 22.50s, 22.51s elapsed
  Generation 1:  5026 collections,     0 parallel,  1.38s,  1.36s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   22.80s  ( 23.55s elapsed)
  GC    time   23.87s  ( 23.87s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   46.67s  ( 47.42s elapsed)

  %GC time      51.1%  (50.3% elapsed)

  Alloc rate    1,691,170,222 bytes per MUT second

  Productivity  48.9% of total user, 48.1% of total elapsed

-----

./string_iteratee_test 5000000.txt +RTS -sstriter.out
  40,164,683,672 bytes allocated in the heap
   7,108,638,256 bytes copied during GC
         212,624 bytes maximum residency (821 sample(s))
          50,264 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 75791 collections,     0 parallel, 33.14s, 33.75s elapsed
  Generation 1:   821 collections,     0 parallel,  0.56s,  0.63s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   23.99s  ( 24.84s elapsed)
  GC    time   33.69s  ( 34.38s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   57.68s  ( 59.22s elapsed)

  %GC time      58.4%  (58.1% elapsed)

  Alloc rate    1,674,540,397 bytes per MUT second

  Productivity  41.6% of total user, 40.5% of total elapsed
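
Another way to read the RTS output is as heap allocation per input line. The sketch below divides the "bytes allocated in the heap" figures above by the 5,000,000 lines in the file; the names fileLines and bytesPerLine are made up for this illustration.

```haskell
-- Heap allocation per input line, from the "bytes allocated in the
-- heap" figures above and the 5,000,000 lines in the test file.
-- The helper names are illustrative only.
import Text.Printf (printf)

fileLines :: Integer
fileLines = 5000000

-- Bytes allocated per line, rounded down.
bytesPerLine :: Integer -> Integer
bytesPerLine allocated = allocated `div` fileLines

main :: IO ()
main = mapM_ report
  [ ("bytestring_test",          807225096)
  , ("bytestring_iteratee_test", 11024100312)
  , ("string_test",              38561155264)
  , ("string_iteratee_test",     40164683672)
  ]
  where
    report :: (String, Integer) -> IO ()
    report (name, allocated) =
      printf "%-26s %5d bytes/line\n" name (bytesPerLine allocated)
```

This works out to about 161 bytes per line for lazy ByteString against about 2204 for the ByteString iteratee, and about 7712 against 8032 for the two String versions.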

------------------ SOURCECODE -----------------------------

$ cat bytestring_test.hs
import System.Environment
import Control.Monad

import qualified Data.ByteString.Lazy.Char8 as B

count s = liftM (length . B.lines) (B.readFile s)

main = do
  [f] <- getArgs
  print =<< count f

---------------

$ cat bytestring_iteratee_test.hs

import qualified Data.Iteratee.IO as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.Char as I
import qualified Data.Iteratee.WrappedByteString as I

import System.Environment
import Control.Monad

count s = I.fileDriverRandom (cnt) s

cnt :: (Monad m, Functor m) => I.IterateeG I.WrappedByteString Char m Int
cnt = I.joinI $ I.enumLines I.length

main = do
  [f] <- getArgs
  print =<< count f

------------------

$ cat string_test.hs
import System.Environment
import Control.Monad

count s = liftM (length . lines) (readFile s)

main = do
  [f] <- getArgs
  print =<< count f

----------------------

$ cat string_iteratee_test.hs

import qualified Data.Iteratee.IO as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.Char as I
import qualified Data.Iteratee.WrappedByteString as I

import System.Environment
import Control.Monad

count s = I.fileDriverRandom (cnt) s

cnt :: (Monad m, Functor m) => I.IterateeG [] Char m Int
cnt = I.joinI $ I.enumLines I.length

main = do
  [f] <- getArgs
  print =<< count f

Best regards,
Vasyl

Vasyl Pasternak
