
Hi Cafe,

Yesterday I played with the iteratee package and wanted to check its
performance. I tried counting the lines in a file, as Oleg does in his
famous lazy_vs_correct [1] article. The results somewhat disappointed me.

The statistics and code follow, but in short: lazy bytestring is the
fastest, and iteratee with bytestrings is about 10 times slower than lazy
bytestring. Comparing lazy String with iteratee over [Char], the results
were close, but the lazy String version used less memory and was a bit
faster (about 20%).

I ran the tests on a 250 MB file with 5 million lines.

Now I am trying to figure out whether these tests are correct and this is
the expected behavior, meaning iteratee is not as fast as I thought, or
whether there is a mistake in my code.

[1] http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt

--------------------- TIMING RESULTS ----------------------

$ time wc -l 5000000.txt
5000000 5000000.txt

real    0m0.314s
user    0m0.184s
sys     0m0.124s

$ time ./bytestring_test 5000000.txt
5000000

real    0m0.817s
user    0m0.616s
sys     0m0.200s

$ time ./bytestring_iteratee_test 5000000.txt

real    0m7.801s
user    0m7.552s
sys     0m0.252s

$ time ./string_test 5000000.txt
5000000

real    0m47.427s
user    0m46.675s
sys     0m0.648s

$ time ./string_iteratee_test 5000000.txt
5000000

real    0m59.225s
user    0m57.680s
sys     0m0.840s

-------------------------- RTS INFO --------------------------------

./bytestring_test 5000000.txt +RTS -sbs.out
     807,225,096 bytes allocated in the heap
         122,240 bytes copied during GC
          59,496 bytes maximum residency (1 sample(s))
          22,424 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  1540 collections,     0 parallel,  0.03s,  0.02s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.59s  (  0.79s elapsed)
  GC    time    0.03s  (  0.02s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.62s  (  0.82s elapsed)

  %GC time       4.5%  (2.8% elapsed)

  Alloc rate    1,372,743,081 bytes per MUT second

  Productivity  95.5% of total user, 72.1% of total elapsed

-----

./bytestring_iteratee_test 5000000.txt +RTS -siter.out
  11,024,100,312 bytes allocated in the heap
     893,436,512 bytes copied during GC
          95,456 bytes maximum residency (1 sample(s))
          23,216 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 21030 collections,     0 parallel,  2.51s,  2.45s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.02s elapsed)
  MUT   time    6.37s  (  6.66s elapsed)
  GC    time    2.52s  (  2.45s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.88s  (  9.12s elapsed)

  %GC time      28.3%  (26.9% elapsed)

  Alloc rate    1,731,061,437 bytes per MUT second

  Productivity  71.7% of total user, 69.8% of total elapsed

-----

./string_test 5000000.txt +RTS -sstr.out
  38,561,155,264 bytes allocated in the heap
   9,862,623,816 bytes copied during GC
         223,080 bytes maximum residency (5026 sample(s))
          47,264 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 68525 collections,     0 parallel, 22.50s, 22.51s elapsed
  Generation 1:  5026 collections,     0 parallel,  1.38s,  1.36s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   22.80s  ( 23.55s elapsed)
  GC    time   23.87s  ( 23.87s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   46.67s  ( 47.42s elapsed)

  %GC time      51.1%  (50.3% elapsed)

  Alloc rate    1,691,170,222 bytes per MUT second

  Productivity  48.9% of total user, 48.1% of total elapsed

-----

./string_iteratee_test 5000000.txt +RTS -sstriter.out
  40,164,683,672 bytes allocated in the heap
   7,108,638,256 bytes copied during GC
         212,624 bytes maximum residency (821 sample(s))
          50,264 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 75791 collections,     0 parallel, 33.14s, 33.75s elapsed
  Generation 1:   821 collections,     0 parallel,  0.56s,  0.63s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   23.99s  ( 24.84s elapsed)
  GC    time   33.69s  ( 34.38s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   57.68s  ( 59.22s elapsed)

  %GC time      58.4%  (58.1% elapsed)

  Alloc rate    1,674,540,397 bytes per MUT second

  Productivity  41.6% of total user, 40.5% of total elapsed

------------------ SOURCECODE -----------------------------

$ cat bytestring_test.hs
import System.Environment
import Control.Monad
import qualified Data.ByteString.Lazy.Char8 as B

-- Read the file lazily and count its lines.
count s = liftM (length . B.lines) (B.readFile s)

main = do
  [f] <- getArgs
  print =<< count f

---------------

$ cat bytestring_iteratee_test.hs
import qualified Data.Iteratee.IO as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.Char as I
import qualified Data.Iteratee.WrappedByteString as I
import System.Environment
import Control.Monad

count s = I.fileDriverRandom (cnt) s

-- Split the stream into lines and count them.
cnt :: (Monad m, Functor m) => I.IterateeG I.WrappedByteString Char m Int
cnt = I.joinI $ I.enumLines I.length

main = do
  [f] <- getArgs
  print =<< count f

------------------

$ cat string_test.hs
import System.Environment
import Control.Monad

-- Read the file as a lazy String and count its lines.
count s = liftM (length . lines) (readFile s)

main = do
  [f] <- getArgs
  print =<< count f

----------------------

$ cat string_iteratee_test.hs
import qualified Data.Iteratee.IO as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.Char as I
import qualified Data.Iteratee.WrappedByteString as I
import System.Environment
import Control.Monad

count s = I.fileDriverRandom (cnt) s

-- Same iteratee as the bytestring version, but over a [Char] stream.
cnt :: (Monad m, Functor m) => I.IterateeG [] Char m Int
cnt = I.joinI $ I.enumLines I.length

main = do
  [f] <- getArgs
  print =<< count f

Best regards,
Vasyl
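
P.S. One more baseline that might help separate the cost of chunked I/O from the cost of the iteratee layer. This is only a minimal sketch and was not part of the measurements above: it reads the file in fixed-size strict chunks and counts newline bytes with a strict accumulator. The names countNewlines and countHandle and the 64 KB chunk size are illustrative assumptions, not anything taken from the iteratee package. Like wc -l, it counts '\n' bytes, so a final line with no trailing newline is not counted, unlike length . lines.

```haskell
import qualified Data.ByteString.Char8 as B
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Count the '\n' bytes in one strict chunk.
countNewlines :: B.ByteString -> Int
countNewlines = B.count '\n'

-- Read the handle in fixed-size chunks and sum the newline counts.
-- The accumulator is forced on every step so no thunks build up.
countHandle :: Handle -> IO Int
countHandle h = go 0
  where
    chunkSize = 64 * 1024 -- assumed chunk size, chosen arbitrarily

    go n = do
      chunk <- B.hGet h chunkSize
      if B.null chunk
        then return n
        else do
          let n' = n + countNewlines chunk
          n' `seq` go n'

main :: IO ()
main = do
  [f] <- getArgs
  n <- withBinaryFile f ReadMode countHandle
  print n
```

If this runs close to the lazy bytestring version, the gap seen above would more likely come from the per-line processing in enumLines than from chunked reading as such.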