
Am Mittwoch 17 März 2010 17:01:06 schrieb Gregory Collins:
Vasyl Pasternak
writes: Hi Cafe,
Yesterday I played with iteratee package, and wanted to check its performance. I tried to count lines in a file, as Oleg in his famous lazy_vs_correct[1] article. The results somewhat disappointed me.
eris:benchmark greg$ time ./IterateeTest /usr/share/dict/words 234936
real 0m0.027s user 0m0.010s sys 0m0.015s eris:benchmark greg$ time ./ByteStringTest /usr/share/dict/words 234936
real 0m0.024s user 0m0.015s sys 0m0.007s
Note also that the Bytestring I/O functions use a 32KB buffer and the iteratee "enumFd" function uses a 4KB buffer; if the buffers were the same the performance would be comparable. Here is my code:
Which gave not so good performance on a large file, it ate ~530MB memory for a 260MB file with 20915399 lines (after the first run, when the other apps' memory had been swapped out, it wasn't awful anymore, just mediocre). It was, however, a good memory citizen - and pretty fast - after putting a BangPattern on go's argument. Times: wc -l: 0.90s your code with the BangPattern: 1.10s Vasyl's Iteratee code: 7.61s Vasyl's ByteString code: 1.40s Code from http://www.haskell.org/haskellwiki/Wc [1]: 0.76s (all Haskell code compiled with -O2 by ghc-6.12.1). I used iteratee-0.3.4 from hackage. So I'd say, if you know how, Iteratees can already be fast, but it seems to be easier to write slow code than with ByteString. [1] The winner: import qualified Data.ByteString.Lazy.Char8 as L main :: IO () main = L.getContents >>= print . L.count '\n'
------------------------------------------------------------------------
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
import qualified Data.Iteratee.IO.Fd as I import qualified Data.Iteratee as I import qualified Data.Iteratee.WrappedByteString as I import qualified Data.ByteString.Char8 as S
import System.Environment import System.IO
count :: FilePath -> IO Int count s = I.fileDriverFd cnt s
cnt :: (Monad m) => I.IterateeG I.WrappedByteString Char m Int cnt = go 0 where go n = I.IterateeG $ \ch -> case ch of (I.EOF Nothing) -> return $ I.Done n ch (I.EOF (Just e)) -> return $ I.Cont cnt (Just e) (I.Chunk (I.WrapBS s)) -> do let n' = n + S.count '\n' s return $ I.Cont (go n') Nothing
main :: IO () main = do [f] <- getArgs print =<< count f