
Just for fun. Here's some improvements. about 6x faster. I'd be interested to see what io-streams could do on this. Using a 250M test file. -- strict state monad and bang patterns on the uncons and accumulator argument: $ time ./A 4166680 ./A 8.42s user 0.57s system 99% cpu 9.037 total -- just write a loop $ time ./A 4166680 ./A 3.84s user 0.26s system 99% cpu 4.121 total -- switch to Int $ time ./A 4166680 ./A 1.89s user 0.23s system 99% cpu 2.134 total -- custom isSpace function $ time ./A 4166680 ./A 1.56s user 0.24s system 99% cpu 1.808 total -- mmap IO $ time ./A 4166680 ./A 1.54s user 0.09s system 99% cpu 1.636 total Here's the final program: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString as S import qualified Data.ByteString.Lazy.Char8 as L import System.IO.Posix.MMap.Lazy main = do f <- unsafeMMapFile "test.txt" print $ go 0 f where go :: Int -> L.ByteString -> Int go !a !s = case L.uncons s of Nothing -> a Just (x,xs) | isSpaceChar8 x -> go (a+1) xs | otherwise -> go a xs isSpaceChar8 c = c == '\n' || c == ' ' {-# INLINE isSpaceChar8 #-} On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko < to.darkangel@gmail.com> wrote:
Hi All!
I tune my toy project for performance and hit the wall on simple, in imperative world, task. Here is the code that model what I'm trying to achieve
import qualified Data.ByteString.Lazy as L import Data.Word8(isSpace) import Data.Word import Control.Monad.State
type Stream = State L.ByteString
get_byte :: Stream (Maybe Word8) get_byte = do s <- get case L.uncons s of Nothing -> return Nothing Just (x, xs) -> put xs >> return (Just x)
main = do f <- L.readFile "test.txt" let r = evalState count_spaces f print r where count_spaces = go 0 where go a = do x <- get_byte case x of Just x' -> if isSpace x' then go (a + 1) else go a Nothing -> return a
It takes the file and count spaces, in imperative way, consuming bytes one by one. The problem is: How to rewrite this to get rid of constant allocation of state but still working with stream of bytes? I can rewrite this as one-liner L.foldl, but that doesn't help me in any way to optimize my toy project where all algorithms build upon consuming stream of bytes.
PS. My main lang is C++ over 10 years and I only learn Haskell :)
______________________________**_________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe