
On 03/19/2013 10:32 PM, Don Stewart wrote:
Oh, I forgot the technique of inlining the lazy bytestring chunks, and processing each chunk seperately.
$ time ./fast 4166680 ./fast 1.25s user 0.07s system 99% cpu 1.325 total
Essentially inline Lazy.foldlChunks and specializes is (the inliner should really get that). And now we have a nice unboxed inner loop, which llvm might spot:
$ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm $ time ./fast 4166680 ./fast 1.07s user 0.06s system 98% cpu *1.146 total*
So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
Thanks Don, but after some investigation I came to conclusion that problem is in State monad {-# LANGUAGE BangPatterns #-} import Control.Monad.State.Strict data S6 = S6 !Int !Int main_6 = do let r = evalState go (S6 10000 0) print r where go = do (S6 i a) <- get if (i == 0) then return a else (put (S6 (i - 1) (a + i))) >> go main_7 = do let r = go (S6 10000 0) print r where go (S6 i a) | i == 0 = a | otherwise = go $ S6 (i - 1) (a + i) main = main_6 main_6 doing constant allocations while main_7 run in constant space. Can you suggest something that improve situation? I don't want to manually unfold all my code that I want to be fast :(.