
Hi All!

I tune my toy project for performance and hit a wall on a task that is simple in the imperative world. Here is the code that models what I'm trying to achieve:

import qualified Data.ByteString.Lazy as L
import Data.Word8 (isSpace)
import Data.Word
import Control.Monad.State

type Stream = State L.ByteString

get_byte :: Stream (Maybe Word8)
get_byte = do
    s <- get
    case L.uncons s of
        Nothing      -> return Nothing
        Just (x, xs) -> put xs >> return (Just x)

main = do
    f <- L.readFile "test.txt"
    let r = evalState count_spaces f
    print r
  where
    count_spaces = go 0
      where
        go a = do
            x <- get_byte
            case x of
                Just x' -> if isSpace x' then go (a + 1) else go a
                Nothing -> return a

It takes the file and counts spaces in an imperative way, consuming bytes one by one. The problem is: how do I rewrite this to get rid of the constant allocation of state while still working with a stream of bytes? I can rewrite it as a one-liner with L.foldl, but that doesn't help me optimize my toy project, where all the algorithms are built upon consuming a stream of bytes.

PS. My main language has been C++ for over 10 years and I'm only learning Haskell :)
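[Editor's note: a minimal sketch of the foldl one-liner the poster alludes to, assuming Data.ByteString.Lazy's strict foldl' and the same Data.Word8.isSpace; this is not code from the thread.]

import qualified Data.ByteString.Lazy as L
import Data.Word8 (isSpace)

main :: IO ()
main = do
    f <- L.readFile "test.txt"
    -- foldl' forces the Int accumulator at each step, so no thunk chain builds up
    print (L.foldl' (\a w -> if isSpace w then a + 1 else a) (0 :: Int) f)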

Put a bang pattern on your accumulator in "go". Since the value is not demanded until the end of the program, you're actually just building up a huge space leak there.

Secondly, unconsing from the lazy bytestring will cause a lot of allocation churn in the garbage collector: each byte read from the input forces the creation of a new "L.ByteString", which is many times larger.

Also, please consider trying the "io-streams" library that I wrote (http://hackage.haskell.org/package/io-streams). It provides primitives for streaming IO in "basic Haskell" style. To provide a Word8 stream (which is probably a bad idea performance-wise), it would be most efficient allocation-wise to implement a mutable index cursor (i.e. an IORef Int) that points to your current position within the ByteString chunk; other strategies will probably allocate too much.

G
--
Gregory Collins
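[Editor's note: a rough sketch of the "mutable index cursor" idea Gregory describes above, using a plain strict ByteString chunk rather than io-streams itself; the names and the demo main are illustrative only.]

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString        as S
import qualified Data.ByteString.Unsafe as S
import Data.IORef
import Data.Word (Word8)

-- An IORef Int tracks the current position inside one strict chunk, so pulling
-- a byte only bumps an index instead of building a new ByteString per byte.
newCursor :: S.ByteString -> IO (IO (Maybe Word8))
newCursor chunk = do
    posRef <- newIORef 0
    let len = S.length chunk
    return $ do
        pos <- readIORef posRef
        if pos >= len
            then return Nothing
            else do
                writeIORef posRef (pos + 1)
                return (Just (S.unsafeIndex chunk pos))

-- Tiny demo: drain a chunk byte by byte and count the bytes.
main :: IO ()
main = do
    next <- newCursor (S.pack [104, 105, 10])  -- "hi\n"
    let loop !n = next >>= maybe (print n) (\_ -> loop (n + 1))
    loop (0 :: Int)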

On 03/18/2013 02:14 PM, Gregory Collins wrote:
Put a bang pattern on your accumulator in "go". Since the value is not demanded until the end of the program, you're actually just building up a huge space leak there.
Fixed that
Secondly, unconsing from the lazy bytestring will cause a lot of allocation churn in the garbage collector -- each byte read in the input forces the creation of a new "L.ByteString", which is many times larger.
Nope. The L.ByteString is created along with the strict ByteString, but the content is not copied. And, in fact, that is not the problem. The problem is that GHC is unable to optimize the constantly changing state in the State monad. I don't know whether that is possible or not, and if it is, what I should do to allow such optimization.

import Control.Monad.State.Strict

data S6 = S6 Int Int

main_6 = do
    let r = evalState go (S6 10000 0)
    print r
  where
    go = do
        (S6 i a) <- get
        if (i == 0) then return a else put (S6 (i - 1) (a + i)) >> go

main_7 = do
    let r = go (S6 10000 0)
    print r
  where
    go (S6 i a) | i == 0    = a
                | otherwise = go $ S6 (i - 1) (a + i)

main = main_7

If I run main_6 I get constant allocations. If I run main_7 I get no allocations. Does anybody know how to overcome this inefficiency?

Just for fun, here are some improvements, about 6x faster. I'd be interested to see what io-streams could do on this.

Using a 250M test file.

-- strict state monad and bang patterns on the uncons and accumulator argument:
$ time ./A
4166680
./A  8.42s user 0.57s system 99% cpu 9.037 total

-- just write a loop
$ time ./A
4166680
./A  3.84s user 0.26s system 99% cpu 4.121 total

-- switch to Int
$ time ./A
4166680
./A  1.89s user 0.23s system 99% cpu 2.134 total

-- custom isSpace function
$ time ./A
4166680
./A  1.56s user 0.24s system 99% cpu 1.808 total

-- mmap IO
$ time ./A
4166680
./A  1.54s user 0.09s system 99% cpu 1.636 total

Here's the final program:

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString            as S
import qualified Data.ByteString.Lazy.Char8 as L
import System.IO.Posix.MMap.Lazy

main = do
    f <- unsafeMMapFile "test.txt"
    print $ go 0 f
  where
    go :: Int -> L.ByteString -> Int
    go !a !s = case L.uncons s of
        Nothing -> a
        Just (x,xs)
            | isSpaceChar8 x -> go (a+1) xs
            | otherwise      -> go a xs

    isSpaceChar8 c = c == '\n' || c == ' '
    {-# INLINE isSpaceChar8 #-}
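[Editor's note: the intermediate "just write a loop" step is not shown separately in the post above; below is a hedged reconstruction of what it likely looked like, i.e. the same loop with ordinary lazy file IO instead of the mmap package.]

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy.Char8 as L

main :: IO ()
main = do
    f <- L.readFile "test.txt"
    print (go 0 f)
  where
    -- strict accumulator, tail-recursive loop over the lazy bytestring
    go :: Int -> L.ByteString -> Int
    go !a !s = case L.uncons s of
        Nothing -> a
        Just (x, xs)
            | isSpaceChar8 x -> go (a + 1) xs
            | otherwise      -> go a xs

    isSpaceChar8 c = c == '\n' || c == ' '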

Oh, I forgot the technique of inlining the lazy bytestring chunks, and
processing each chunk separately.
$ time ./fast
4166680
./fast 1.25s user 0.07s system 99% cpu 1.325 total
Essentially inline Lazy.foldlChunks and specialize it (the inliner should
really get that).
And now we have a nice unboxed inner loop, which llvm might spot:
$ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm
$ time ./fast
4166680
./fast 1.07s user 0.06s system 98% cpu *1.146 total*
So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
{-# LANGUAGE BangPatterns #-}
import Data.ByteString.Internal
import Data.ByteString.Unsafe
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Lazy.Internal as L
import System.IO.Posix.MMap.Lazy
main = do
    f <- unsafeMMapFile "test.txt"
    print . new 0 $ L.toChunks f

new :: Int -> [ByteString] -> Int
new i []     = i
new i (x:xs) = new (add i x) xs

-- jump into the fast path
{-# INLINE add #-}
add :: Int -> ByteString -> Int
add !i !s | S.null s   = i
          | isSpace' x = add (i+1) xs
          | otherwise  = add i xs
  where T x xs = uncons s

data T = T !Char ByteString

uncons s = T (w2c (unsafeHead s)) (unsafeTail s)

isSpace' c = c == '\n' || c == ' '
{-# INLINE isSpace' #-}
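[Editor's note: for comparison, a hedged sketch of the same chunk-wise structure written with foldlChunks from Data.ByteString.Lazy.Internal, which the post above says it is essentially inlining; plain file IO is used here instead of mmap.]

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8         as S
import qualified Data.ByteString.Lazy.Char8    as L
import qualified Data.ByteString.Lazy.Internal as LI

-- foldlChunks walks the lazy chunk list; S.foldl' is the strict inner loop
-- over each strict chunk.
countSpaces :: L.ByteString -> Int
countSpaces = LI.foldlChunks perChunk 0
  where
    perChunk !n = S.foldl' step n
      where
        step !k c = if c == '\n' || c == ' ' then k + 1 else k

main :: IO ()
main = L.readFile "test.txt" >>= print . countSpaces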

On 03/19/2013 10:32 PM, Don Stewart wrote:
Oh, I forgot the technique of inlining the lazy bytestring chunks, and processing each chunk seperately.
$ time ./fast 4166680 ./fast 1.25s user 0.07s system 99% cpu 1.325 total
Essentially inline Lazy.foldlChunks and specializes is (the inliner should really get that). And now we have a nice unboxed inner loop, which llvm might spot:
$ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm $ time ./fast 4166680 ./fast 1.07s user 0.06s system 98% cpu *1.146 total*
So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
Thanks Don, but after some investigation I came to the conclusion that the problem is in the State monad:

{-# LANGUAGE BangPatterns #-}

import Control.Monad.State.Strict

data S6 = S6 !Int !Int

main_6 = do
    let r = evalState go (S6 10000 0)
    print r
  where
    go = do
        (S6 i a) <- get
        if (i == 0) then return a else (put (S6 (i - 1) (a + i))) >> go

main_7 = do
    let r = go (S6 10000 0)
    print r
  where
    go (S6 i a) | i == 0    = a
                | otherwise = go $ S6 (i - 1) (a + i)

main = main_6

main_6 does constant allocations while main_7 runs in constant space. Can you suggest something to improve the situation? I don't want to manually unfold all the code that I want to be fast :(.

On 03/19/2013 10:49 PM, Konstantin Litvinenko wrote:
{-# LANGUAGE BangPatterns #-}
import Control.Monad.State.Strict
data S6 = S6 !Int !Int
main_6 = do
    let r = evalState go (S6 10000 0)
    print r
  where
    go = do
        (S6 i a) <- get
        if (i == 0) then return a else (put (S6 (i - 1) (a + i))) >> go

main_7 = do
    let r = go (S6 10000 0)
    print r
  where
    go (S6 i a) | i == 0    = a
                | otherwise = go $ S6 (i - 1) (a + i)
main = main_6
main_6 does constant allocations while main_7 runs in constant space. Can you suggest something to improve the situation? I don't want to manually unfold all the code that I want to be fast :(.
Correction: they both run in constant space, that's not a problem. The problem is that main_6 does constant allocation and deallocation while main_7 doesn't.

On Tue, 19 Mar 2013 23:27:09 +0200, Konstantin Litvinenko <to.darkangel@gmail.com> wrote:
On 03/19/2013 10:49 PM, Konstantin Litvinenko wrote:
{-# LANGUAGE BangPatterns #-}
import Control.Monad.State.Strict
data S6 = S6 !Int !Int
main_6 = do
    let r = evalState go (S6 10000 0)
    print r
  where
    go = do
        (S6 i a) <- get
        if (i == 0) then return a else (put (S6 (i - 1) (a + i))) >> go

main_7 = do
    let r = go (S6 10000 0)
    print r
  where
    go (S6 i a) | i == 0    = a
                | otherwise = go $ S6 (i - 1) (a + i)
main = main_6
main_6 does constant allocations while main_7 runs in constant space. Can you suggest something to improve the situation? I don't want to manually unfold all the code that I want to be fast :(.

Your problem is that main_6 thunks 'i' and 'a'. If you write (S6 !i !a) <- get then there is no problem any more...
Correction: they both run in constant space, that's not a problem. The problem is that main_6 does constant allocation and deallocation while main_7 doesn't.

No, main_6 does not run in constant space if you don't use bang patterns...

Are you sure? I use ghc 7.6.2 (compiled with -O2) and without bang patterns, for 1 million iterations, it blows the stack. With bang patterns it runs in constant space, same as the other version?

bmaxa@maxa:~/haskell$ ./state +RTS -s
500000500000
          52,080 bytes allocated in the heap
           3,512 bytes copied during GC
          44,416 bytes maximum residency (1 sample(s))
          17,024 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.00s  (  0.00s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.00s  (  0.00s elapsed)

  %GC     time       0.0%  (6.2% elapsed)

  Alloc rate    0 bytes per MUT second

  Productivity 100.0% of total user, 0.0% of total elapsed
On Wed, 20 Mar 2013 08:04:01 +0200, Konstantin Litvinenko <to.darkangel@gmail.com> wrote:
On 03/20/2013 12:47 AM, Branimir Maksimovic wrote:
Your problem is that main_6 thunks 'i' and 'a'. If you write (S6 !i !a) <- get then there is no problem any more...
Nope :( Unfortunately that doesn't change anything. Still allocating...

On 20 March 2013 11:41, Konstantin Litvinenko wrote:
On 03/20/2013 11:17 AM, Branimir Maksimovic wrote:
Are you sure? I use ghc 7.6.2
Huh, I use 7.4.2, and if 7.6.2 can handle this I will try to switch. Not sure how to do that on ubuntu 12.10...
I always install ghcs under my home directory:

wget http://www.haskell.org/ghc/dist/7.6.2/ghc-7.6.2-x86_64-unknown-linux.tar.bz2
tar -xf ghc-7.6.2-x86_64-unknown-linux.tar.bz2
cd ghc-7.6.2
./configure --prefix=$HOME/ghcs/7.6.2
make install

Then put $HOME/ghcs/7.6.2/bin in front of your $PATH.

You could also use:

hsenv --ghc=ghc-7.6.2-x86_64-unknown-linux.tar.bz2

for this: http://hackage.haskell.org/package/hsenv

My colleague Jason just made a nice improvement (https://github.com/tmhedberg/hsenv/pull/22) which allows you to do:

hsenv --ghc=7.6.2

This will automatically download the right ghc for your platform and install it in a fresh environment isolated from the rest of your system.

Bas

On 03/20/2013 11:17 AM, Branimir Maksimovic wrote:
Are you sure? I use ghc 7.6.2 (compiled with -O2) and without bang patterns for 1million iterations it blows stack space. With bang patterns it runs in constant space , same as other version?
Okay, I have found the root of the allocation problem. It is not because of 7.4.2. If I use -auto-all, it somehow changes code generation and starts allocating. If I remove -auto-all from the command line, then no allocation occurs. That is really weird, because now I don't know how to profile and get meaningful results :(
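[Editor's note: one possible workaround, offered as a hedged suggestion rather than thread content. -auto-all (with -prof) puts a cost centre on every top-level binding, and cost centres can block the very optimizations being measured; annotating only selected expressions with SCC pragmas leaves the inner loop uninstrumented. The names and flags below are illustrative.]

{-# LANGUAGE BangPatterns #-}

-- Compile with profiling but without -auto-all, e.g.:  ghc -O2 -prof profiled.hs
-- Run with: ./profiled +RTS -p
main :: IO ()
main = print ({-# SCC "sumLoop" #-} go 10000 0)
  where
    go :: Int -> Int -> Int
    go 0 !a = a
    go i !a = go (i - 1) (a + i)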

On Tue, 2013-03-19 at 20:32 +0000, Don Stewart wrote:
Oh, I forgot the technique of inlining the lazy bytestring chunks, and processing each chunk seperately.
$ time ./fast 4166680 ./fast 1.25s user 0.07s system 99% cpu 1.325 total
Essentially inline Lazy.foldlChunks and specializes is (the inliner should really get that). And now we have a nice unboxed inner loop, which llvm might spot:
$ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm $ time ./fast 4166680 ./fast 1.07s user 0.06s system 98% cpu *1.146 total*
So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
You could try something like this using Conduit:

{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Conduit
import qualified Data.Conduit.List as L
import qualified Data.Conduit.Binary as B
import qualified Data.ByteString.Char8 as BS8

main :: IO ()
main = print =<< runResourceT (
    B.sourceFile filename $$
    L.fold (\(!a) (!b) -> a + BS8.count ' ' b) (0 :: Int))
  where
    filename = ...

Nicolas

On 03/19/2013 10:53 PM, Nicolas Trangez wrote:
On Tue, 2013-03-19 at 20:32 +0000, Don Stewart wrote:
So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
You could try something like this using Conduit:
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Conduit
import qualified Data.Conduit.List as L
import qualified Data.Conduit.Binary as B
import qualified Data.ByteString.Char8 as BS8

main :: IO ()
main = print =<< runResourceT (
    B.sourceFile filename $$
    L.fold (\(!a) (!b) -> a + BS8.count ' ' b) (0 :: Int))
  where
    filename = ...
Please stop counting spaces! :) It was a MODEL that demonstrates the constant allocation of state when I used the State monad. That's the *problem*. I mentioned in my first email that I do know how to count spaces using a one-line L.foldl with no allocations at all :).
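[Editor's note: a hedged sketch, not from the thread, of one way to keep the "consume bytes one by one" shape without the State monad: pass a strict step function to a tail-recursive driver, which is essentially Don's loop factored out. The line-counting example is illustrative.]

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Feed bytes one at a time to a strict step function; the driver threads the
-- accumulator itself, so there is no monadic state to allocate.
consume :: (a -> Word8 -> a) -> a -> L.ByteString -> a
consume step = go
  where
    go !acc s = case L.uncons s of
        Nothing      -> acc
        Just (x, xs) -> go (step acc x) xs

-- Example consumer: count newline bytes (i.e. lines).
main :: IO ()
main = do
    f <- L.readFile "test.txt"
    print (consume (\n w -> if w == 10 then n + 1 else n) (0 :: Int) f)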

Don Stewart writes:
Here's the final program: [...]
Here is a version of the program that is just as fast:
import Prelude hiding ( getContents, foldl )
import Data.ByteString.Char8
countSpace :: Int -> Char -> Int
countSpace i c | c == ' ' || c == '\n' = i + 1
               | otherwise             = i
main :: IO ()
main = getContents >>= print . foldl countSpace 0
Generally speaking, I/O performance is not about fancy low-level system
features, it's about having a proper evaluation order:
| $ ghc --make -O2 -funbox-strict-fields test1 && time ./test1
| 37627064
|
| real 0m0.381s
| user 0m0.356s
| sys 0m0.023s
Versus:

| $ ghc --make -O2 -funbox-strict-fields test2 && time ./test2

Using this input file stored in /dev/shm:

| $ ls -l test.txt
| -rw-r--r-- 1 simons users 208745650 Mar 19 21:40 test.txt

Take care,
Peter

This isn't a valid entry -- it uses strict IO (so allocates O(n) space) and
reads from standard input, which pretty much swamps the interesting
constant factors with buffered IO overhead.
Compare your program (made lazy) on lazy bytestrings using file IO:
import Prelude hiding ( readFile, foldl )
import Data.ByteString.Lazy.Char8
countSpace :: Int -> Char -> Int
countSpace i c | c == ' ' || c == '\n' = i + 1
               | otherwise             = i
main :: IO ()
main = readFile "test.txt" >>= print . foldl countSpace 0
Against my earlier optimized one (that manually specializes and does other
tricks).
$ time ./C
4166680
./C 1.49s user 0.42s system 82% cpu 2.326 total
$ time ./fast
4166680
./fast 1.05s user 0.11s system 96% cpu 1.201 total
The optimized one is twice as fast. You can write the same program on lists,
and it also runs in constant space, but it completes in 32s instead of 1.3s.
Constant factors matter.
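[Editor's note: a hedged sketch of the list-based version mentioned above (plain String IO with a strict left fold); it streams in constant space but is far slower, illustrating the "constant factors" point. Not code from the thread.]

import Data.List (foldl')

countSpace :: Int -> Char -> Int
countSpace i c | c == ' ' || c == '\n' = i + 1
               | otherwise             = i

main :: IO ()
main = readFile "test.txt" >>= print . foldl' countSpace 0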

Hi Don,
Compare your program (made lazy) on lazy bytestrings using file IO: [...]
if I make those changes, the program runs even faster than before:

module Main ( main ) where

import Prelude hiding ( foldl, readFile )
import Data.ByteString.Lazy.Char8

countSpace :: Int -> Char -> Int
countSpace i c | c == ' ' || c == '\n' = i + 1
               | otherwise             = i

main :: IO ()
main = readFile "test.txt" >>= print . foldl countSpace 0

This gives

| $ ghc --make -O2 -funbox-strict-fields test1 && time ./test1
| 37627064
|
| real    0m0.375s
| user    0m0.346s
| sys     0m0.028s

versus:

| $ ghc --make -O2 -funbox-strict-fields test2 && time ./test2
| 37627064
|
| real    0m0.324s
| user    0m0.299s
| sys     0m0.024s

Whether readFile or getContents is used doesn't seem to make a difference.

Take care,
Peter

Oh I see what you're doing ... "Using this input file stored in /dev/shm"
So not measuring the IO performance at all. :)

Hi Don,
"Using this input file stored in /dev/shm"
So not measuring the IO performance at all. :)
of course the program measures I/O performance. It just doesn't measure the speed of the disk.

Anyway, a highly optimized benchmark such as the one you posted is eventually going to beat one that's not as highly optimized. I think no-one disputes that fact.

I was merely trying to point out that a program which encodes its evaluation order properly is going to be reasonably fast without any further optimizations.

Take care,
Peter

I guess the optimizations that went into making lazy bytestring IO fast (on
disks) are increasingly irrelevant as SSDs take over.
participants (7): Bas van Dijk, Branimir Maksimovic, Don Stewart, Gregory Collins, Konstantin Litvinenko, Nicolas Trangez, Peter Simons