
On Sat, 2007-12-15 at 14:34 +0000, Duncan Coutts wrote:
Ok, I presume this is a guessing game and we're supposed to just look at the code without running and timing them.
Precisely :)
All they do is read from stdin and count the number of spaces they see. There are two that use strict bytestrings, two that use lazy bytestrings, and two that use the standard Haskell strings. Three use a recursive function with an accumulator parameter and three use a foldl with a lambda function.
Say the fastest one takes the time 1. How much time will the others take?
And how about memory? How much memory do you think they require? Let's say we feed a 150MB(*) file into each of them, how many megabytes do you think they end up using (as seen from the OS, not in terms of how big the live heap is)?
I'm going to post full benchmarks + analysis on Wednesday.
Right'o. I'll have a go. Lets see if I can't embarrass myself with being completely inaccurate.
Thanks for biting! You were, thankfully, only almost completely inaccurate ;)
PS: For extra credit, what do you think is the peak memory use for this program when given an input file of 150MB?
Hmm. So that should work in constant memory, a few 64 chunks at once. I'd expect this to be pretty fast.
You are right about the speed. Can you guess a number in kilobytes?
============================== hs/space-bs-c8-acc-1.hs: {-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt :: Int -> B.ByteString -> Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
main = do s <- B.getContents print (cnt 0 s)
This uses strict bytestrings so will use at least 150Mb and that'll make it a good deal slower. In fact it'll be worse than that since getContents does not know in advance how big the input will be so it has to play the doubling and copying game. So it'll end up copying all the data roughly twice. cnt is strict and tail recursive so that shouldn't be any problem, though it's probably not as fast as the first length . filter since head, tail, null all have to do bounds checks.
You are right about the memory. It is actually slightly faster than the "extra credit" (length/filter combination) above.
============================== hs/space-bslc8-acc-1.hs: {-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: Int -> B.ByteString -> Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
main = do s <- B.getContents print (cnt 0 s)
For the same reason as above, I'd expect this cnt to be slower than B.length . B.filter (== ' ')
It is slower but not for the same reason as above.
============================== hs/space-xxxxx-acc-1.hs: {-# LANGUAGE BangPatterns #-}
cnt :: Int -> String -> Int cnt !acc bs = if null bs then acc else cnt (if head bs == ' ' then acc+1 else acc) (tail bs)
main = do s <- getContents print (cnt 0 s)
Lazy, so constant memory use, but much higher constant factors due to using String.
Spot on.
============================== hs/space-bs-c8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt :: B.ByteString -> Int cnt bs = B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents print (cnt s)
This is of course still strict so that's going to make the reading slow.
Nope.
This is a manually fused B.length . B.filter (== ' ') which hopefully is the same speed as the automatically fused one if the fusion is working ok. If not, then the B.length . B.filter (== ' ') will be doing a extra copy, and memory writes are expensive.
============================== hs/space-bslc8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: B.ByteString -> Int cnt bs = B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents print (cnt s)
As above but now in constant memory space.
Nope.
============================== hs/space-xxxxx-foldl.hs: {-# LANGUAGE BangPatterns #-}
cnt :: String -> Int cnt bs = foldl (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- getContents print (cnt s)
Oh, no! not foldl that's a killer.
You think it's worse than the program just above?
Ok, so best way to summarise I think is to organise by data type since I think that'll dominate.
So I think the lazy bytestring versions will be fastest due to having the best memory access patterns and doing the least copying. I think the foldl's will be faster than the explicit accumulators due to having fewer bounds checks.
space-bslc8-foldlx-1 space-bslc8-acc-1
space-bs-c8-foldlx-1 space-bs-c8-acc-1
space-xxxxx-acc-1 space-xxxxx-foldl
I'll try guessing at some ratios:
1.0 space-bslc8-foldlx-1 1.1 space-bslc8-acc-1
2.0 space-bs-c8-foldlx-1 2.1 space-bs-c8-acc-1
4.0 space-xxxxx-acc-1 15 space-xxxxx-foldl
I've done the measurements on a 2GHz Athlon64 3000+, a 1667 MHz Core Duo, and a 600MHz Pentium III. They all show the same pattern (with a few minor aberrations for the PIII). You did got one of the relative speeds right ;) (and some of the memory usages) I've tested with ghc 6.8.1 and 6.9.20071119 and 6.9.20071208 (or thereabouts). 6.6.1 won't run my benchmarks and it also won't let me install bytestring-0.9.0.1 to replace its built-in version. -Peter