Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser

15 Dec 2007

      On Sat, 2007-12-15 at 14:34 +0000, Duncan Coutts wrote:
...
Ok, I presume this is a guessing game and we're supposed to just look at
the code without running and timing them.
Precisely :)
...
...
All they do is read from stdin and count the number of spaces they see.
There are two that use strict bytestrings, two that use lazy
bytestrings, and two that use the standard Haskell strings.  Three use a
recursive function with an accumulator parameter and three use a foldl
with a lambda function.
Say the fastest one takes the time 1.  How much time will the others
take?
And how about memory?  How much memory do you think they require?  Let's
say we feed a 150MB(*) file into each of them, how many megabytes do you
think they end up using (as seen from the OS, not in terms of how big
the live heap is)?
I'm going to post full benchmarks + analysis on Wednesday.
Right'o. I'll have a go. Lets see if I can't embarrass myself with being
completely inaccurate.
Thanks for biting!

You were, thankfully, only almost completely inaccurate ;)
...
...
PS: For extra credit, what do you think is the peak memory use for this
    program when given an input file of 150MB?
...
Hmm. So that should work in constant memory, a few 64 chunks at once.
I'd expect this to be pretty fast.
You are right about the speed.
Can you guess a number in kilobytes?
...
...
==============================
hs/space-bs-c8-acc-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt	:: Int -> B.ByteString -> Int
cnt !acc bs = if B.null bs
      then acc
          else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
main = do s <- B.getContents
    print (cnt 0 s)
This uses strict bytestrings so will use at least 150Mb and that'll make
it a good deal slower. In fact it'll be worse than that since
getContents does not know in advance how big the input will be so it has
to play the doubling and copying game. So it'll end up copying all the
data roughly twice. cnt is strict and tail recursive so that shouldn't
be any problem, though it's probably not as fast as the first length .
filter since head, tail, null all have to do bounds checks.
You are right about the memory.  It is actually slightly faster than the
"extra credit" (length/filter combination) above.
...
...
==============================
hs/space-bslc8-acc-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt	:: Int -> B.ByteString -> Int
cnt !acc bs = if B.null bs
      then acc
          else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
main = do s <- B.getContents
    print (cnt 0 s)
For the same reason as above, I'd expect this cnt to be slower than
B.length . B.filter (== ' ')
It is slower but not for the same reason as above.
...
...
==============================
hs/space-xxxxx-acc-1.hs:
{-# LANGUAGE BangPatterns #-}
cnt	:: Int -> String -> Int
cnt !acc bs = if null bs
      then acc
          else cnt (if head bs == ' ' then acc+1 else acc) (tail bs)
main = do s <- getContents
    print (cnt 0 s)
Lazy, so constant memory use, but much higher constant factors due to
using String.
Spot on.
...
...
==============================
hs/space-bs-c8-foldlx-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt	:: B.ByteString -> Int
cnt bs	= B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
    print (cnt s)
This is of course still strict so that's going to make the reading slow.
Nope.
...
This is a manually fused B.length . B.filter (== ' ') which hopefully is
the same speed as the automatically fused one if the fusion is working
ok. If not, then the B.length . B.filter (== ' ') will be doing a extra
copy, and memory writes are expensive.
...
==============================
hs/space-bslc8-foldlx-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt	:: B.ByteString -> Int
cnt bs	= B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
    print (cnt s)
As above but now in constant memory space.
Nope.
...
...
==============================
hs/space-xxxxx-foldl.hs:
{-# LANGUAGE BangPatterns #-}
cnt	:: String -> Int
cnt bs	= foldl (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- getContents
    print (cnt s)
Oh, no! not foldl that's a killer.
You think it's worse than the program just above?
...
Ok, so best way to summarise I think is to organise by data type since I
think that'll dominate.
So I think the lazy bytestring versions will be fastest due to having
the best memory access patterns and doing the least copying. I think the
foldl's will be faster than the explicit accumulators due to having
fewer bounds checks.
space-bslc8-foldlx-1
space-bslc8-acc-1
space-bs-c8-foldlx-1
space-bs-c8-acc-1
space-xxxxx-acc-1
space-xxxxx-foldl
I'll try guessing at some ratios:
1.0 space-bslc8-foldlx-1
1.1 space-bslc8-acc-1
2.0 space-bs-c8-foldlx-1
2.1 space-bs-c8-acc-1
4.0 space-xxxxx-acc-1
15  space-xxxxx-foldl
I've done the measurements on a 2GHz Athlon64 3000+, a 1667 MHz Core
Duo, and a 600MHz Pentium III.  They all show the same pattern (with a
few minor aberrations for the PIII).

You did got one of the relative speeds right ;)
(and some of the memory usages)

I've tested with ghc 6.8.1 and 6.9.20071119 and 6.9.20071208 (or
thereabouts).  6.6.1 won't run my benchmarks and it also won't let me
install bytestring-0.9.0.1 to replace its built-in version.

-Peter

Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser

Peter Lund