Re: [Haskell-cafe] How to benckmark a Word64 -> Bool?

Thanks Erik!
I was kind of hoping for such a benchmark. On my system, the C version is
consistently twice as fast as the Haskell version. (I tried -fllvm, but it
doesn't change perf)
Then I added bang patterns for the argument of only012 and in the tuple
(p,q) and it now has the same runtime as C.
Just for fun, I tried to void writing explicit recursion and used only
combinators and this Haskell code is still as competitive as C, so the real
bottleneck really was the laziness in the tuple/argument.
http://lpaste.net/100153
In fact, this Haskell version takes 8 sec vs 10 sec for the C version on my
laptop (input: 1000000000).
I tried using Data.List.Stream, I couldn't get any performance gain,
probably because the code is very close to optimal.
So remember kids:
- Bang pattern the heck out of your code (not a silver bullet, but
definitely worth trying).
- Don't rely on too much IO for benchmark.
- Don't be afraid of combinators even if you want performance.
- Ask on haskell-cafe, chaps here are very helpful :)
--
Cp
On Wed, Feb 19, 2014 at 8:39 PM, Erik Rantapaa
I think a lot is being obscured by the IO subsystem.
If you print just a count of the only012 numbers I get very different results.
Here are the versions I compared:
and on my system the C version is about 10x faster.
On Wednesday, February 19, 2014 12:49:36 PM UTC-6, Charles-Pierre Astolfi wrote:
Switching to quotRem gave no measurable improvements. After switching to ByteString, the code now runs in 9 seconds, which outperforms my C version. But honestly, I have no idea why.
New code: http://lpaste.net/100136
$ ghc --make -O3 303only012.hs && time ./303only012 50000000 > /dev/null ./303only012 50000000 > /dev/null 9.72s user 0.21s system 90% cpu 10.961 total
@Alois, I'm not sure how criterion can help compare my code with the C version, since in the C version I cannot measure the exec time of only012 only. What did you have in mind?
Thanks everyone!
-- Cp
On Wed, Feb 19, 2014 at 7:24 PM, Levent Erkok
wrote: Also, prefer quotRem over divMod as the former is faster. See here: http://stackoverflow.com/questions/339719/when-is- the-difference-between-quotrem-and-divmod-useful
On Wed, Feb 19, 2014 at 10:19 AM, Gregory Collins < gr...@gregorycollins.net> wrote:
On Wed, Feb 19, 2014 at 9:36 AM, Charles-Pierre Astolfi
wrote:
So there's a difference, but I'm not sure if it's related to my algorithm or related to IO/RTS.
Text.Printf is slower than the C version (and uses String). Use Data.ByteString.putStr instead.
-- Gregory Collins
_______________________________________________ Haskell-Cafe mailing list Haskel...@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

This version is about 2x as fast as OP's on my system (look at h2.hs):
https://gist.github.com/gregorycollins/9103248
$ time ./dist/build/h1/h1 50000000 +RTS -A4M > /dev/null
./dist/build/h1/h1 50000000 +RTS -A4M > /dev/null 23.88s user 0.06s system
99% cpu 24.003 total
$ time ./dist/build/h2/h2 50000000 +RTS -A4M > /dev/null
./dist/build/h2/h2 50000000 +RTS -A4M > /dev/null 13.57s user 0.09s system
99% cpu 13.706 total
$ time ./dist/build/h3/h3 50000000 +RTS -A4M > /dev/null
./dist/build/h3/h3 50000000 +RTS -A4M > /dev/null 24.24s user 0.18s system
99% cpu 24.490 total
The "system" number here is especially telling: this program is spending
almost all of its time in syscalls.
G
On Wed, Feb 19, 2014 at 12:47 PM, Charles-Pierre Astolfi
Thanks Erik!
I was kind of hoping for such a benchmark. On my system, the C version is consistently twice as fast as the Haskell version. (I tried -fllvm, but it doesn't change perf)
Then I added bang patterns for the argument of only012 and in the tuple (p,q) and it now has the same runtime as C.
Just for fun, I tried to void writing explicit recursion and used only combinators and this Haskell code is still as competitive as C, so the real bottleneck really was the laziness in the tuple/argument.
In fact, this Haskell version takes 8 sec vs 10 sec for the C version on my laptop (input: 1000000000).
I tried using Data.List.Stream, I couldn't get any performance gain, probably because the code is very close to optimal.
So remember kids: - Bang pattern the heck out of your code (not a silver bullet, but definitely worth trying). - Don't rely on too much IO for benchmark. - Don't be afraid of combinators even if you want performance. - Ask on haskell-cafe, chaps here are very helpful :)
-- Cp
On Wed, Feb 19, 2014 at 8:39 PM, Erik Rantapaa
wrote: I think a lot is being obscured by the IO subsystem.
If you print just a count of the only012 numbers I get very different results.
Here are the versions I compared:
and on my system the C version is about 10x faster.
On Wednesday, February 19, 2014 12:49:36 PM UTC-6, Charles-Pierre Astolfi wrote:
Switching to quotRem gave no measurable improvements. After switching to ByteString, the code now runs in 9 seconds, which outperforms my C version. But honestly, I have no idea why.
New code: http://lpaste.net/100136
$ ghc --make -O3 303only012.hs && time ./303only012 50000000 > /dev/null ./303only012 50000000 > /dev/null 9.72s user 0.21s system 90% cpu 10.961 total
@Alois, I'm not sure how criterion can help compare my code with the C version, since in the C version I cannot measure the exec time of only012 only. What did you have in mind?
Thanks everyone!
-- Cp
On Wed, Feb 19, 2014 at 7:24 PM, Levent Erkok
wrote: Also, prefer quotRem over divMod as the former is faster. See here: http://stackoverflow.com/questions/339719/when-is- the-difference-between-quotrem-and-divmod-useful
On Wed, Feb 19, 2014 at 10:19 AM, Gregory Collins < gr...@gregorycollins.net> wrote:
On Wed, Feb 19, 2014 at 9:36 AM, Charles-Pierre Astolfi < c...@crans.org> wrote:
So there's a difference, but I'm not sure if it's related to my algorithm or related to IO/RTS.
Text.Printf is slower than the C version (and uses String). Use Data.ByteString.putStr instead.
-- Gregory Collins
_______________________________________________ Haskell-Cafe mailing list Haskel...@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
--
Gregory Collins

On Wed, Feb 19, 2014 at 5:53 PM, Gregory Collins
This version is about 2x as fast as OP's on my system (look at h2.hs): https://gist.github.com/gregorycollins/9103248
$ time ./dist/build/h1/h1 50000000 +RTS -A4M > /dev/null ./dist/build/h1/h1 50000000 +RTS -A4M > /dev/null 23.88s user 0.06s system 99% cpu 24.003 total
$ time ./dist/build/h2/h2 50000000 +RTS -A4M > /dev/null ./dist/build/h2/h2 50000000 +RTS -A4M > /dev/null 13.57s user 0.09s system 99% cpu 13.706 total
$ time ./dist/build/h3/h3 50000000 +RTS -A4M > /dev/null ./dist/build/h3/h3 50000000 +RTS -A4M > /dev/null 24.24s user 0.18s system 99% cpu 24.490 total
The "system" number here is especially telling: this program is spending almost all of its time in syscalls.
You're reading it wrong. Using that last one: 24.2s on-CPU time in user mode 0.18s on-CPU time in system mode (kernel/syscalls) 99% of its total run time was actually spent on cpu (instead of, say, I/O wait) 24.490 total run (wall) time -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
participants (3)
-
Brandon Allbery
-
Charles-Pierre Astolfi
-
Gregory Collins