Re: [Haskell-beginners] Speed performance problem on Windows?

6 Mar 2010

      For the record, I'm adding my numbers to the pool:

I saved the first piece of code (the recursive version) as "bigmean1.hs"
and the second (the one using 'foldU') as "bigmean2.hs", compiled each of
them four different ways, and timed them while they computed the mean of
[1..1e9]. Here are the results:

MY SYSTEM (512 MB RAM, Mobile AMD Sempron(tm) 3400+ proc [1 core]) (your
run-of-the-mill Ubuntu laptop):
~$ uname -a
Linux dy-book 2.6.31-19-generic #56-Ubuntu SMP Thu Jan 28 01:26:53 UTC
2010 i686 GNU/Linux
~$ ghc -V
The Glorious Glasgow Haskell Compilation System, version 6.12.1
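
For anyone reading this without the earlier messages at hand, here is a
rough sketch of the shape of the two programs. It is not a copy of the
source from the thread: the names meanRec, meanFold and Acc are mine, and
since the uvector package may not be installed everywhere, the second
version uses foldl' from Data.List as a stand-in for 'foldU', so it does
not exercise the streamU/unstreamU fusion rule at all.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.List (foldl')
import System.Environment (getArgs)
import Text.Printf (printf)

-- Recursive version: a single worker loop carrying the running sum,
-- the element count and the current value.
meanRec :: Double -> Double -> Double
meanRec n m = go 0 0 n
  where
    go :: Double -> Int -> Double -> Double
    go !s !l !x
      | x > m     = s / fromIntegral l
      | otherwise = go (s + x) (l + 1) (x + 1)

-- Strict accumulator: running sum and element count.
data Acc = Acc !Double !Int

-- Fold version. ASSUMPTION: foldl' over an ordinary list stands in for
-- foldU over an unboxed array; only the accumulator shape is the same.
meanFold :: Double -> Double -> Double
meanFold n m = s / fromIntegral l
  where
    Acc s l = foldl' step (Acc 0 0) [n .. m]
    step (Acc a k) x = Acc (a + x) (k + 1)

main :: IO ()
main = do
  args <- getArgs
  -- Upper bound from the command line, e.g. ./bigmean 1e9 (default 1e6).
  let d = case args of
        [a] -> read a
        _   -> 1e6
  printf "%f\n" (meanRec 1 d)
  printf "%f\n" (meanFold 1 d)
```

Both functions add the same values in the same order, so they should
agree on the result; only the way the loop is expressed differs.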

RUN 1 - C generator, without excess-precision

~$ ghc -o bigmean1 --make -fforce-recomp -O2 -fvia-C -optc-O3 bigmean1.hs
~$ ghc -o bigmean2 --make -fforce-recomp -O2 -fvia-C -optc-O3 bigmean2.hs

~$ time ./bigmean1 1e9
500000000.067109

real 0m47.685s	user 0m47.655s	sys 0m0.000s

~$ time ./bigmean2 1e9
500000000.067109

real 1m4.696s	user 1m4.324s	sys 0m0.028s

RUN 2 - default generator, no excess-precision

~$ ghc --make -O2 -fforce-recomp -o bigmean2-noC bigmean2.hs
~$ ghc --make -O2 -fforce-recomp -o bigmean1-noC bigmean1.hs

~$ time ./bigmean1-noC 1e9
500000000.067109

real 0m16.571s	user 0m16.493s	sys 0m0.012s

~$ time ./bigmean2-noC 1e9
500000000.067109

real 0m27.146s	user 0m27.086s	sys 0m0.004s

RUN 3 - C generator, with excess-precision

~$ ghc --make -fforce-recomp -O2 -fvia-C -optc-O3 -fexcess-precision -o bigmean1-precis bigmean1.hs
~$ ghc --make -fforce-recomp -O2 -fvia-C -optc-O3 -fexcess-precision -o bigmean2-precis bigmean2.hs

~$ time ./bigmean1-precis 1e9
500000000.067109

real 0m11.937s	user 0m11.841s	sys 0m0.012s

~$ time ./bigmean2-precis 1e9
500000000.067109

real 0m17.105s	user 0m17.081s	sys 0m0.004s

RUN 4 - default generator, with excess-precision

~$ ghc --make -fforce-recomp -O2 -fexcess-precision -o bigmean1-precis bigmean1.hs
~$ ghc --make -fforce-recomp -O2 -fexcess-precision -o bigmean2-precis bigmean2.hs

~$ time ./bigmean1-precis 1e9
500000000.067109

real 0m16.521s	user 0m16.413s	sys 0m0.008s

~$ time ./bigmean2-precis 1e9
500000000.067109

real 0m27.381s	user 0m27.190s	sys 0m0.016s

CONCLUSIONS:
· Big difference between the two versions (recursive and
fusion-oriented). I checked by compiling with -ddump-simpl-stats, and the
rule mentioned in Don's article (streamU/unstreamU) IS being fired, once.
Even so, the recursive formulation of the algorithm is considerably
faster.
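· Big gain from adding the excess-precision flag when compiling through
the C generator (bigmean1: 47.7s down to 11.9s). With the default
generator the flag made no measurable difference here (16.6s vs 16.5s).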
· The best time is achieved compiling through the C generator with the
excess-precision flag; second best (about 5 seconds slower) is the
default generator with the same flag.

I didn't know about -fexcess-precision. It really makes a BIG
difference for number-crunching modules :D
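
To put numbers on those conclusions, here is a small sketch that turns
the 'real' times above into ratios. The timings are copied from the four
runs in this message; the names runs and ratio are just mine. By my
arithmetic the via-C build of bigmean1 gets roughly 4x faster with
-fexcess-precision, and bigmean2 takes somewhere around 1.4 to 1.7 times
as long as bigmean1 in every run.

```haskell
module Main (main) where

import Text.Printf (printf)

-- (label, bigmean1 seconds, bigmean2 seconds): 'real' times from RUN 1-4.
runs :: [(String, Double, Double)]
runs =
  [ ("via-C",                     47.685, 64.696)
  , ("native",                    16.571, 27.146)
  , ("via-C + excess-precision",  11.937, 17.105)
  , ("native + excess-precision", 16.521, 27.381)
  ]

-- How many times longer the first timing is than the second.
ratio :: Double -> Double -> Double
ratio slow fast = slow / fast

main :: IO ()
main = do
  mapM_ report runs
  -- Effect of -fexcess-precision on the via-C build of bigmean1.
  printf "via-C bigmean1, without/with flag = %.2f\n"
         (ratio 47.685 11.937)
  where
    report :: (String, Double, Double) -> IO ()
    report (label, t1, t2) =
      printf "%-28s bigmean2/bigmean1 = %.2f\n" label (ratio t2 t1)
```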

On Sat, 06-03-2010 at 01:36 +0100, Daniel Fischer wrote:
...
On Saturday 06 March 2010 00:20:52, Travis Erdman wrote:
...
I'm working through one of Don Stewart's many excellent articles ...
http://cgi.cse.unsw.edu.au/~dons/blog/2008/06/04#fast-fusion
I faithfully re-created the source of his initial GHC reference
implementation as:
<snip>
Then, compiled and executed like this:
C:\Documents and Settings\Travis\My Documents\Haskell Code>ghc -O2 biglistmean.hs -optc-O2 -fvia-C --make -fforce-recomp
[1 of 1] Compiling Main             ( biglistmean.hs, biglistmean.o )
Linking biglistmean.exe ...
Not the best combination of options, for me at least. On my box, that is 
approximately 35% slower than -O2 with the native code generator.
...
On the final test of 10^9, Don reports that it took 1.76 secs on his
machine.
Well, Don has a super fast 64-bit thingy; on normal machines, all code runs
much slower than on Don's :)
...
In contrast, just 10^8 takes 12.63 secs on my machine
But not that much slower, ouch.
On my machine, 10^8 takes
~3.8s compiled with -O2 -fvia-C -optc-O2 [or -optc-O3, doesn't make a 
difference]
~2.8s compiled with -O2 [with and without -fexcess-precision]
~1.18s compiled with -O2 -fexcess-precision -fvia-C -optc-O3
Floating point arithmetic compiled via C profits greatly from
-fexcess-precision (well, at least on my system, YMMV).
Alas, equivalent gcc-compiled C code takes only 0.35s for 10^8 (0.36 with 
icc).
Multiply all timings by 10 for 10^9.
...
(sophisticatedly timed with handheld stopwatch) and on the coup de grace
10^9 test, it takes 2min:04secs.  Yikes!  My hardware is a little old
(Win XP on Pentium 4 3.06GHz w 2 GB RAM) but not THAT old.  I'm using
the latest Haskell Platform which includes ghc v 6.10.4.
I also have a 3.06GHz P4 (2 cores, 1 GB RAM), running openSuSE 11.1 with
ghc-6.12.1 and ghc-6.10.3 (no difference between 6.10 and 6.12 for this loop).
The P4 isn't particularly fast, unfortunately.
...
Primary question:  What gives here?
GCC on XP sucks. Big time, AFAIK. Compile your stuff once via C and once 
with the native code generator and compare. I think you'll almost always 
find the NCG faster, sometimes very much.
...
Incidental questions:  Is there a nice way to time executed code in
Windows ala the "time" command Don shows under Linux?
There's timeit.exe, as linked to in 
http://channel9.msdn.com/forums/Coffeehouse/258979-Windows-equivalent-of-UnixLinux-time-command/
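
Another option that works the same on Windows and Linux is to measure
inside the program. The following is only a sketch: timed and mean are
names made up for illustration, the bound 1e7 is arbitrary, and
getCPUTime (from System.CPUTime in base) reports CPU time, not the
wall-clock 'real' figure that the shell's time prints.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Run an action and return its result with the CPU seconds it used.
-- getCPUTime counts in picoseconds, hence the division by 1e12.
timed :: IO a -> IO (a, Double)
timed act = do
  start <- getCPUTime
  r     <- act
  end   <- getCPUTime
  return (r, fromIntegral (end - start) / 1e12)

-- Hypothetical workload: mean of [n..m] with an explicit loop.
mean :: Double -> Double -> Double
mean n m = go 0 0 n
  where
    go :: Double -> Int -> Double -> Double
    go !s !l !x
      | x > m     = s / fromIntegral l
      | otherwise = go (s + x) (l + 1) (x + 1)

main :: IO ()
main = do
  -- evaluate forces the result inside the timed region; without it
  -- laziness could defer the work until the printf below.
  (m, secs) <- timed (evaluate (mean 1 1e7))
  printf "mean = %f, CPU time = %.3f s\n" m secs
```

This leaves out process start-up and runtime initialisation, so the
figure will not match an external timer exactly.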
...
Also, does the
ordering of the compiler flags have any impact (I hope not, but I don't
want to be surprised ...)
Depends. If you give conflicting options, the last takes precedence (unless 
some combination gives an error, don't know if that happens). If the 
options aren't conflicting, the order doesn't matter.
...
Thanks,
Travis Erdman
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners

MAN