Thanks for such specific requests, and great idea to focus on imaginary!
Outline: SUMMARY, HARDWARE, NUMBERS
* I still get quite different numbers.
* For wheel-sieve1 and kahan, we're in the same ballpark.
* Though my kahan shows half as much allocation increase. Which hardware difference would explain this?
* For bernouilli, exp3_8, and integrate, my nofib-analyse shows percent changes in Runtime, whereas yours shows absolute times. I've included my absolute tables below; comparing those, we still get appreciable differences.
* I've included my mode=slow numbers.
* We have some significant hardware differences.
* My machine claims 32 processors, though it smells like it has 8 chips with 8 cores each. I'll ask SPJ or Mainland.
* My cache size is much smaller than yours: 512 KB versus 8 MB.
* My CPU frequency is 2GHz compared to your 3.4GHz.
* How do we want to handle hardware diversity like we're seeing in these regular benchmark runs?
* Are the different behaviors we're seeing expected for our hardware differences or bugs of some sort?
Thanks, Johan.
== THE HARDWARE ==
$ cat /proc/cpuinfo
processor : 0 # counts up to 31, with physical id and core id pairs duplicated once
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6128
stepping : 1
microcode : 0x10000d4
cpu MHz : 1999.949
cache size : 512 KB
physical id : 0 # counts up to 3 for each core id, twice
siblings : 8
core id : 0 # counts up to 3, for each physical id, twice
cpu cores : 8
apicid : 0 # varies
initial apicid : 0 # varies
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
bogomips : 3999.89
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
The (physical id, core id) pairs exhaust the combinations of {0,1,2,3}, but for some reason each pair appears twice as processor counts from 0 to 31. Given the siblings and cpu cores values, I would have expected 64 processors. Am I tripping on a common misconception?
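One way to check the topology question mechanically is to tally the fields directly. The following is a minimal Python sketch, not part of the original measurements: the function name `summarize_cpuinfo` and the result keys are made up for illustration, and it assumes the usual blank-line-separated stanza layout of /proc/cpuinfo.

```python
from collections import Counter


def summarize_cpuinfo(text):
    """Tally logical processors, sockets, and (physical id, core id) pairs.

    `text` is the full contents of /proc/cpuinfo; stanzas are assumed to be
    separated by blank lines, one stanza per logical processor.
    """
    pairs = Counter()
    processors = 0
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a per-processor stanza
        processors += 1
        pairs[(fields.get("physical id"), fields.get("core id"))] += 1
    sockets = {physical for physical, _core in pairs}
    return {
        "processors": processors,          # logical CPUs the kernel exposes
        "sockets": len(sockets),           # distinct physical id values
        "distinct_pairs": len(pairs),      # distinct (physical id, core id)
        "max_repeats": max(pairs.values(), default=0),
    }
```

Feeding it the contents of /proc/cpuinfo on the machine described here should, if the annotations above are accurate, report 32 processors, 4 sockets, 16 distinct pairs, and each pair repeated twice. Note that when pairs repeat, the number of distinct pairs undercounts the logical processors.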
I included the rest of the info because I still get different numbers than you do.
$ uname -a
Linux cam-05-unx 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ VERSION=7.0.4; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
[("Project name","The Glorious Glasgow Haskell Compilation System")
,("Project version","7.0.4")
,("Booter version","6.12.1")
,("Stage","2")
,("Build platform","x86_64-unknown-linux")
,("Host platform","x86_64-unknown-linux")
,("Target platform","x86_64-unknown-linux")
,("Have interpreter","YES")
,("Object splitting","YES")
,("Have native code generator","YES")
,("Have llvm code generator","YES")
,("Support SMP","YES")
,("Unregisterised","NO")
,("Tables next to code","YES")
,("RTS ways","l debug thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn thr_debug_dyn")
,("Leading underscore","NO")
,("Debug on","False")
,("LibDir","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4")
,("Global Package DB","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4/package.conf.d")
,("C compiler flags","[\"-fno-stack-protector\"]")
,("Gcc Linker flags","[]")
,("Ld Linker flags","[]")
$ VERSION=7.6.2; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
[("Project name","The Glorious Glasgow Haskell Compilation System")
,("GCC extra via C opts"," -fwrapv")
,("C compiler command","/usr/bin/gcc")
,("C compiler flags"," -fno-stack-protector ")
,("ar command","/usr/bin/ar")
,("ar flags","q")
,("ar supports at file","@ArSupportsAtFile@")
,("touch command","touch")
,("dllwrap command","/bin/false")
,("windres command","/bin/false")
,("perl command","/usr/bin/perl")
,("target os","OSLinux")
,("target arch","ArchX86_64")
,("target word size","8")
,("target has GNU nonexec stack","True")
,("target has .ident directive","True")
,("target has subsections via symbols","False")
,("LLVM llc command","llc")
,("LLVM opt command","opt")
,("Project version","7.6.2")
,("Booter version","7.4.1")
,("Stage","2")
,("Build platform","x86_64-unknown-linux")
,("Host platform","x86_64-unknown-linux")
,("Target platform","x86_64-unknown-linux")
,("Have interpreter","YES")
,("Object splitting supported","YES")
,("Have native code generator","YES")
,("Support SMP","YES")
,("Unregisterised","NO")
,("Tables next to code","YES")
,("RTS ways","l debug thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn thr_debug_dyn")
,("Leading underscore","NO")
,("Debug on","False")
,("LibDir","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2")
,("Global Package DB","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2/package.conf.d")
,("Gcc Linker flags","[\"-Wl,--hash-size=31\",\"-Wl,--reduce-memory-overheads\"]")
,("Ld Linker flags","[\"--hash-size=31\",\"--reduce-memory-overheads\"]")
]
== THE NUMBERS ==
Run once with VERSION=7.0.4 and once with VERSION=7.6.2. (The only difference is that I'm not relying on $PATH.)
$ HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; (make clean && make boot WithNofibHc=${HC} && make WithNofibHc=${HC}) >& log-${VERSION}
--------------------------------------------------------------------------------
Program               Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
bernouilli           +3.3%     +0.2%     +7.2%     +7.4%     +0.0%
exp3_8               +1.1%    +53.7%    +55.4%    +57.5%   +300.0%
gen_regexps         +18.6%     +3.9%      0.00      0.00     +0.0%
integrate            -0.1%    +39.0%   +110.2%    +88.5%     +0.0%
kahan                +1.7%    +41.8%     +8.2%     +8.0%     +0.0%
paraffins            +1.3%     -1.2%     -3.6%     -0.8%     +0.0%
primes               +1.4%    +64.7%      0.11      0.11    +50.0%
queens               +0.8%     -0.2%      0.02      0.02     +0.0%
rfib                 +1.7%    +42.8%      0.03      0.03     +0.0%
tak                  +0.9%    +12.0%      0.02      0.02     +0.0%
wheel-sieve1         +1.4%    +66.6%     -4.0%     -4.3%    -17.6%
wheel-sieve2         +1.4%     +0.0%     -0.2%     -2.1%     +0.0%
x2n1                +10.3%    +41.7%      0.01      0.01   +200.0%
--------------------------------------------------------------------------------
Min                  -0.1%     -1.2%     -4.0%     -4.3%    -17.6%
Max                 +18.6%    +66.6%   +110.2%    +88.5%   +300.0%
Geometric Mean       +3.3%    +25.6%    +19.6%    +18.1%    +23.0%
I ran it twice. Here is the second run:
--------------------------------------------------------------------------------
Program               Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
bernouilli           +3.3%     +0.2%     +7.0%     +7.1%     +0.0%
exp3_8               +1.1%    +53.7%    +56.6%    +57.8%   +300.0%
gen_regexps         +18.6%     +3.9%      0.00      0.00     +0.0%
integrate            -0.1%    +39.0%   +102.1%    +86.2%     +0.0%
kahan                +1.7%    +41.8%     +9.5%     +8.9%     +0.0%
paraffins            +1.3%     -1.2%     -0.6%     -4.8%     +0.0%
primes               +1.4%    +64.7%      0.11      0.11    +50.0%
queens               +0.8%     -0.2%      0.02      0.02     +0.0%
rfib                 +1.7%    +42.8%      0.03      0.03     +0.0%
tak                  +0.9%    +12.0%      0.02      0.02     +0.0%
wheel-sieve1         +1.4%    +66.6%     -4.4%     -4.3%    -17.6%
wheel-sieve2         +1.4%     +0.0%     -1.1%     -2.8%     +0.0%
x2n1                +10.3%    +41.7%      0.01      0.01   +200.0%
--------------------------------------------------------------------------------
Min                  -0.1%     -1.2%     -4.4%     -4.8%    -17.6%
Max                 +18.6%    +66.6%   +102.1%    +86.2%   +300.0%
Geometric Mean       +3.3%    +25.6%    +19.5%    +17.2%    +23.0%
Maybe your machine is fast enough that exp3_8's runtime is too small for nofib-analyse to report a percent change, so it shows the absolute time instead.
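The percent-versus-absolute behavior can be illustrated with a small Python sketch. This is a guess at the rule, not the tool's actual code: the 0.2 s cutoff and the name `format_runtime_change` are assumptions, chosen only because they are consistent with the tables in this message (a 0.21 s baseline is shown as a percentage, a 0.10 s baseline as an absolute time).

```python
# Assumed cutoff in seconds; picked to match the tables in this message,
# not taken from the nofib-analyse documentation.
TOO_QUICK_SECONDS = 0.2


def format_runtime_change(baseline, new, threshold=TOO_QUICK_SECONDS):
    """Format a runtime comparison the way the summary tables appear to.

    If either measurement is at or below the threshold, the percent change
    is treated as unreliable and the new absolute time is shown instead.
    """
    if baseline <= threshold or new <= threshold:
        return f"{new:.2f}"
    return f"{(new / baseline - 1) * 100:+.1f}%"
```

Under this assumed rule, a faster machine pushes more programs below the cutoff, which would explain one table showing percentages where the other shows absolute times for the same program.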
Allocations
-------------------------------------------------------------------------------
Program                log-7.0.4       log-7.6.2
-------------------------------------------------------------------------------
bernouilli             303890616           +0.2%
exp3_8                 389023528          +53.7%
gen_regexps               304768           +3.9%
integrate              546206856          +39.0%
kahan                  700842656          +41.8%
paraffins               56201680           -1.2%
primes                  65899520          +64.7%
queens                  17387888           -0.2%
rfib                       81176          +42.8%
tak                        94408          +12.0%
wheel-sieve1            14620056          +66.6%
wheel-sieve2            88734064           +0.0%
x2n1                     2491928          +41.7%
-1 s.d.                    -----           +3.0%
+1 s.d.                    -----          +53.2%
Average                    -----          +25.6%
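The "Average" and s.d. rows look like they are computed on the logarithms of the ratios, so that "Average" is the geometric mean of the summary table. Here is a minimal Python sketch of that reading; it is an inference from the numbers above rather than a description of the nofib-analyse source, and the name `log_summary` is made up for illustration. It uses the population standard deviation (dividing by n), which is the variant that matches the table.

```python
import math


def log_summary(percent_changes):
    """Return (-1 s.d., +1 s.d., average) as percent changes.

    Each percent change p is turned into a ratio 1 + p/100; the mean and
    population standard deviation are taken over log(ratio), then mapped
    back to percentages.
    """
    logs = [math.log(1 + p / 100) for p in percent_changes]
    mean = sum(logs) / len(logs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))

    def to_percent(x):
        return (math.exp(x) - 1) * 100

    return to_percent(mean - sd), to_percent(mean + sd), to_percent(mean)


# Allocation changes from the table above, in row order.
allocs = [0.2, 53.7, 3.9, 39.0, 41.8, -1.2, 64.7, -0.2, 42.8, 12.0, 66.6, 0.0, 41.7]
low, high, average = log_summary(allocs)
print(f"-1 s.d. {low:+.1f}%  +1 s.d. {high:+.1f}%  Average {average:+.1f}%")
```

Fed the rounded percentages from the Allocations table, this comes out at roughly +3.0%, +53.2%, and +25.6%, in line with the reported rows. Because the mean is geometric, one large outlier such as +300.0% in TotalMem moves the summary less than it would move an arithmetic mean.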
Run Time
-------------------------------------------------------------------------------
Program                log-7.0.4       log-7.6.2
-------------------------------------------------------------------------------
bernouilli                  0.28           +7.2%
exp3_8                      0.21          +55.4%
gen_regexps                 0.00            0.00
integrate                   0.34         +110.2%
kahan                       1.07           +8.2%
paraffins                   0.22           -3.6%
primes                      0.10            0.11
queens                      0.02            0.02
rfib                        0.03            0.03
tak                         0.01            0.02
wheel-sieve1                0.68           -4.0%
wheel-sieve2                0.37           -0.2%
x2n1                        0.00            0.01
-1 s.d.                    -----           -9.3%
+1 s.d.                    -----          +57.7%
Average                    -----          +19.6%
And here are the results using the nofib "mode=slow" option. Only bernouilli and gen_regexps do not have SLOW_OPTS defined in their Makefiles, so it's odd that gen_regexps shows such a drastic change...
--------------------------------------------------------------------------------
Program               Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
bernouilli           +3.3%     +0.2%     +6.7%     +7.4%     +0.0%
exp3_8               +1.1%    +68.2%    +20.2%    +20.3%   +100.0%
gen_regexps         +18.6%     -1.2%     -6.3%     -6.0%     +0.0%
integrate            -0.1%    +39.0%   +114.3%   +104.9%     +3.9%
kahan                +1.7%    +41.9%     +7.5%     +7.5%     +0.0%
paraffins            +1.3%     -1.2%     -3.3%     -2.7%     +0.3%
primes               +1.4%    +57.9%     +0.8%     +1.0%     +0.0%
queens               +0.8%     -0.2%     -1.7%     -2.0%     +0.0%
rfib                 +1.7%    +42.8%     +6.3%     +6.0%     +0.0%
tak                  +0.9%    +12.0%     -2.4%     -2.5%     +0.0%
wheel-sieve1         +1.4%    +99.2%     -3.4%     -3.4%    +58.8%
wheel-sieve2         +1.4%     -0.1%     -3.6%     -3.8%     +0.0%
x2n1                +10.3%    +43.1%      0.13      0.13  +1300.0%
--------------------------------------------------------------------------------
Min                  -0.1%     -1.2%     -6.3%     -6.0%     +0.0%
Max                 +18.6%    +99.2%   +114.3%   +104.9%  +1300.0%
Geometric Mean       +3.3%    +27.4%     +8.2%     +7.8%    +34.3%
Allocations
-------------------------------------------------------------------------------
Program           log-slow-7.0.4  log-slow-7.6.2
-------------------------------------------------------------------------------
bernouilli             303890616           +0.2%
exp3_8                3500234960          +68.2%
gen_regexps            780759064           -1.2%
integrate             1092338624          +39.0%
kahan                 2797648296          +41.9%
paraffins              363166288           -1.2%
primes                 861820872          +57.9%
queens                 569243336           -0.2%
rfib                       81488          +42.8%
tak                        94408          +12.0%
wheel-sieve1            24134568          +99.2%
wheel-sieve2           160800936           -0.1%
x2n1                    19375720          +43.1%
-1 s.d.                    -----           +1.2%
+1 s.d.                    -----          +60.4%
Average                    -----          +27.4%
Run Time
-------------------------------------------------------------------------------
Program           log-slow-7.0.4  log-slow-7.6.2
-------------------------------------------------------------------------------
bernouilli                  0.28           +6.7%
exp3_8                      3.39          +20.2%
gen_regexps                 1.90           -6.3%
integrate                   0.68         +114.3%
kahan                       4.35           +7.5%
paraffins                   1.93           -3.3%
primes                      1.51           +0.8%
queens                      0.72           -1.7%
rfib                        0.31           +6.3%
tak                         1.58           -2.4%
wheel-sieve1                2.28           -3.4%
wheel-sieve2                0.77           -3.6%
x2n1                        0.04            0.13
-1 s.d.                    -----          -12.9%
+1 s.d.                    -----          +34.3%
Average                    -----           +8.2%