Thanks for such specific requests, and great idea to focus on imaginary!
 
Outline: SUMMARY, HARDWARE, NUMBERS
 
== THE SUMMARY ==
 
  * I still get quite different numbers.
    * For wheel-sieve1 and kahan, we're in the same ballpark.
    * Though my kahan shows less than half as much allocation increase (+41.8% versus your +98.6%). Which hardware difference would explain this?
    * For bernouilli, exp3_8, and integrate, my nofib-analyse shows percent changes in Runtime, whereas yours shows absolute times. I've included my absolute tables below; comparing those, we still get appreciable differences.
 
  * I've included my mode=slow numbers.
 
  * We have some significant hardware differences.
    * My machine claims 32 processors, though it smells like it has 8 chips with 8 cores each. I'll ask SPJ or Mainland.
    * My cache size is much smaller than yours: 512 KB versus 8 MB.
    * My CPU frequency is 2GHz compared to your 3.4GHz.
 
  * How do we want to handle hardware diversity like we're seeing in these regular benchmark runs?
    * Are the different behaviors we're seeing expected for our hardware differences or bugs of some sort?
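To put the absolute comparison in concrete terms, here is a rough Python sketch. It turns my baseline runtime plus percent change into an absolute 7.6.2 runtime and sets it next to the absolute value from your table. The numbers are copied from the Run Time table in THE NUMBERS and from your message; the helper name absolute_after is mine, purely for illustration.

```python
def absolute_after(baseline_seconds, percent_change):
    """Absolute time after applying a percent change to a baseline time."""
    return baseline_seconds * (1.0 + percent_change / 100.0)

# (program, my 7.0.4 runtime in s, my percent change, Johan's 7.6.2 runtime in s)
rows = [
    ("bernouilli", 0.28, 7.2, 0.12),
    ("exp3_8", 0.21, 55.4, 0.14),
    ("integrate", 0.34, 110.2, 0.21),
]

for name, base, pct, johan in rows:
    mine = absolute_after(base, pct)
    print(f"{name:>12}  mine={mine:.2f}s  johan={johan:.2f}s  ratio={mine / johan:.1f}x")
```

The ratios are only indicative, since the inputs are rounded to two decimals.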
 
Thanks, Johan.
 
== THE HARDWARE ==
 
$ cat /proc/cpuinfo
processor : 0 # counts up to 31, with physical id and core id pairs duplicated once
vendor_id : AuthenticAMD
cpu family : 16
model  : 9
model name : AMD Opteron(tm) Processor 6128
stepping : 1
microcode : 0x10000d4
cpu MHz  : 1999.949
cache size : 512 KB
physical id : 0 # counts up to 3 for each core id, twice
siblings : 8
core id  : 0 # counts up to 3, for each physical id, twice
cpu cores : 8
apicid  : 0 # varies
initial apicid : 0 # varies
fpu  : yes
fpu_exception : yes
cpuid level : 5
wp  : yes
flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
bogomips : 3999.89
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate 
 
physical id and core id exhaust the combinations of {0,1,2,3}, with each pair appearing twice for some reason, as processor counts from 0 to 31. I would have expected 64 processors, given the siblings and cpu cores values. Am I tripping on a common misconception?
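For anyone who wants to check the same thing on their own machine, here is a small Python sketch that tallies the topology from /proc/cpuinfo text. It counts logical processors, distinct physical ids, and distinct (physical id, core id) pairs, and reports how often a pair repeats. The function name and the returned keys are my own invention. One possible reading of the duplication, which I have not verified: the amd_dcm flag suggests a multi-chip package, so each package may hold two 4-core dies that reuse core ids 0 to 3, which would give 4 packages x 8 cores = 32.

```python
from collections import Counter

def summarize_topology(cpuinfo_text):
    """Tally processors, packages, and (physical id, core id) pairs."""
    logical = 0
    pairs = Counter()
    # /proc/cpuinfo separates per-processor blocks with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        pairs[(fields.get("physical id"), fields.get("core id"))] += 1
    return {
        "logical": logical,
        "packages": len({physical for physical, _ in pairs}),
        "distinct_pairs": len(pairs),
        "max_repeats": max(pairs.values(), default=0),
    }

# Tiny made-up sample: 2 packages, each (physical id, core id) pair seen twice.
sample = """processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 0

processor : 2
physical id : 1
core id : 0

processor : 3
physical id : 1
core id : 0
"""
print(summarize_topology(sample))
```

On a real machine, pass in the contents of /proc/cpuinfo instead of the sample string.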
 
I included the rest of the info because I still get different numbers than you do.
 
$ uname -a
Linux cam-05-unx 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 
$ VERSION=7.0.4; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
 [("Project name","The Glorious Glasgow Haskell Compilation System")
 ,("Project version","7.0.4")
 ,("Booter version","6.12.1")
 ,("Stage","2")
 ,("Build platform","x86_64-unknown-linux")
 ,("Host platform","x86_64-unknown-linux")
 ,("Target platform","x86_64-unknown-linux")
 ,("Have interpreter","YES")
 ,("Object splitting","YES")
 ,("Have native code generator","YES")
 ,("Have llvm code generator","YES")
 ,("Support SMP","YES")
 ,("Unregisterised","NO")
 ,("Tables next to code","YES")
 ,("RTS ways","l debug  thr thr_debug thr_l thr_p  dyn debug_dyn thr_dyn thr_debug_dyn")
 ,("Leading underscore","NO")
 ,("Debug on","False")
 ,("LibDir","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4")
 ,("Global Package DB","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4/package.conf.d")
 ,("C compiler flags","[\"-fno-stack-protector\"]")
 ,("Gcc Linker flags","[]")
 ,("Ld Linker flags","[]")
 ]
 
$ VERSION=7.6.2; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
 [("Project name","The Glorious Glasgow Haskell Compilation System")
 ,("GCC extra via C opts"," -fwrapv")
 ,("C compiler command","/usr/bin/gcc")
 ,("C compiler flags"," -fno-stack-protector ")
 ,("ar command","/usr/bin/ar")
 ,("ar flags","q")
 ,("ar supports at file","@ArSupportsAtFile@")
 ,("touch command","touch")
 ,("dllwrap command","/bin/false")
 ,("windres command","/bin/false")
 ,("perl command","/usr/bin/perl")
 ,("target os","OSLinux")
 ,("target arch","ArchX86_64")
 ,("target word size","8")
 ,("target has GNU nonexec stack","True")
 ,("target has .ident directive","True")
 ,("target has subsections via symbols","False")
 ,("LLVM llc command","llc")
 ,("LLVM opt command","opt")
 ,("Project version","7.6.2")
 ,("Booter version","7.4.1")
 ,("Stage","2")
 ,("Build platform","x86_64-unknown-linux")
 ,("Host platform","x86_64-unknown-linux")
 ,("Target platform","x86_64-unknown-linux")
 ,("Have interpreter","YES")
 ,("Object splitting supported","YES")
 ,("Have native code generator","YES")
 ,("Support SMP","YES")
 ,("Unregisterised","NO")
 ,("Tables next to code","YES")
 ,("RTS ways","l debug  thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn thr_debug_dyn")
 ,("Leading underscore","NO")
 ,("Debug on","False")
 ,("LibDir","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2")
 ,("Global Package DB","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2/package.conf.d")
 ,("Gcc Linker flags","[\"-Wl,--hash-size=31\",\"-Wl,--reduce-memory-overheads\"]")
 ,("Ld Linker flags","[\"--hash-size=31\",\"--reduce-memory-overheads\"]")
 ]
 
== THE NUMBERS ==
 
With VERSION=7.0.4 or VERSION=7.6.2. (The only difference from your commands is that I'm not relying on $PATH.)
 
$ HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; (make clean && make boot WithNofibHc=${HC} && make WithNofibHc=${HC}) >& log-${VERSION}
 
--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
     bernouilli          +3.3%     +0.2%     +7.2%     +7.4%     +0.0%
         exp3_8          +1.1%    +53.7%    +55.4%    +57.5%   +300.0%
    gen_regexps         +18.6%     +3.9%      0.00      0.00     +0.0%
      integrate          -0.1%    +39.0%   +110.2%    +88.5%     +0.0%
          kahan          +1.7%    +41.8%     +8.2%     +8.0%     +0.0%
      paraffins          +1.3%     -1.2%     -3.6%     -0.8%     +0.0%
         primes          +1.4%    +64.7%      0.11      0.11    +50.0%
         queens          +0.8%     -0.2%      0.02      0.02     +0.0%
           rfib          +1.7%    +42.8%      0.03      0.03     +0.0%
            tak          +0.9%    +12.0%      0.02      0.02     +0.0%
   wheel-sieve1          +1.4%    +66.6%     -4.0%     -4.3%    -17.6%
   wheel-sieve2          +1.4%     +0.0%     -0.2%     -2.1%     +0.0%
           x2n1         +10.3%    +41.7%      0.01      0.01   +200.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.2%     -4.0%     -4.3%    -17.6%
            Max         +18.6%    +66.6%   +110.2%    +88.5%   +300.0%
 Geometric Mean          +3.3%    +25.6%    +19.6%    +18.1%    +23.0%
 
I ran it twice. Here is the second run:
 
--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
     bernouilli          +3.3%     +0.2%     +7.0%     +7.1%     +0.0%
         exp3_8          +1.1%    +53.7%    +56.6%    +57.8%   +300.0%
    gen_regexps         +18.6%     +3.9%      0.00      0.00     +0.0%
      integrate          -0.1%    +39.0%   +102.1%    +86.2%     +0.0%
          kahan          +1.7%    +41.8%     +9.5%     +8.9%     +0.0%
      paraffins          +1.3%     -1.2%     -0.6%     -4.8%     +0.0%
         primes          +1.4%    +64.7%      0.11      0.11    +50.0%
         queens          +0.8%     -0.2%      0.02      0.02     +0.0%
           rfib          +1.7%    +42.8%      0.03      0.03     +0.0%
            tak          +0.9%    +12.0%      0.02      0.02     +0.0%
   wheel-sieve1          +1.4%    +66.6%     -4.4%     -4.3%    -17.6%
   wheel-sieve2          +1.4%     +0.0%     -1.1%     -2.8%     +0.0%
           x2n1         +10.3%    +41.7%      0.01      0.01   +200.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.2%     -4.4%     -4.8%    -17.6%
            Max         +18.6%    +66.6%   +102.1%    +86.2%   +300.0%
 Geometric Mean          +3.3%    +25.6%    +19.5%    +17.2%    +23.0%
 
Maybe your machine is too fast for nofib-analyse to report a percent change for exp3_8: it appears to fall back to absolute times when a run is very short.
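To make that guess concrete, here is a Python sketch of the kind of rule I imagine: print a percent change only when both runtimes reach some minimum, otherwise print the new absolute time. The 0.2 s cutoff is an assumption I picked because it is consistent with the tables in this thread. I have not read it out of nofib-analyse, and the real rule may differ.

```python
# Assumed cutoff, chosen only to match the tables in this thread.
ASSUMED_THRESHOLD_SECONDS = 0.2

def format_change(baseline, new, threshold=ASSUMED_THRESHOLD_SECONDS):
    """Percent change if both times reach the threshold, else the new absolute time."""
    if baseline < threshold or new < threshold:
        return f"{new:.2f}"
    return f"{(new / baseline - 1.0) * 100.0:+.1f}%"

# primes on my machine: 0.10 s then 0.11 s, too short for a percentage.
print(format_change(0.10, 0.11))
# exp3_8 on my machine: the 0.21 s baseline is just long enough for a percentage.
print(format_change(0.21, 0.33))
```

Under this rule, a faster machine pushes more programs under the cutoff, which would explain why your table has more absolute entries than mine.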
 
Allocations
-------------------------------------------------------------------------------
        Program            log-7.0.4       log-7.6.2
-------------------------------------------------------------------------------
     bernouilli            303890616           +0.2%
         exp3_8            389023528          +53.7%
    gen_regexps               304768           +3.9%
      integrate            546206856          +39.0%
          kahan            700842656          +41.8%
      paraffins             56201680           -1.2%
         primes             65899520          +64.7%
         queens             17387888           -0.2%
           rfib                81176          +42.8%
            tak                94408          +12.0%
   wheel-sieve1             14620056          +66.6%
   wheel-sieve2             88734064           +0.0%
           x2n1              2491928          +41.7%
        -1 s.d.                -----           +3.0%
        +1 s.d.                -----          +53.2%
        Average                -----          +25.6%
Run Time
-------------------------------------------------------------------------------
        Program            log-7.0.4       log-7.6.2
-------------------------------------------------------------------------------
     bernouilli                 0.28           +7.2%
         exp3_8                 0.21          +55.4%
    gen_regexps                 0.00            0.00
      integrate                 0.34         +110.2%
          kahan                 1.07           +8.2%
      paraffins                 0.22           -3.6%
         primes                 0.10            0.11
         queens                 0.02            0.02
           rfib                 0.03            0.03
            tak                 0.01            0.02
   wheel-sieve1                 0.68           -4.0%
   wheel-sieve2                 0.37           -0.2%
           x2n1                 0.00            0.01
        -1 s.d.                -----           -9.3%
        +1 s.d.                -----          +57.7%
        Average                -----          +19.6%
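
As a sanity check on the summary rows: the Average in the Allocations table matches the Geometric Mean in the first table, so it looks like a geometric mean of the per-program ratios rather than an arithmetic mean of the percentages. A minimal Python sketch, assuming that is how it is computed (the function name is mine):

```python
import math

def geometric_mean_change(percent_changes):
    """Geometric mean of the ratios (1 + p/100), reported as a percent change."""
    ratios = [1.0 + p / 100.0 for p in percent_changes]
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(mean_log) - 1.0) * 100.0

# Allocs column from the Allocations table above (7.0.4 -> 7.6.2), in percent.
allocs = [0.2, 53.7, 3.9, 39.0, 41.8, -1.2, 64.7, -0.2, 42.8, 12.0, 66.6, 0.0, 41.7]
print(f"{geometric_mean_change(allocs):+.1f}%")
```

This should land close to the +25.6% that nofib-analyse reports. The inputs are already rounded to one decimal, so small deviations would not be surprising.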

And here are the results using the "mode=slow" nofib option. Only bernouilli and gen_regexps do not have SLOW_OPTS defined in their Makefile. It's odd, then, that gen_regexps shows such a drastic change...
 
--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
     bernouilli          +3.3%     +0.2%     +6.7%     +7.4%     +0.0%
         exp3_8          +1.1%    +68.2%    +20.2%    +20.3%   +100.0%
    gen_regexps         +18.6%     -1.2%     -6.3%     -6.0%     +0.0%
      integrate          -0.1%    +39.0%   +114.3%   +104.9%     +3.9%
          kahan          +1.7%    +41.9%     +7.5%     +7.5%     +0.0%
      paraffins          +1.3%     -1.2%     -3.3%     -2.7%     +0.3%
         primes          +1.4%    +57.9%     +0.8%     +1.0%     +0.0%
         queens          +0.8%     -0.2%     -1.7%     -2.0%     +0.0%
           rfib          +1.7%    +42.8%     +6.3%     +6.0%     +0.0%
            tak          +0.9%    +12.0%     -2.4%     -2.5%     +0.0%
   wheel-sieve1          +1.4%    +99.2%     -3.4%     -3.4%    +58.8%
   wheel-sieve2          +1.4%     -0.1%     -3.6%     -3.8%     +0.0%
           x2n1         +10.3%    +43.1%      0.13      0.13  +1300.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.2%     -6.3%     -6.0%     +0.0%
            Max         +18.6%    +99.2%   +114.3%   +104.9%  +1300.0%
 Geometric Mean          +3.3%    +27.4%     +8.2%     +7.8%    +34.3%
 
Allocations
-------------------------------------------------------------------------------
        Program       log-slow-7.0.4  log-slow-7.6.2
-------------------------------------------------------------------------------
     bernouilli            303890616           +0.2%
         exp3_8           3500234960          +68.2%
    gen_regexps            780759064           -1.2%
      integrate           1092338624          +39.0%
          kahan           2797648296          +41.9%
      paraffins            363166288           -1.2%
         primes            861820872          +57.9%
         queens            569243336           -0.2%
           rfib                81488          +42.8%
            tak                94408          +12.0%
   wheel-sieve1             24134568          +99.2%
   wheel-sieve2            160800936           -0.1%
           x2n1             19375720          +43.1%
        -1 s.d.                -----           +1.2%
        +1 s.d.                -----          +60.4%
        Average                -----          +27.4%
 
Run Time
-------------------------------------------------------------------------------
        Program       log-slow-7.0.4  log-slow-7.6.2
-------------------------------------------------------------------------------
     bernouilli                 0.28           +6.7%
         exp3_8                 3.39          +20.2%
    gen_regexps                 1.90           -6.3%
      integrate                 0.68         +114.3%
          kahan                 4.35           +7.5%
      paraffins                 1.93           -3.3%
         primes                 1.51           +0.8%
         queens                 0.72           -1.7%
           rfib                 0.31           +6.3%
            tak                 1.58           -2.4%
   wheel-sieve1                 2.28           -3.4%
   wheel-sieve2                 0.77           -3.6%
           x2n1                 0.04            0.13
        -1 s.d.                -----          -12.9%
        +1 s.d.                -----          +34.3%
        Average                -----           +8.2%


On Tue, Feb 12, 2013 at 3:17 AM, Johan Tibell <johan.tibell@gmail.com> wrote:
Hi Nicolas!

I tried to reproduce the difference between 7.0.4 and 7.6.2 on exp3_8, wheel-sieve1, and primes and couldn't get the same percent difference as you. We need to reconcile these differences somehow. Let's start with more exact machine specs. I have a:

$ cat /proc/cpuinfo
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping : 9
microcode : 0x12
cpu MHz : 1600.000
cache size : 8192 KB
...

$ uname -a
Linux johantibell.com 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

And GHC versions:

$ ghc-7.0.4 --info
 [("Project name","The Glorious Glasgow Haskell Compilation System")
 ,("Project version","7.0.4")
 ,("Booter version","6.12.1")
 ,("Stage","2")
 ,("Build platform","x86_64-unknown-linux")
 ,("Host platform","x86_64-unknown-linux")
 ,("Target platform","x86_64-unknown-linux")
 ,("Have interpreter","YES")
 ,("Object splitting","YES")
 ,("Have native code generator","YES")
 ,("Have llvm code generator","YES")
 ,("Support SMP","YES")
 ,("Unregisterised","NO")
 ,("Tables next to code","YES")
 ,("RTS ways","l debug  thr thr_debug thr_l thr_p  dyn debug_dyn thr_dyn thr_debug_dyn")
 ,("Leading underscore","NO")
 ,("Debug on","False")
 ,("LibDir","/usr/local/lib/ghc-7.0.4")
 ,("Global Package DB","/usr/local/lib/ghc-7.0.4/package.conf.d")
 ,("C compiler flags","[\"-fno-stack-protector\"]")
 ,("Gcc Linker flags","[]")
 ,("Ld Linker flags","[]")
 ]

$ ghc-7.6.2 --info
 [("Project name","The Glorious Glasgow Haskell Compilation System")
 ,("GCC extra via C opts"," -fwrapv")
 ,("C compiler command","/usr/bin/gcc")
 ,("C compiler flags"," -fno-stack-protector ")
 ,("ar command","/usr/bin/ar")
 ,("ar flags","q")
 ,("ar supports at file","@ArSupportsAtFile@")
 ,("touch command","touch")
 ,("dllwrap command","/bin/false")
 ,("windres command","/bin/false")
 ,("perl command","/usr/bin/perl")
 ,("target os","OSLinux")
 ,("target arch","ArchX86_64")
 ,("target word size","8")
 ,("target has GNU nonexec stack","True")
 ,("target has .ident directive","True")
 ,("target has subsections via symbols","False")
 ,("LLVM llc command","llc")
 ,("LLVM opt command","opt")
 ,("Project version","7.6.2")
 ,("Booter version","7.4.1")
 ,("Stage","2")
 ,("Build platform","x86_64-unknown-linux")
 ,("Host platform","x86_64-unknown-linux")
 ,("Target platform","x86_64-unknown-linux")
 ,("Have interpreter","YES")
 ,("Object splitting supported","YES")
 ,("Have native code generator","YES")
 ,("Support SMP","YES")
 ,("Unregisterised","NO")
 ,("Tables next to code","YES")
 ,("RTS ways","l debug  thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn thr_debug_dyn")
 ,("Leading underscore","NO")
 ,("Debug on","False")
 ,("LibDir","/usr/local/lib/ghc-7.6.2")
 ,("Global Package DB","/usr/local/lib/ghc-7.6.2/package.conf.d")
 ,("Gcc Linker flags","[\"-Wl,--hash-size=31\",\"-Wl,--reduce-memory-overheads\"]")
 ,("Ld Linker flags","[\"--hash-size=31\",\"--reduce-memory-overheads\"]")
 ]

I ran the benchmarks by running e.g.:

$ cd nofib/imaginary/wheel-sieve1
$ make clean && make boot WithNofibHc=ghc-${VERSION} && make WithNofibHc=ghc-${VERSION}

Could you please try to run the "imaginary" benchmarks using exactly these commands and report the difference you see between 7.0.4 and 7.6.2? Here's what I see. 7.0.4 vs 7.6.2:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
     bernouilli          +3.3%     +0.2%      0.12      0.13     +0.0%
         exp3_8          +1.1%    +53.7%      0.14      0.14   +300.0%
    gen_regexps         +18.7%     +3.9%      0.00      0.00     +0.0%
      integrate          -0.1%    +39.0%      0.21      0.23     +0.0%
          kahan          +1.7%    +98.6%     +9.9%     +7.3%     +0.0%
      paraffins          +1.3%     -1.2%      0.06      0.08     +0.0%
         primes          +1.4%    +64.7%      0.04      0.05    +50.0%
         queens          +0.8%     -0.5%      0.02      0.02     +0.0%
           rfib          +1.7%    +42.8%      0.02      0.02     +0.0%
            tak          +0.9%    +12.0%      0.01      0.01     +0.0%
   wheel-sieve1          +0.8%    +66.6%     -4.6%     -5.8%    -12.5%
   wheel-sieve2          +0.9%     +0.0%      0.12      0.13     +0.0%
           x2n1         +10.3%    +87.3%      0.00      0.01   +200.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.2%     -4.6%     -5.8%    -12.5%
            Max         +18.7%    +98.6%     +9.9%     +7.3%   +300.0%
 Geometric Mean          +3.2%    +31.7%     +2.4%     +0.5%    +23.6%

-- Johan