[11/16] SBM: Graphs for hand-tweaked assembly benchmarks

This report compares the hand-tweaked assembly programs with the original untweaked programs on two vastly different microarchitectures.

This is the command I ran to generate the report:

  EXCLUDE='(xxxx|-bsl|chunk|count|acc-[23]|fold|lenfil|^c/)' \
  tools/merge.pl \
    ghc-armada-thorough-6.9.tgz \
    ghc-thorough-6.9.tgz \
xx
I cut out the memory sections manually, since we've already seen them, and inserted a few newlines for grouping purposes.

The first thing one should note is that not all tweaks are better than the originals! The second is that the sequence of tweaks is not quite monotonically decreasing in run-time. The improvements don't really start until -e on the Athlon64 and -f on both. Not until then has the load pressure on the L1 cache been relieved enough that the code actually runs faster.

Note also how the two microarchitectures seem to have plateaus in different places. The Athlon64 seems to have the number 3 built into its silicon (efg, jkl, mno), which fits very well with what we know about it from AMD's documentation (the front end splits the instructions up into smaller pieces, which then get distributed to three different "pipelines", each with its own out-of-order execution engine).

The Pentium III seems to have trouble with the simple MMX code but does very well with the more advanced MMX code that keeps 8 space counters in a single MMX register for many iterations. The code I used to add those counters horizontally is the same in both -q and -r. Perhaps operations on both MMX and normal registers are slow?

Loop unrolling (-s) doesn't seem to matter in this case.
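To make the "8 counters in one register" idea concrete, here is a rough sketch in portable C. It is not the assembly from the -q or -r variants (that code is not shown in this mail); a plain 64-bit integer stands in for the MMX register, and the names space_mask, hsum_bytes and count_spaces are made up for the illustration. Real MMX code would presumably use packed compare and add/subtract instructions instead of the bit tricks below. Each byte lane counts the spaces seen in its column of the input, and the lanes are only added up horizontally once every 255 words, just before a lane could overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES_01  0x0101010101010101ULL  /* 0x01 in every byte lane */
#define LANES_LO7 0x7f7f7f7f7f7f7f7fULL  /* low 7 bits of every byte lane */

/* Return 0x01 in every byte lane of x that holds a space (0x20), else 0x00. */
static uint64_t space_mask(uint64_t x)
{
    uint64_t y  = x ^ (LANES_01 * 0x20);               /* space lanes become 0x00 */
    uint64_t nz = ((y & LANES_LO7) + LANES_LO7) | y;   /* lane bit 7 set iff lane != 0 */
    return (~nz >> 7) & LANES_01;
}

/* Horizontal add: sum the eight byte lanes of v (result is at most 8 * 255). */
static unsigned hsum_bytes(uint64_t v)
{
    v = (v & 0x00ff00ff00ff00ffULL) + ((v >> 8)  & 0x00ff00ff00ff00ffULL);
    v = (v & 0x0000ffff0000ffffULL) + ((v >> 16) & 0x0000ffff0000ffffULL);
    v = (v & 0x00000000ffffffffULL) + (v >> 32);
    return (unsigned)v;
}

/* Count the spaces in buf[0..len) using eight packed 8-bit counters. */
size_t count_spaces(const unsigned char *buf, size_t len)
{
    size_t   total   = 0;
    uint64_t acc     = 0;   /* eight 8-bit counters, one per byte lane */
    unsigned pending = 0;   /* words added to acc since the last flush */
    size_t   i       = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);          /* unaligned-safe 8-byte load */
        assert(pending < 255);           /* so no lane can carry into its neighbour */
        acc += space_mask(w);            /* each lane grows by 0 or 1 */
        if (++pending == 255) {          /* flush before a lane could reach 256 */
            total += hsum_bytes(acc);
            acc = 0;
            pending = 0;
        }
    }
    total += hsum_bytes(acc);            /* flush whatever is left in the lanes */

    for (; i < len; i++)                 /* tail of fewer than 8 bytes */
        total += (buf[i] == ' ');
    return total;
}
```

The point of the design is that the inner loop touches only the packed accumulator; the comparatively expensive horizontal add, and any move between the packed and the ordinary registers, happens once per 255 words rather than once per word.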
-Peter

ls-search  ghc 6.9.20071119  Pentium III (Coppermine) 596.932 MHz  TESTKIND=THOROUGH SUFFIX=
charybdis  ghc 6.9.20071119  AMD Athlon(tm) 64 Processor 3000+ 2009.160 MHz  TESTKIND=THOROUGH SUFFIX=

Time (byte counting)            std
--------------------       avg  dev slack
hs/byte-bs----acc:        3.274  1‰  0.1  ███████████████████████████ |
                       -- 0.705  7‰  0.1  █████████████████████▋ |
hand/byte-bs----acc-a:    3.511  1‰  0.0  ████████████████████████████▉ |
                       -- 0.639  2‰  0.2  ███████████████████▋ |
hand/byte-bs----acc-b:    1.998  2‰  0.1  ████████████████▌ |
                       -- 0.414  2‰  0.5  ████████████▊ |
hand/byte-bs----acc-c:    1.876  2‰  0.1  ███████████████▌ |
                       -- 0.414  3‰  0.2  ████████████▊ |
hand/byte-bs----acc-d:    1.876  1‰  0.1  ███████████████▌ |
                       -- 0.415  3‰  0.2  ████████████▊ |

Time (space counting)           std
---------------------      avg  dev slack
hs/space-bs-c8-acc-1:     4.318  1‰  0.0  ███████████████████████████████████▋ |
                       -- 1.145  1‰  0.2  ███████████████████████████████████▏ |

hand/space-bs-c8-acc-1-a: 4.318  1‰  0.0  ███████████████████████████████████▋ |
                       -- 1.177  2‰  0.3  ████████████████████████████████████▏|
hand/space-bs-c8-acc-1-b: 4.331  1‰  0.0  ███████████████████████████████████▋ |
                       -- 1.104  1‰  0.2  █████████████████████████████████▉ |
hand/space-bs-c8-acc-1-c: 4.492  1‰  0.1  █████████████████████████████████████|
                       -- 1.207  1‰  0.3  █████████████████████████████████████|
hand/space-bs-c8-acc-1-d: 4.354  1‰  0.0  ███████████████████████████████████▉ |
                       -- 1.191  1‰  0.2  ████████████████████████████████████▌|
hand/space-bs-c8-acc-1-e: 4.424  0‰  0.1  ████████████████████████████████████▌|
                       -- 0.937  1‰  0.2  ████████████████████████████▊ |
hand/space-bs-c8-acc-1-f: 4.164  1‰  0.0  ██████████████████████████████████▎ |
                       -- 0.921  1‰  0.2  ████████████████████████████▎ |
hand/space-bs-c8-acc-1-g: 4.309  1‰  0.1  ███████████████████████████████████▌ |
                       -- 0.927  2‰  0.4  ████████████████████████████▍ |
hand/space-bs-c8-acc-1-h: 4.202  1‰  0.1  ██████████████████████████████████▋ |
                       -- 0.886  2‰  0.2  ███████████████████████████▏ |
hand/space-bs-c8-acc-1-i: 3.820  1‰  0.1  ███████████████████████████████▌ |
                       -- 0.803  3‰  0.4  ████████████████████████▋ |
hand/space-bs-c8-acc-1-j: 3.472  1‰  0.0  ████████████████████████████▋ |
                       -- 0.706  2‰  0.1  █████████████████████▋ |
hand/space-bs-c8-acc-1-k: 3.474  1‰  0.0  ████████████████████████████▋ |
                       -- 0.705  1‰  0.0  █████████████████████▋ |
hand/space-bs-c8-acc-1-l: 3.498  1‰  0.1  ████████████████████████████▉ |
                       -- 0.710  2‰  0.1  █████████████████████▊ |
hand/space-bs-c8-acc-1-m: 3.397  1‰  0.1  ████████████████████████████ |
                       -- 0.642  6‰  0.3  ███████████████████▋ |
hand/space-bs-c8-acc-1-n: 3.373  1‰  0.0  ███████████████████████████▊ |
                       -- 0.636  4‰  0.5  ███████████████████▌ |
hand/space-bs-c8-acc-1-o: 3.118  1‰  0.1  █████████████████████████▋ |
                       -- 0.626  2‰  0.0  ███████████████████▎ |
hand/space-bs-c8-acc-1-p: 2.935  2‰  0.0  ████████████████████████▏ |
                       -- 0.565  3‰  0.4  █████████████████▍ |
hand/space-bs-c8-acc-1-q: 3.477  1‰  0.1  ████████████████████████████▋ |
                       -- 0.418  6‰  0.7  ████████████▉ |
hand/space-bs-c8-acc-1-r: 1.674  1‰  0.1  █████████████▊ |
                       -- 0.334  5‰  0.6  ██████████▎ |
hand/space-bs-c8-acc-1-s: 1.627  1‰  0.2  █████████████▍ |
                       -- 0.335  4‰  0.9  ██████████▎ |
Peter Firefly Brodersen Lund