
Simon Marlow
Not so much code size, but data size (heap size, to be more precise).
Of course. There was some talk about storing tags in pointers for 6.8; I couldn't find the reference, but I wonder if that would help my situation?
It would be interesting to know how much time is spent in the GC - run the program with +RTS -sstderr.
MUT time decreases a bit (131 to 127s) for x86_64, but GC time increases a lot (98 to 179s).

i686 version:
----------------------------------------
 94,088,199,152 bytes allocated in the heap
 22,294,740,756 bytes copied during GC (scavenged)
  2,264,823,784 bytes copied during GC (not scavenged)
    124,747,644 bytes maximum residency (4138 sample(s))

         179962 collections in generation 0 ( 67.33s)
           4138 collections in generation 1 ( 30.92s)

            248 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time  131.53s  (133.03s elapsed)
  GC    time   98.25s  (100.13s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time  229.78s  (233.16s elapsed)

  %GC time      42.8%  (42.9% elapsed)

  Alloc rate    715,345,865 bytes per MUT second

  Productivity  57.2% of total user, 56.4% of total elapsed
----------------------------------------

x86_64 version:
----------------------------------------
173,790,326,352 bytes allocated in the heap
 59,874,348,560 bytes copied during GC (scavenged)
  5,424,298,832 bytes copied during GC (not scavenged)
    247,477,744 bytes maximum residency (9856 sample(s))

         331264 collections in generation 0 (111.51s)
           9856 collections in generation 1 ( 67.80s)

            582 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time  127.20s  (127.76s elapsed)
  GC    time  179.32s  (179.63s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time  306.52s  (307.39s elapsed)

  %GC time      58.5%  (58.4% elapsed)

  Alloc rate    1,366,233,874 bytes per MUT second

  Productivity  41.5% of total user, 41.4% of total elapsed
----------------------------------------

I've also added results from the 64 bit ghc-6.8.20071011 binary snapshot, which shows some nice improvements, with one benchmark improving by 30%(!).

----------------------------------------
151,807,589,712 bytes allocated in the heap
 50,687,462,360 bytes copied during GC (scavenged)
  4,472,003,520 bytes copied during GC (not scavenged)
    256,532,480 bytes maximum residency (6805 sample(s))

         289342 collections in generation 0 ( 89.30s)
           6805 collections in generation 1 ( 60.26s)

            602 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   83.79s  ( 84.36s elapsed)
  GC    time  149.57s  (151.10s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time  233.35s  (235.47s elapsed)

  %GC time      64.1%  (64.2% elapsed)

  Alloc rate    1,811,779,785 bytes per MUT second

  Productivity  35.9% of total user, 35.6% of total elapsed
----------------------------------------
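To make the comparison explicit, here is a small sketch that recomputes the GC share and the allocation ratio from the figures above. The record type and the names (`Run`, `gcShare`, `allocRatio`, and the three run values) are made up for illustration, and I am assuming the third block is the x86_64 run with the 6.8 snapshot. INIT and EXIT are 0.00s in all three runs, so total time is taken as MUT + GC.

```haskell
-- Figures copied from the +RTS -sstderr output above.
-- All names here are illustrative, not part of any library.
data Run = Run
  { mutSecs    :: Double  -- MUT time (user seconds)
  , gcSecs     :: Double  -- GC time (user seconds)
  , allocBytes :: Double  -- bytes allocated in the heap
  }

i686Run, x8664Run, snapshotRun :: Run
i686Run     = Run 131.53  98.25  94088199152
x8664Run    = Run 127.20 179.32 173790326352
-- Assumed to be x86_64 with the ghc-6.8.20071011 snapshot.
snapshotRun = Run  83.79 149.57 151807589712

-- Fraction of total CPU time spent in the garbage collector.
gcShare :: Run -> Double
gcShare r = gcSecs r / (mutSecs r + gcSecs r)

-- How much more the second run allocates than the first.
allocRatio :: Run -> Run -> Double
allocRatio a b = allocBytes b / allocBytes a
```

Loaded into GHCi, `gcShare` should agree with the reported %GC lines to within rounding, and `allocRatio i686Run x8664Run` comes out a little under 2, roughly what one would expect if most of the heap is word-sized fields that double in width on 64 bit.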
I'll add some more benchmarks
And I did. Below is a bit more detail from the log. The "rc hash counts" traverse a bytestring, hashing fixed-size words into Integers. As you can see, I haven't yet gotten the SPECIALIZE pragma to work correctly :-). The "global alignment" is the previous test, performing global (Needleman-Wunsch) alignment on pairs of sequences of length 100 (short) or 1000 (long), implementing the dynamic programming matrix as a list of lists.

====================
Start:Fri Oct 12 08:48:36 CEST 2007
Linux nmd9999 2.6.20-16-generic #2 SMP Fri Aug 31 00:55:27 UTC 2007 i686 GNU/Linux
ghc 6.6
--- Sequence bench ---
rc hash counts int (8) ..... OK, passed 10 tests, CPU time: 34.526157s
rc hash counts int (16) ..... OK, passed 10 tests, CPU time: 34.746172s
rc hash counts (16) ..... OK, passed 10 tests, CPU time: 34.642164s
rc hash counts (32) ..... OK, passed 10 tests, CPU time: 35.378212s
Sequence bench totals, CPU time: 139.292705s, wall clock: 139 secs
--- Alignment bench ---
global alignment, short ..... OK, passed 10 tests, CPU time: 2.696168s
global alignment, long ...... OK, passed 10 tests, CPU time: 90.481655s
Alignment bench totals, CPU time: 93.177823s, wall clock: 94 secs
Total for all tests, CPU time: 232.474528s, wall clock: 233 secs
End:Fri Oct 12 08:52:29 CEST 2007
====================
Start:Fri Oct 12 09:52:33 CEST 2007
Linux nmd9999.imr.no 2.6.22-13-generic #1 SMP Thu Oct 4 17:52:26 GMT 2007 x86_64 GNU/Linux
ghc 6.6.1
--- Sequence bench ---
rc hash counts int (8) ..... OK, passed 10 tests, CPU time: 36.634289s
rc hash counts int (16) ..... OK, passed 10 tests, CPU time: 36.590286s
rc hash counts (16) ..... OK, passed 10 tests, CPU time: 36.946309s
rc hash counts (32) ..... OK, passed 10 tests, CPU time: 37.402338s
Sequence bench totals, CPU time: 147.577222s, wall clock: 148 secs
--- Alignment bench ---
global alignment, short ..... OK, passed 10 tests, CPU time: 3.564223s
global alignment, long ...... OK, passed 10 tests, CPU time: 156.101756s
Alignment bench totals, CPU time: 159.665979s, wall clock: 159 secs
Total for all tests, CPU time: 307.247201s, wall clock: 307 secs
End:Fri Oct 12 09:57:40 CEST 2007
====================
Start:Fri Oct 12 10:51:27 CEST 2007
Linux nmd9999.imr.no 2.6.22-13-generic #1 SMP Thu Oct 4 17:52:26 GMT 2007 x86_64 GNU/Linux
ghc 6.8.0.20071011
--- Sequence bench ---
rc hash counts int (8) ..... OK, passed 10 tests, CPU time: 22.773423s
rc hash counts int (16) ..... OK, passed 10 tests, CPU time: 22.657416s
rc hash counts (16) ..... OK, passed 10 tests, CPU time: 22.513407s
rc hash counts (32) ..... OK, passed 10 tests, CPU time: 23.009438s
Sequence bench totals, CPU time: 90.953684s, wall clock: 91 secs
--- Alignment bench ---
global alignment, short ..... OK, passed 10 tests, CPU time: 3.168198s
global alignment, long ...... OK, passed 10 tests, CPU time: 140.808799s
Alignment bench totals, CPU time: 143.976997s, wall clock: 144 secs
Total for all tests, CPU time: 234.930681s, wall clock: 235 secs
End:Fri Oct 12 10:55:23 CEST 2007

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
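
PS: For readers who want to see what kind of code the "global alignment" test exercises, below is a minimal sketch of a Needleman-Wunsch score computed with the dynamic programming matrix held as a list of lists. This is not the benchmark's actual code: the function names and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, and it only computes the score, not the traceback.

```haskell
-- Sketch of global (Needleman-Wunsch) alignment scoring with the DP
-- matrix represented as a list of rows. Scoring values are assumed.
matchScore, mismatchScore, gapPenalty :: Int
matchScore    = 1
mismatchScore = -1
gapPenalty    = -1

-- First row: prefixes of ys aligned against the empty string.
firstRow :: String -> [Int]
firstRow ys = take (length ys + 1) (iterate (+ gapPenalty) 0)

-- Build one row from the previous row, for character x of the first
-- sequence. The new row refers to itself lazily for the "left" cell.
nextRow :: String -> [Int] -> Char -> [Int]
nextRow _  []                _ = []
nextRow ys prev@(p0 : rest)  x = row
  where
    row = (p0 + gapPenalty) : zipWith3 cell ys (zip prev rest) row
    cell y (diag, up) left = maximum
      [ diag + (if x == y then matchScore else mismatchScore)
      , up   + gapPenalty
      , left + gapPenalty
      ]

-- The full matrix: one row per prefix of xs (including the empty one).
matrix :: String -> String -> [[Int]]
matrix xs ys = scanl (nextRow ys) (firstRow ys) xs

-- The optimal global alignment score is the bottom-right cell.
globalScore :: String -> String -> Int
globalScore xs ys = last (last (matrix xs ys))
```

In GHCi, `globalScore "ACGT" "AGT"` aligns three matching characters with one gap. Every cell is a boxed `Int` behind at least one list cons cell, so a representation like this is pointer-heavy, which is consistent with the heap roughly doubling on x86_64.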