
Bulat Ziganshin wrote:
Taking all this into account, I propose the following:
1) The GHC FAQ already suggests using the RTS -A/-H options to speed up compilation. I propose moving this suggestion into the compiler itself by adding the line
char *ghc_rts_opts = "-A10m";
to the GHC 6.4 sources.
Do you have some evidence that -A10m is a good default? Better than -A6m, or -A16m, for example? GHC currently runs with -H6m by default. I'm happy to change this (in 6.4 only, I suppose) if you have evidence that we're sitting in a bad place on the curve. It's a space/time tradeoff, as usual. We've typically been quite conservative with space usage in the past, which is why the defaults tend to be quite low.
2) I propose writing "L2 cache size detection" code and using it in the GHC 6.6 RTS to set the initial value of the "-A" option, so that a program can tune itself to any CPU architecture, with cache sizes ranging from 128 KB to 4 MB. This would let low-end CPUs run significantly faster on some algorithms (up to 2x, as I said above) and could give a 5-10% speedup on high-end CPUs, which is also not so bad :)
That sounds like a good plan. I've been experimenting with PAPI recently (http://icl.cs.utk.edu/papi/), which can tell you the size of your caches, but it requires kernel patches.

Cheers,
Simon
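
As a rough illustration of the "L2 cache size detection" idea in (2), here is a minimal sketch in C. It assumes Linux with glibc, where sysconf() accepts the _SC_LEVEL2_CACHE_SIZE extension; other platforms would need a different query. The helper names (l2_cache_bytes, suggest_alloc_area, format_alloc_opt) are hypothetical and not part of the GHC RTS. Setting -A equal to the L2 size, clamped to the 128 KB to 4 MB range mentioned above, is likewise only an assumed policy, not a measured optimum.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: L2 cache size in bytes, or `fallback` when the
 * platform cannot report it. _SC_LEVEL2_CACHE_SIZE is a glibc extension
 * and may yield 0 or -1 when the size is unknown. */
static long l2_cache_bytes(long fallback)
{
#ifdef _SC_LEVEL2_CACHE_SIZE
    long n = sysconf(_SC_LEVEL2_CACHE_SIZE);
    if (n > 0)
        return n;
#endif
    return fallback;
}

/* Assumed policy: use the L2 size as the allocation area, clamped to
 * the 128 KB .. 4 MB range discussed in the proposal. */
static long suggest_alloc_area(long l2_bytes)
{
    const long lo = 128L * 1024;
    const long hi = 4L * 1024 * 1024;
    if (l2_bytes < lo)
        return lo;
    if (l2_bytes > hi)
        return hi;
    return l2_bytes;
}

/* Format the size as an RTS option string such as "-A512k".
 * Returns the value of snprintf(). */
static int format_alloc_opt(char *buf, size_t len, long bytes)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "-A%ldk", bytes / 1024);
}
```

A caller would combine these as format_alloc_opt(buf, sizeof buf, suggest_alloc_area(l2_cache_bytes(512L * 1024))), where the 512 KB fallback is again just an assumption. Whether the resulting value actually beats a fixed default would still need the kind of evidence asked for in the reply to (1).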