
2008/6/23 Dan Doel
On Monday 23 June 2008, Isaac Dupree wrote:
there's no chance for the lower-level near code generation to reverse-SAT to eliminate the heap usage? (which would obviously be a different optimization that might be useful in other ways too, if it could be made to work) (did someone say that -fvia-C with gcc -O3 managed to do that sometimes?)
I said something similar, but I don't actually know the details. What I have is a SAT and non-SAT version of a benchmark:
In the SAT version, the native code generator is noticeably slower than -fvia-c -optc-O3, so the latter is doing something right. In the non-SAT version, -fvia-c is somewhat faster than the SAT version, but the NCG is just as fast. I don't know what's going on on the C end in the first case that makes it better, though (heap allocation is the same (low) in all cases; via-c doesn't fix large heap allocation when it happens in my experience).
I've tested this on my own machine (i386-apple-darwin) and I can't replicate it. For me (with 6.8.2), the SATed version is faster by almost 40%, and there is no essentially no difference between the NCG and GCC (GCC is maybe a fraction faster). In any event GHC will not automatically SAT this function because it has only one static argument. Indeed, I'm not quite sure why it should be faster with SAT at all, and my assembly-fu is still too weak to be able to work it out from the code generator output.
As an aside, I finally got around to compiling 6.9 from a day or so ago, and noticed that one of the functions I'd mentioned earlier does get SATed to the bad version (which causes lots of heap allocation in the program) because it has two static arguments.
I devoted yesterday entirely to SAT and came up with some refinements to my earlier heuristics which shave a couple of percent of allocations and runtime off of nofib compared with the existing ones, and also resolve this problem. I found that my tail call detection from the day before was a bit broken. Now that it is fixed I use it to avoid SATing functions which make tail calls, but with the important wrinkle that if the function has static function parameters we should SAT it anyway because this lets us do much more inlining and is a bigger win than avoiding allocation. This criterion avoids the heap allocation you are experiencing in your particular example. I've also added rule generation to rewrite instances of functions that are SATed purely because they have static function parameters back into the non-SATed versions if they do not experience inlinig. This prevents the worst cases that I got before when always SATing functions with static function parameters. What's more, I've found that most if not all SAT opportunities are found early in the pipeline by these new criteria, so we can get away with just one SAT pass instead of the two I reported earlier. All in all, these developments are pretty promising and I suspect I'll be able to integrate them into HEAD in time for 6.10, which will hopefully alleviate some of your worries about the efficacy of SAT. Cheers, Max