
On Friday 20 June 2008, Max Bolingbroke wrote:
Of course, if you have any suggestions for good heuristics based on your benchmarking experience then we would like to hear them! There was some discussion of this in the original ticket, http://hackage.haskell.org/trac/ghc/ticket/888, but when implementing SAT I tried out the suggestions made there without good results (though to be perfectly honest I didn't really understand the motivations behind some of the suggestions made).
Well, as I said, I was apparently being over-aggressive in my manual application of the transform, so I'm not yet the best person to ask, I suppose. :) However, if I had to pick something out of the air, I'd say this: always do SAT when the argument in question is a function. This is actually the reason I started doing it in the first place. I was working on sorting uvectors, and had what I thought was pretty good performance, and then I did SAT on the comparison function, and suddenly my code was running in half the time. Going back to my sorting (testing on introsort), SAT for the array doesn't seem to do much, but undoing the SAT for the comparison function causes heap allocation to balloon, and everything gets at least a factor of 2 slower. This also seems to match the other manual uses of SAT I've seen in the past. For instance, in the GHC libraries you'll see: foldr f = go where go z (x:xs) = f x (go z xs) go z [ ] = z because it's significantly faster, while earlier in this thread, SPJ said that: xs ++ ys = let go [ ] = ys go (z:zs) = z : go zs in go xs isn't much of a win. I don't have much data currently, but the places I *know* it's been a win have been SAT on functions, while the places I *know* it's been a loss have been SAT on unboxed data. Whether this is because SAT allows more opportunity for inlining/specialization on functions, or because copying functions is inherently more expensive, though, I don't know. I'll keep an eye open from now on, though. (On an additional tentative note, undoing SAT in the first example of my last mail seemed to eliminate the difference between the native code generator and -fvia-c -optc-O3. So whatever SAT was doing to make the code slower (no extra heap allocation, just slower code), GCC may have been fixing. My off-the-top-of-the-head guess was that passing the arguments along kept them in registers, and maybe GCC was putting the SATed arguments back into registers, but as I said, I don't currently have the expertise to verify that.) Cheers, -- Dan