
On Monday 23 June 2008, Isaac Dupree wrote:
there's no chance for the lower-level near code generation to reverse-SAT to eliminate the heap usage? (which would obviously be a different optimization that might be useful in other ways too, if it could be made to work) (did someone say that -fvia-C with gcc -O3 managed to do that sometimes?)
I said something similar, but I don't actually know the details. What I have is a SAT and non-SAT version of a benchmark: ---- SAT ---- bench :: Int -> Int -> ST s () bench (I# k) (I# n) = ST go where go s = case sizeOf (0 :: Int) of { I# w -> case newByteArray# (n *# w) s of { (# s, arr #) -> case fill arr n s of { s -> go' arr k s } } } go' arr = go'' where go'' 0# s = (# s, () #) go'' k s = case reverse arr 0# (n -# 1#) s of { s -> go'' (k -# 1#) s } reverse :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s reverse arr = go where go i j s | i <# j = case readIntArray# arr i s of { (# s, ei #) -> case readIntArray# arr j s of { (# s, ej #) -> case writeIntArray# arr j ei s of { s -> case writeIntArray# arr i ej s of { s -> go (i +# 1#) (j -# 1#) s } } } } | otherwise = s ---- non-SAT ---- bench :: Int -> Int -> ST s () bench (I# k) (I# n) = ST go where go s = case sizeOf (0 :: Int) of { I# w -> case newByteArray# (n *# w) s of { (# s, arr #) -> case fill arr n s of { s -> go' arr k s } } } go' arr 0# s = (# s, () #) go' arr k s = case reverse arr 0# (n -# 1#) s of { s -> go' arr (k -# 1#) s } reverse :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s reverse arr i j s | i <# j = case readIntArray# arr i s of { (# s, ei #) -> case readIntArray# arr j s of { (# s, ej #) -> case writeIntArray# arr j ei s of { s -> case writeIntArray# arr i ej s of { s -> reverse arr (i +# 1#) (j -# 1#) s } } } } | otherwise = s In the SAT version, the native code generator is noticeably slower than -fvia-c -optc-O3, so the latter is doing something right. In the non-SAT version, -fvia-c is somewhat faster than the SAT version, but the NCG is just as fast. I don't know what's going on on the C end in the first case that makes it better, though (heap allocation is the same (low) in all cases; via-c doesn't fix large heap allocation when it happens in my experience). As an aside, I finally got around to compiling 6.9 from a day or so ago, and noticed that one of the functions I'd mentioned earlier does get SATed to the bad version (which causes lots of heap allocation in the program) because it has two static arguments. However, marking it NOINLINE seemed to keep it from having the negative effects. Does that pragma keep SAT from doing its thing, and if so, will that continue to be viable in cases where we decide we know better than GHC? Cheers, -- Dan