
Simon Peyton-Jones
| However, we came across one problem (read, lack of | optimisation on GHC's part), which leads to tedious | duplication of a lot of code in our array library. | Basically, GHC does not recognise for tail recursive | functions when certain arguments (accumulators maintained in | a loop) can be unboxed. This leads to massive overheads in | our code. Currently, we circumvent the inefficiency by | having manually specialised versions of the loops for | different accumulator types and using RULES to select them | where appropriate (based on the type information). I will | send you some example code illustrating the problem soon.
Yes please.
I have found a way of rephrasing the definition so that it is properly optimised by GHC. However, I think, it should be possible to do this automatically and it is maybe not unlike the optimisation done by simplCore/LiberateCase. The code I would like to write is, for example, the following import PrelGHC import PrelBase import PrelST fill :: MutableByteArray# s -> (acc -> Int) -> (acc -> acc) -> Int -> acc -> ST s acc {-# INLINE fill #-} fill mba# f g (I# n#) start = fill0 0# start where fill0 i# acc | i# ==# n# = return acc | otherwise = do writeIntArray mba# (I# i#) (f acc) fill0 (i# +# 1#) (g acc) writeIntArray :: MutableByteArray# s -> Int -> Int -> ST s () {-# INLINE writeIntArray #-} writeIntArray mba# (I# i#) (I# e#) = ST $ \s# -> case writeIntArray# mba# i# e# s# of {s2# -> (# s2#, () #)} foo mba# n = fill mba# id (+1) 1000 0 The interesting part is the handling of the accumulator. After inlining `fill' into `foo', it becomes obvious that the accumulator can be maintained as an unboxed integer. Unfortunately, it is not obvious to GHC, which generates the following (this is just the inlined `fill0' loop): __letrec { $wfill0 :: (PrelGHC.Int# -> PrelBase.Int -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int)) __A 3 __C $wfill0 = \ w2 :: PrelGHC.Int# w3 :: PrelBase.Int w4 :: (PrelGHC.State# s) -> case w2 of wild { 1000 -> (# w4, w3 #); __DEFAULT -> case w3 of wild1 { PrelBase.I# e# -> case PrelGHC.writeIntArray# @ s w wild e# w4 of s2# { __DEFAULT -> case PrelGHC.+# e# 1 of a { __DEFAULT -> let { sat :: PrelBase.Int __A 0 __C sat = PrelBase.$wI# a } in case PrelGHC.+# wild 1 of sat1 { __DEFAULT -> $wfill0 sat1 sat s2# } } } } }; } in $wfill0 0 Test.lit w1 The accumulator (w3) is unboxed immediately before the writeIntArray# and its next value put into a box (sat) - only to be unboxed immediately again in the next loop iteration. This would make perfect sense when the definition of `foo' were foo mba# n = fill mba# id plus 1000 0 where plus 0 = error "Die horribly" plus x = x + 1 I also appreciate that, if the loop is executed zero times, the initial value of `acc' is not demanded. But this is not much different to the case handled by simplCore/LiberateCase. And indeed with a little help, GHC generates much better code. In the following, I rewrote `fill' to explicitly test for input values that make the loop execute zero times: fill mba# f g (I# 0#) start = return start fill mba# f g (I# n#) start = fill0 0# start where fill0 i# acc = do writeIntArray mba# (I# i#) (f acc) let i'# = i# +# 1# acc' = g acc if i'# ==# n# then return acc' else fill0 i'# acc' Now, `acc' is guaranteed to be used in each invocation of `fill0' and GHC generates: __letrec { $wfill0 :: (PrelGHC.Int# -> PrelGHC.Int# -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int)) __A 3 __C $wfill0 = \ w2 :: PrelGHC.Int# ww :: PrelGHC.Int# w3 :: (PrelGHC.State# s) -> case PrelGHC.writeIntArray# @ s w w2 ww w3 of s2# { __DEFAULT -> case PrelGHC.+# w2 1 of wild { 1000 -> case PrelGHC.+# ww 1 of a { __DEFAULT -> let { a1 :: PrelBase.Int __A 0 __C a1 = PrelBase.$wI# a } in (# s2#, a1 #) }; __DEFAULT -> case PrelGHC.+# ww 1 of sat { __DEFAULT -> $wfill0 wild sat s2# } } }; } in $wfill0 0 0 w1 A nice tight loop. However, the initial version of `fill' is the more natural one to write. I think, it should be possible to derive the second version (are at least a similar version) automatically from the initial code. The derivation might go roughly as follows: fill mba# f g (I# n#) start = fill0 0# start where fill0 i# acc = case i# ==# n# of True -> return acc False -> do writeIntArray mba# (I# i#) (f acc) fill0 (i# +# 1#) (g acc) === {pull case out of fill0 (ie, partial unfolding)} fill mba# f g (I# n#) start = case 0# ==# n# of True -> return start False -> fill0 0# start where fill0 i# acc = do writeIntArray mba# (I# i#) (f acc) case (i# +# 1#) ==# n# of True -> return (g acc) False -> fill0 (i# +# 1#) (g acc) This is essentially the recursive variant of a well known law for while loops: while p do q; === if p then do q while p; Wouldn't this actually subsume the liberate case rule? f = \ t -> case v of V a b -> a : f t === {pull out the case} f = case v of V a b -> f = \ t -> a : case v of V a b -> f t === {simplification} f = case v of V a b -> f = \ t -> a : f t This might be more complicated to implement, as we only partially unfold the recursive function, but it also has more scope. What do you think? Cheers, Manuel PS: All Core code was generated with the HEAD from two days ago.