
PArrays.$w$snewPArray = \ ww :: PrelGHC.Int# w :: PrelBase.Int -> case PrelGHC.newIntArray# @ PrelGHC.RealWorld ww PrelGHC.realWorld# of wild { (# s2#, mba# #) -> let { eta :: PrelBase.Int __A 0 __C eta = PrelBase.$wI# ww } in case PrelGHC.-# ww 1 of sat { __DEFAULT -> case PrelGHC.># 0 sat of wild1 { PrelBase.True -> PArrays.$wPArray @ PrelBase.Int eta (__coerce PrelGHC.ByteArray# mba#); PrelBase.False -> case w of w1 { PrelBase.I# ww1 -> case PrelGHC.writeIntArray# @ PrelGHC.RealWorld mba# 0 ww1 s2# of s2#1 { __DEFAULT -> case PrelGHC.-# ww 1 of wild2 { 0 -> PArrays.$wPArray @ PrelBase.Int eta (__coerce PrelGHC.ByteArray# mba#); __DEFAULT -> let { a4 :: (PArrays.PArray PrelBase.Int) __A 0 __C a4 = PArrays.$wPArray @ PrelBase.Int eta (__coerce PrelGHC.ByteArray# mba#) } in __letrec { $wgo :: (PrelGHC.Int# -> PrelGHC.State# PrelGHC.RealWorld -> (PrelGHC.State# PrelGHC.RealWorld, PArrays.PArray PrelBase.Int)) __A 2 __C $wgo = \ w2 :: PrelGHC.Int# w3 :: (PrelGHC.State# PrelGHC.RealWorld) -> case PrelGHC.writeIntArray# @ PrelGHC.RealWorld mba# w2 ww1 w3 of s2#2 { __DEFAULT -> case PrelGHC.-# ww 1 of sat1 { __DEFAULT -> case PrelGHC.==# w2 sat1 of wild11 { PrelBase.True -> (# s2#2, a4 #); PrelBase.False -> case PrelGHC.+# w2 1 of sat2 { __DEFAULT -> $wgo sat2 s2#2 } } } }; } in case $wgo 1 s2#1 of wild3 { (# ds, r #) -> r } } } } } } } | However, we came across one problem (read, lack of | optimisation on GHC's part), which leads to tedious | duplication of a lot of code in our array library. | Basically, GHC does not recognise for tail recursive | functions when certain arguments (accumulators maintained in | a loop) can be unboxed. This leads to massive overheads in | our code. Currently, we circumvent the inefficiency by | having manually specialised versions of the loops for | different accumulator types and using RULES to select them | where appropriate (based on the type information). I will | send you some example code illustrating the problem soon. Yes please. Meanwhile, in the head, if you specify -O2 you'll get better code for some of your loops, namely the ones that repeatedly evaluate a free variable every time round the loop. The loop is duplicated and everything is nice. Here's a tiny example: g :: Int -> Int g x = let h 0 = 0 h y = if x==0 then y else h (y+1) in h 88 This now compiles to: LC.g = \ x :: PrelBase.Int -> case x of wild { PrelBase.I# x1 -> case x1 of wild1 { 0 -> PrelBase.$wI# 88; __DEFAULT -> __letrec { $wh :: (PrelGHC.Int# -> PrelBase.Int) __A 1 __C $wh = \ ww :: PrelGHC.Int# -> case ww of wild2 { 0 -> LC.lit; __DEFAULT -> case PrelGHC.+# wild2 1 of sat { __DEFAULT -> $wh sat } }; } in $wh 89 } } You mentioned something like this before. The amount of code duplication is controlled by -fliberate-case-threshold20 (say) The default is set pretty low. And it all only happens with -O2. All in the HEAD. The pass is call simplCore/LiberateCase. Simon

Simon Peyton-Jones
Meanwhile, in the head, if you specify -O2 you'll get better code for some of your loops, namely the ones that repeatedly evaluate a free variable every time round the loop. The loop is duplicated and everything is nice. [..] You mentioned something like this before.
The amount of code duplication is controlled by -fliberate-case-threshold20 (say) The default is set pretty low. And it all only happens with -O2.
It works nicely :-) In what kind of unit is the number after the -fliberate-case-threshold option. I really need to use 20 for it to work on the current version of the array code (where the loop that is to be optimised is much more involved than the initial example that I posted). Thanks, Manuel

Simon Peyton-Jones
| However, we came across one problem (read, lack of | optimisation on GHC's part), which leads to tedious | duplication of a lot of code in our array library. | Basically, GHC does not recognise for tail recursive | functions when certain arguments (accumulators maintained in | a loop) can be unboxed. This leads to massive overheads in | our code. Currently, we circumvent the inefficiency by | having manually specialised versions of the loops for | different accumulator types and using RULES to select them | where appropriate (based on the type information). I will | send you some example code illustrating the problem soon.
Yes please.
I have found a way of rephrasing the definition so that it is properly optimised by GHC. However, I think, it should be possible to do this automatically and it is maybe not unlike the optimisation done by simplCore/LiberateCase. The code I would like to write is, for example, the following import PrelGHC import PrelBase import PrelST fill :: MutableByteArray# s -> (acc -> Int) -> (acc -> acc) -> Int -> acc -> ST s acc {-# INLINE fill #-} fill mba# f g (I# n#) start = fill0 0# start where fill0 i# acc | i# ==# n# = return acc | otherwise = do writeIntArray mba# (I# i#) (f acc) fill0 (i# +# 1#) (g acc) writeIntArray :: MutableByteArray# s -> Int -> Int -> ST s () {-# INLINE writeIntArray #-} writeIntArray mba# (I# i#) (I# e#) = ST $ \s# -> case writeIntArray# mba# i# e# s# of {s2# -> (# s2#, () #)} foo mba# n = fill mba# id (+1) 1000 0 The interesting part is the handling of the accumulator. After inlining `fill' into `foo', it becomes obvious that the accumulator can be maintained as an unboxed integer. Unfortunately, it is not obvious to GHC, which generates the following (this is just the inlined `fill0' loop): __letrec { $wfill0 :: (PrelGHC.Int# -> PrelBase.Int -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int)) __A 3 __C $wfill0 = \ w2 :: PrelGHC.Int# w3 :: PrelBase.Int w4 :: (PrelGHC.State# s) -> case w2 of wild { 1000 -> (# w4, w3 #); __DEFAULT -> case w3 of wild1 { PrelBase.I# e# -> case PrelGHC.writeIntArray# @ s w wild e# w4 of s2# { __DEFAULT -> case PrelGHC.+# e# 1 of a { __DEFAULT -> let { sat :: PrelBase.Int __A 0 __C sat = PrelBase.$wI# a } in case PrelGHC.+# wild 1 of sat1 { __DEFAULT -> $wfill0 sat1 sat s2# } } } } }; } in $wfill0 0 Test.lit w1 The accumulator (w3) is unboxed immediately before the writeIntArray# and its next value put into a box (sat) - only to be unboxed immediately again in the next loop iteration. This would make perfect sense when the definition of `foo' were foo mba# n = fill mba# id plus 1000 0 where plus 0 = error "Die horribly" plus x = x + 1 I also appreciate that, if the loop is executed zero times, the initial value of `acc' is not demanded. But this is not much different to the case handled by simplCore/LiberateCase. And indeed with a little help, GHC generates much better code. In the following, I rewrote `fill' to explicitly test for input values that make the loop execute zero times: fill mba# f g (I# 0#) start = return start fill mba# f g (I# n#) start = fill0 0# start where fill0 i# acc = do writeIntArray mba# (I# i#) (f acc) let i'# = i# +# 1# acc' = g acc if i'# ==# n# then return acc' else fill0 i'# acc' Now, `acc' is guaranteed to be used in each invocation of `fill0' and GHC generates: __letrec { $wfill0 :: (PrelGHC.Int# -> PrelGHC.Int# -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int)) __A 3 __C $wfill0 = \ w2 :: PrelGHC.Int# ww :: PrelGHC.Int# w3 :: (PrelGHC.State# s) -> case PrelGHC.writeIntArray# @ s w w2 ww w3 of s2# { __DEFAULT -> case PrelGHC.+# w2 1 of wild { 1000 -> case PrelGHC.+# ww 1 of a { __DEFAULT -> let { a1 :: PrelBase.Int __A 0 __C a1 = PrelBase.$wI# a } in (# s2#, a1 #) }; __DEFAULT -> case PrelGHC.+# ww 1 of sat { __DEFAULT -> $wfill0 wild sat s2# } } }; } in $wfill0 0 0 w1 A nice tight loop. However, the initial version of `fill' is the more natural one to write. I think, it should be possible to derive the second version (are at least a similar version) automatically from the initial code. The derivation might go roughly as follows: fill mba# f g (I# n#) start = fill0 0# start where fill0 i# acc = case i# ==# n# of True -> return acc False -> do writeIntArray mba# (I# i#) (f acc) fill0 (i# +# 1#) (g acc) === {pull case out of fill0 (ie, partial unfolding)} fill mba# f g (I# n#) start = case 0# ==# n# of True -> return start False -> fill0 0# start where fill0 i# acc = do writeIntArray mba# (I# i#) (f acc) case (i# +# 1#) ==# n# of True -> return (g acc) False -> fill0 (i# +# 1#) (g acc) This is essentially the recursive variant of a well known law for while loops: while p do q; === if p then do q while p; Wouldn't this actually subsume the liberate case rule? f = \ t -> case v of V a b -> a : f t === {pull out the case} f = case v of V a b -> f = \ t -> a : case v of V a b -> f t === {simplification} f = case v of V a b -> f = \ t -> a : f t This might be more complicated to implement, as we only partially unfold the recursive function, but it also has more scope. What do you think? Cheers, Manuel PS: All Core code was generated with the HEAD from two days ago.
participants (2)
-
Manuel M. T. Chakravarty
-
Simon Peyton-Jones