Dead else branch does influence runtime?

Dear all, I am very puzzled by a program that contains an "else" branch that is never executed, but still seems to slow down the program. (When I replace it by "undefined", the resulting program runs much faster.) http://hackage.haskell.org/trac/ghc/ticket/5256 I thought it may be a type issue (the type of the else branch forces the type of the "then" branch to be more general, thus some optimization might not fire) but the types of the branches look identical. (They are generic, but the specializer should take care of that.) I am sure GHC headquarters will look at this when they find the time but perhaps there's some additional knowledge on this mailing list that might help. J.W.

On Tuesday 14 June 2011, 14:35:19, Johannes Waldmann wrote:
Dear all,
I am very puzzled by a program that contains an "else" branch that is never executed, but still seems to slow down the program. (When I replace it by "undefined", the resulting program runs much faster.) http://hackage.haskell.org/trac/ghc/ticket/5256
I thought it may be a type issue (the type of the else branch forces the type of the "then" branch to be more general, thus some optimization might not fire) but the types of the branches look identical. (They are generic, but the specializer should take care of that.)
The else branch is not dead code in the sense of 'unreachable', it's just not executed in your particular run. Therefore the compiler has to generate code for it. In the case of undefined, it's short and simple code: (case GHC.Conc.Sync.numCapabilities of _ { GHC.Types.I# x_a1zI -> case GHC.Prim.<=# x_a1zI 1 of _ { GHC.Bool.False -> GHC.Err.undefined `cast` (CoUnsafe (forall a_a1fu. a_a1fu) GHC.Base.String :: (forall a_a1fu. a_a1fu) ~ GHC.Base.String); appearing in Main.main1 - the undefined makes foldb_cap simple enough to be inlined, then V.foldl' and eff, h1 are inlined too, to become a loop on three unboxed Int#s. With id, main1 jumps to foldb_cap, which contains a lot of code for the (cap > 1)-branch, and - that's what causes the slowdown - a worker loop $s$wfoldlM'_loop_s3EE [Occ=LoopBreaker] :: GHC.Prim.Int# -> (GHC.Types.Int, GHC.Types.Int) -> (# GHC.Types.Int, GHC.Types.Int #) which uses the passed functions (thus you have no inlining of eff and h1, and a boxed tuple of boxed Int's in your worker).
I am sure GHC headquarters will look at this when they find the time but perhaps there's some additional knowledge on this mailing list that might help.
J.W.

Thanks for the analysis. So is this a problem that should be fixed in GHC? And what can I do to circumvent the problem? (Perhaps write some RULES magic?) Thanks - J.W.

How is the compiler to know the else branch is never executed at run-time?
If you do, then why is it there in your source code?
On Tue, Jun 14, 2011 at 7:35 AM, Johannes Waldmann
Thanks for the analysis.
So is this a problem that should be fixed in GHC?
And what can I do to circumvent the problem? (Perhaps write some RULES magic?)
Thanks - J.W.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- -- Regards, KC

On 06/14/2011 04:42 PM, KC wrote:
How is the compiler to know the else branch is never executed at run-time? If you do, then why is it there in your source code?
The algorithm is divide-and-conquer, and I want to create sparks as long as I have cores (capabilities), and use the linear algorithm below that. The bug report is that the linear algorithm (if called from inside my program, at the leaves of the recursion tree) is much slower than when called on its own. This bug already shows when the tree has height 0, but this is really just for the bug report - in real life, both branches will be executed. J.W.

On Tuesday 14 June 2011, 15:51:57, Daniel Fischer wrote:
On Tuesday 14 June 2011, 14:35:19, Johannes Waldmann wrote:
With id, main1 jumps to foldb_cap, which contains a lot of code for the (cap > 1)-branch, and - that's what causes the slowdown - a worker loop
$s$wfoldlM'_loop_s3EE [Occ=LoopBreaker]
:: GHC.Prim.Int#
-> (GHC.Types.Int, GHC.Types.Int) -> (# GHC.Types.Int, GHC.Types.Int #)
which uses the passed functions (thus you have no inlining of eff and h1, and a boxed tuple of boxed Int's in your worker).
I am sure GHC headquarters will look at this when they find the time but perhaps there's some additional knowledge on this mailing list that might help.
Note that you get good behaviour when you help GHC a bit, in particular a static argument transformation for the function parameters of foldb_cap allows them to be inlined and (in this case) you get the nice loop on unboxed Int#s again: foldb_cap :: ( V.Unbox a, V.Unbox b ) => Int -> b -> ( a -> b ) -> ( b -> b -> b ) -> Vector a -> b foldb_cap cp strt f g xs = work cp strt xs where work cap e s = if cap <= 1 then V.foldl' g e $ V.map f s -- replace "id" by "undefined" in the following, -- and notice a drastic decrease in runtime - -- although this branch is never executed: else id $ case V.length s of 0 -> e 1 -> f $! V.head s n -> let splitAt k v = ( V.take k v, V.drop k v ) ( s1, s2 ) = splitAt ( div n 2 ) s cap' = div cap 2 v1 = work cap' e s1 v2 = work cap' e s2 v = g v1 v2 in par v1 $ pseq v2 $ v

Superb; on how to avoid boxing.
On Tue, Jun 14, 2011 at 7:51 AM, Daniel Fischer
On Tuesday 14 June 2011, 15:51:57, Daniel Fischer wrote:
On Tuesday 14 June 2011, 14:35:19, Johannes Waldmann wrote:
With id, main1 jumps to foldb_cap, which contains a lot of code for the (cap > 1)-branch, and - that's what causes the slowdown - a worker loop
$s$wfoldlM'_loop_s3EE [Occ=LoopBreaker]
:: GHC.Prim.Int#
-> (GHC.Types.Int, GHC.Types.Int) -> (# GHC.Types.Int, GHC.Types.Int #)
which uses the passed functions (thus you have no inlining of eff and h1, and a boxed tuple of boxed Int's in your worker).
I am sure GHC headquarters will look at this when they find the time but perhaps there's some additional knowledge on this mailing list that might help.
Note that you get good behaviour when you help GHC a bit, in particular a static argument transformation for the function parameters of foldb_cap allows them to be inlined and (in this case) you get the nice loop on unboxed Int#s again:
foldb_cap :: ( V.Unbox a, V.Unbox b ) => Int -> b -> ( a -> b ) -> ( b -> b -> b ) -> Vector a -> b foldb_cap cp strt f g xs = work cp strt xs where work cap e s = if cap <= 1 then V.foldl' g e $ V.map f s -- replace "id" by "undefined" in the following, -- and notice a drastic decrease in runtime - -- although this branch is never executed: else id $ case V.length s of 0 -> e 1 -> f $! V.head s n -> let splitAt k v = ( V.take k v, V.drop k v ) ( s1, s2 ) = splitAt ( div n 2 ) s cap' = div cap 2 v1 = work cap' e s1 v2 = work cap' e s2 v = g v1 v2 in par v1 $ pseq v2 $ v
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- -- Regards, KC
participants (3)
-
Daniel Fischer
-
Johannes Waldmann
-
KC