
#13623: join points produce bad code for stream fusion -------------------------------------+------------------------------------- Reporter: choenerzs | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.2.1-rc1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * cc: lukemauer (added) * priority: normal => high * milestone: => 8.2.1 @@ -7,1 +7,1 @@ - {{{ + {{{#!hs @@ -24,1 +24,1 @@ - {{{ + {{{#!hs @@ -61,1 +61,1 @@ - two parts (xs or ys) based on a case w2_s1U2 of {Left -> ; Right ->}. + two parts (xs or ys) based on a `case w2_s1U2 of {Left -> ; Right ->}`. @@ -63,1 +63,1 @@ - {{{ + {{{#!hs New description: Below, I am generating to stream fusion streams xs and ys. Both parameterized on k l. The two streams are then concatenated. Finally I do a strict left fold. This example needs the 'vector' package but nothing else. {{{#!hs module Test where import Data.Vector.Fusion.Stream.Monadic as S foo :: Int -> Int -> IO Int foo = \i j -> S.foldl' (+) 0 $ xs i j S.++ ys i j where xs k l = S.enumFromStepN k l 2 ys k l = S.enumFromStepN k l 3 {-# Inline xs #-} {-# Inline ys #-} {-# Inline foo #-} }}} With ghc-8.0.1 I get nice core: {{{#!hs $wfoo_r1Ai $wfoo_r1Ai = \ ww_s1q5 ww1_s1q9 w_s1q2 -> letrec { $s$wfoldlM'_loop_s1xc $s$wfoldlM'_loop_s1xc = \ sc_s1x7 sc1_s1x5 sc2_s1x6 sc3_s1x4 -> case tagToEnum# (># sc2_s1x6 0#) of _ { False -> (# sc_s1x7, I# sc3_s1x4 #); True -> $s$wfoldlM'_loop_s1xc sc_s1x7 (+# sc1_s1x5 ww1_s1q9) (-# sc2_s1x6 1#) (+# sc3_s1x4 sc1_s1x5) }; } in letrec { $s$wfoldlM'_loop1_s1x3 $s$wfoldlM'_loop1_s1x3 = \ sc_s1x2 sc1_s1x0 sc2_s1x1 sc3_s1wZ -> case tagToEnum# (># sc2_s1x1 0#) of _ { False -> $s$wfoldlM'_loop_s1xc sc_s1x2 ww_s1q5 3# sc3_s1wZ; True -> $s$wfoldlM'_loop1_s1x3 sc_s1x2 (+# sc1_s1x0 ww1_s1q9) (-# sc2_s1x1 1#) (+# sc3_s1wZ sc1_s1x0) }; } in $s$wfoldlM'_loop1_s1x3 w_s1q2 ww_s1q5 2# 0# }}} Now the same with ghc-8.2-rc1. Here, [https://github.com/haskell/vector/blob/master/Data/Vector/Fusion/Stream/Mona... Stream.++] function is not fully optimized away (Left and Right constructors!). Instead we have a join point that executes either of the two parts (xs or ys) based on a `case w2_s1U2 of {Left -> ; Right ->}`. {{{#!hs $wfoo_r23R $wfoo_r23R = \ ww_s1Ue ww1_s1Ui w_s1Ub -> let { x1_a1tj x1_a1tj = I# ww_s1Ue } in let { tb_a1wC tb_a1wC = (x1_a1tj, lvl1_r23Q) } in let { lvl2_s1Yh lvl2_s1Yh = Right tb_a1wC } in joinrec { $wfoldlM'_loop_s1U8 $wfoldlM'_loop_s1U8 w1_s1U0 ww2_s1U6 w2_s1U2 w3_s1U3 = case w1_s1U0 of { __DEFAULT -> case w2_s1U2 of { Left sa_a1yP -> case sa_a1yP of { (w4_a1zr, m1_a1zs) -> case m1_a1zs of { I# x2_a1zw -> case tagToEnum# (># x2_a1zw 0#) of { False -> jump $wfoldlM'_loop_s1U8 SPEC ww2_s1U6 lvl2_s1Yh w3_s1U3; True -> case w4_a1zr of { I# y_a1xT -> jump $wfoldlM'_loop_s1U8 SPEC (+# ww2_s1U6 y_a1xT) (Left (I# (+# y_a1xT ww1_s1Ui), I# (-# x2_a1zw 1#))) w3_s1U3 } } } }; Right sb_a1z3 -> case sb_a1z3 of { (w4_a1zr, m1_a1zs) -> case m1_a1zs of { I# x2_a1zw -> case tagToEnum# (># x2_a1zw 0#) of { False -> (# w3_s1U3, I# ww2_s1U6 #); True -> case w4_a1zr of { I# y_a1xT -> jump $wfoldlM'_loop_s1U8 SPEC (+# ww2_s1U6 y_a1xT) (Right (I# (+# y_a1xT ww1_s1Ui), I# (-# x2_a1zw 1#))) w3_s1U3 } } } } } }; } in jump $wfoldlM'_loop_s1U8 SPEC 0# (Left (x1_a1tj, lvl_r23P)) w_s1Ub }}} For my stream-fusion heavy code, this yields a slowdown of approximately x4 (10 seconds with ghc-8.2-rc1, 2.5 seconds with ghc-8.0.1). -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13623#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler