[GHC] #13966: Missed optimization - loop fusion

#13966: Missed optimization - loop fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature | Status: new request | Priority: low | Milestone: Component: Compiler | Version: 8.2.1-rc3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- I think GHC should be able to optimize func1 into func2, but currently it doesn't. {{{#!hs data Step s a = Done | Yield s a func1 high = loop1 0# 1# where loop1 acc i = case loop2 i of Done -> acc Yield (I# i') (I# x) -> loop1 (acc +# x) i' loop2 i = case tagToEnum# (i ># high) :: Bool of False -> case remInt# i 2# of 0# -> Yield (I# (i +# 1#)) (I# i) _ -> loop2 (i +# 1#) True -> Done func2 high = loop1 0# 1# where loop1 acc i = case tagToEnum# (i ># high) :: Bool of False -> case remInt# i 2# of 0# -> loop1 (acc +# i) (i +# 1#) _ -> loop1 acc (i +# 1#) True -> acc }}} When using the LLVM backend func2 is almost 4x faster than func1. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by jmspiewak): * priority: low => normal * keywords: => JoinPoints Old description:
I think GHC should be able to optimize func1 into func2, but currently it doesn't.
{{{#!hs data Step s a = Done | Yield s a
func1 high = loop1 0# 1# where loop1 acc i = case loop2 i of Done -> acc Yield (I# i') (I# x) -> loop1 (acc +# x) i'
loop2 i = case tagToEnum# (i ># high) :: Bool of False -> case remInt# i 2# of 0# -> Yield (I# (i +# 1#)) (I# i) _ -> loop2 (i +# 1#) True -> Done
func2 high = loop1 0# 1# where loop1 acc i = case tagToEnum# (i ># high) :: Bool of False -> case remInt# i 2# of 0# -> loop1 (acc +# i) (i +# 1#) _ -> loop1 acc (i +# 1#) True -> acc }}}
When using the LLVM backend func2 is almost 4x faster than func1.
New description:
A simple stream chain
{{{#!hs
chain :: Int -> Int
chain = sum . filter even . enumFromTo 1
}}}
doesn't fuse under a Skip-less stream on GHC 8.2-rc3 -O2.
Benchmarked against a Skip stream (LLVM backend):
{{{
benchmarking Skip-less
time 248.9 ms (243.3 ms .. 257.3 ms)
0.998 R² (0.995 R² .. 0.999 R²)
mean 250.9 ms (248.1 ms .. 254.7 ms)
std dev 5.985 ms (4.831 ms .. 7.311 ms)
benchmarking Skip
time 61.26 ms (60.39 ms .. 62.44 ms)
0.998 R² (0.997 R² .. 0.999 R²)
mean 62.45 ms (61.96 ms .. 62.91 ms)
std dev 1.387 ms (1.190 ms .. 1.669 ms)
}}}
Relevant core (chain1 is Skip-less, chain2 has Skip):
{{{
-- RHS size: {terms: 51, types: 27, coercions: 0, joins: 1/2}
Main.$wchain1 [InlPrag=NOINLINE] :: Int# -> Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=]
Main.$wchain1
= \ (ww_s9ep :: Int#) ->
letrec {
$wloop_s9ea [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int
Int
[LclId, Arity=1, Str=, Unf=OtherCon []]
$wloop_s9ea
= \ (ww1_s9e8 :: Int#) ->
case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
False ->
case remInt# ww1_s9e8 2# of {
__DEFAULT -> $wloop_s9ea (+# ww1_s9e8 1#);
0# ->
Main.Yield1
@ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#))
(GHC.Types.I# ww1_s9e8)
};
True -> Main.Done1 @ Int @ Int
}; } in
joinrec {
$wloop1_s9el [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# ->
Int#
[LclId[JoinId(2)], Arity=2, Str=, Unf=OtherCon []]
$wloop1_s9el (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
= case $wloop_s9ea ww2_s9ej of {
Done1 -> ww1_s9ef;
Yield1 s'_a497 x_a498 ->
case x_a498 of { GHC.Types.I# y_a66i ->
case s'_a497 of { GHC.Types.I# ww4_X9hA ->
jump $wloop1_s9el (+# ww1_s9ef y_a66i) ww4_X9hA
}
}
}; } in
jump $wloop1_s9el 0# 1#
-- RHS size: {terms: 33, types: 9, coercions: 0, joins: 1/1}
Main.$wchain2 [InlPrag=NOINLINE] :: Int# -> Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=]
Main.$wchain2
= \ (ww_s9dZ :: Int#) ->
joinrec {
$wloop_s9dV [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
[LclId[JoinId(2)], Arity=2, Str=, Unf=OtherCon []]
$wloop_s9dV (ww1_s9dP :: Int#) (ww2_s9dT :: Int#)
= case tagToEnum# @ Bool (># ww2_s9dT ww_s9dZ) of {
False ->
case remInt# ww2_s9dT 2# of {
__DEFAULT -> jump $wloop_s9dV ww1_s9dP (+# ww2_s9dT 1#);
0# -> jump $wloop_s9dV (+# ww1_s9dP ww2_s9dT) (+#
ww2_s9dT 1#)
};
True -> ww1_s9dP
}; } in
jump $wloop_s9dV 0# 1#
}}}
The code was adapted from M. Snoyman's blog post "Iterators and Streams in
Rust and Haskell".
--
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:1
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by jmspiewak): * Attachment "Main.hs" added. Criterion benchmark -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion
-------------------------------------+-------------------------------------
Reporter: jmspiewak | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1-rc3
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by jmspiewak):
A typeclass-based Skip-less stream (also from the blog post) does fuse.
{{{#!hs
data Step3 s = Done3 | Yield3 s (Item3 s)
class Stream3 s where
type Item3 s
next3 :: s -> Step3 s
data EnumFromTo3 a = EnumFromTo3 a a
instance (Ord a, Num a) => Stream3 (EnumFromTo3 a) where
type Item3 (EnumFromTo3 a) = a
next3 (EnumFromTo3 i high)
| i > high = Done3
| otherwise = Yield3 (EnumFromTo3 (i + 1) high) i
data Filter3 a s = Filter3 (a -> Bool) s
instance (Stream3 s, Item3 s ~ a) => Stream3 (Filter3 a s) where
type Item3 (Filter3 a s) = a
next3 (Filter3 predicate s0) = loop s0 where
loop s1 = case next3 s1 of
Done3 -> Done3
Yield3 s2 x
| predicate x -> Yield3 (Filter3 predicate s2) x
| otherwise -> loop s2
sum3 :: (Num (Item3 s), Stream3 s) => s -> Item3 s
sum3 = loop 0 where
loop total s1 = case next3 s1 of
Done3 -> total
Yield3 s2 x -> loop (total + x) s2
{-# NOINLINE chain3 #-}
chain3 :: Int -> Int
chain3 = sum3 . Filter3 even . EnumFromTo3 1
}}}
Adding an existential wrapper doesn't break the fusion.
{{{#!hs
data Stream4 a = forall s. (Stream3 s, Item3 s ~ a) => Stream4 s
enumFromTo4 :: (Ord a, Num a) => a -> a -> Stream4 a
enumFromTo4 start high = Stream4 (EnumFromTo3 start high)
filter4 :: (a -> Bool) -> Stream4 a -> Stream4 a
filter4 p (Stream4 s) = Stream4 (Filter3 p s)
sum4 :: Num a => Stream4 a -> a
sum4 (Stream4 s) = sum3 s
{-# NOINLINE chain4 #-}
chain4 :: Int -> Int
chain4 = sum4 . filter4 even . enumFromTo4 1
}}}
{{{
benchmarking typeclass Skip-less
time 73.11 ms (72.50 ms .. 73.94 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 69.80 ms (68.86 ms .. 70.72 ms)
std dev 2.916 ms (2.483 ms .. 3.577 ms)
variance introduced by outliers: 20% (moderately inflated)
benchmarking typeclass existential Skip-less
time 75.44 ms (74.91 ms .. 76.13 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 75.44 ms (75.06 ms .. 75.80 ms)
std dev 1.118 ms (904.6 μs .. 1.479 ms)
}}}
Both result in:
{{{
Rec {
-- RHS size: {terms: 36, types: 11, coercions: 0, joins: 1/1}
Main.main_$s$wloop1 [Occ=LoopBreaker]
:: Int# -> Int# -> Int# -> Int#
[GblId, Arity=3, Caf=NoCafRefs, Str=]
Main.main_$s$wloop1
= \ (sc_s9HL :: Int#) (sc1_s9HK :: Int#) (sc2_s9HJ :: Int#) ->
joinrec {
$wloop2_s9y9 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# ->
Int#
[LclId[JoinId(2)], Arity=2, Str=, Unf=OtherCon []]
$wloop2_s9y9 (ww_s9y1 :: Int#) (ww1_s9y6 :: Int#)
= case tagToEnum# @ Bool (># ww_s9y1 ww1_s9y6) of {
False ->
case remInt# ww_s9y1 2# of {
__DEFAULT -> jump $wloop2_s9y9 (+# ww_s9y1 1#) ww1_s9y6;
0# ->
Main.main_$s$wloop1 ww1_s9y6 (+# ww_s9y1 1#) (+#
sc2_s9HJ ww_s9y1)
};
True -> sc2_s9HJ
}; } in
jump $wloop2_s9y9 sc1_s9HK sc_s9HL
end Rec }
}}}
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:2
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by jmspiewak): * Attachment "Main.2.hs" added. Added typeclass-based stream -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I am rather confused. What concretely is this ticket asking? Are you saying that ghc should reduce the skip+less program to the skip-ful core? Are you proposing stream fusion as an alternative to the foldr/build fusion which currently underlies GHC's list implementation? Something else entirely? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by jmspiewak): Sorry, after reading the paper on join points I thought GHC would be able to fuse the Skip-less stream. The fact that it does fuse the typeclass version convinced me it's possible. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): Here's how I am understanding this post as I have also been thinking about this recently. 1. In the Join Points PLDI paper there is an example in section 5 where join points allows `find` defined in terms of `any` to fuse. The author wants(/expects(?)) this technique to also fuse together normal functional pipelines as well as this simple examples. 2. The introduction of type classes seems irrelevant to the first point. GHC has quite a few optimisations which it uses to eliminate type classes and you can play lots of tricks around this to cause fusion like this to happen. If you look at the code is structured, in `sum` where the specific instance of `next3` is resolved to `Filter3 Int (EnumFromTo Int)` which means that all the consumers nicely line up with each other and hence fuse. The control flow of this program is very different to the first one. 3. A program with a polymorphic return type parametrised by a type class is very similar to a program written in CPS. To see this similarity, when writing a program in CPS in order for execution to continue you must provide a function which says what to do next. When using type classes, you must instead provide the *type* of the result which in then is elaborated to a function or dictionary of functions which explain how to proceed. Thus structuring your program in this way usually allows this kind of fusion to happen. 4. Ah-ha! but purpose of join points is to allow the compiler to optimise code written in direct style as well as code written in continuation passing style so can we do better in this case? The answer to which I don't yet know and I think is an open research problem. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by jmspiewak): Thank you for your detailed comment. The paper specifically mentions fusing Skip-less filter so indeed I was expecting this example to work. I brought up the typeclass approach because it's very inefficient on 8.0 but on 8.2 it's close to hand written recursion. It seems recursive join points can be used to fuse this stream, but only with the help of some typeclass-specific optimization. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion
-------------------------------------+-------------------------------------
Reporter: jmspiewak | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1-rc3
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by jmspiewak):
I conducted more tests - it's not about the typeclass. Somehow putting the
filter predicate in stream state allows fusion to happen. Is it changing
the order of inlining? Prevents some other optimization which was
clobbering the fusion opportunity?
{{{#!hs
filter5 :: (a -> Bool) -> Stream1 a -> Stream1 a
filter5 predicate (Stream1 s0 next) = Stream1 (Filter3 predicate s0) f
where
f (Filter3 p s1) = loop s1 where
loop s = case next s of
Done1 -> Done1
Yield1 s' x
| p x -> Yield1 (Filter3 p s') x
| otherwise -> loop s'
{-# NOINLINE chain5 #-}
chain5 :: Int -> Int
chain5 = sum1 . filter5 even . enumFromTo1 1
}}}
Compiles to:
{{{
-- RHS size: {terms: 36, types: 12, coercions: 0, joins: 2/2}
Main.$wchain5 [InlPrag=NOINLINE] :: Int# -> Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=]
Main.$wchain5
= \ (ww_s9EU :: Int#) ->
joinrec {
$s$wloop_s9QI [Occ=LoopBreaker] :: Int# -> Int# -> Int#
[LclId[JoinId(2)], Arity=2, Str=, Unf=OtherCon []]
$s$wloop_s9QI (sc_s9QH :: Int#) (sc1_s9QG :: Int#)
= joinrec {
$wloop2_s9EA [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int#
[LclId[JoinId(1)], Arity=1, Str=, Unf=OtherCon []]
$wloop2_s9EA (ww1_s9Ey :: Int#)
= case tagToEnum# @ Bool (># ww1_s9Ey ww_s9EU) of {
False ->
case remInt# ww1_s9Ey 2# of {
__DEFAULT -> jump $wloop2_s9EA (+# ww1_s9Ey 1#);
0# -> jump $s$wloop_s9QI (+# ww1_s9Ey 1#) (+#
sc1_s9QG ww1_s9Ey)
};
True -> sc1_s9QG
}; } in
jump $wloop2_s9EA sc_s9QH; } in
jump $s$wloop_s9QI 1# 0#
}}}
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#13966: Skip-less stream fusion
-------------------------------------+-------------------------------------
Reporter: jmspiewak | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1-rc3
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by simonpj):
This is a very interesting example, thank you.
I note that in HEAD it fuses just fine. I have not yet figured out
exactly why, but I want to look at this code, from teh Description:
{{{
letrec {
$wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int
[LclId, Arity=1, Str=, Unf=OtherCon []]
$wloop
= \ (ww1_s9e8 :: Int#) ->
case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
False ->
case remInt# ww1_s9e8 2# of {
__DEFAULT -> $wloop (+# ww1_s9e8 1#);
0# ->
Main.Yield1
@ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#))
(GHC.Types.I# ww1_s9e8)
};
True -> Main.Done1 @ Int @ Int
}; } in
joinrec {
$wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
[LclId[JoinId(2)], Arity=2, Str=, Unf=OtherCon []]
$wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
= case $wloop ww2_s9ej of {
Done1 -> ww1_s9ef;
Yield1 s'_a497 x_a498 ->
case x_a498 of { GHC.Types.I# y_a66i ->
case s'_a497 of { GHC.Types.I# ww4_X9hA ->
jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA
}
}
}; } in
jump $wloop1 0# 1#
}}}
Once GHC gets the program into this state, it's not going to be able to
optimise it. HEAD somehow avoids this dead end, but I hate things where
GHC can get stuck in a dead end. I think this code ''ought'' to optimise
just fine. Here's why.
Look at thar functionn `$wloop`. It's not a join point becuaes it's not
tail-called in the body of the `letrec`. But suppose we transform
`$wloop` like this:
{{{
let {
$wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int
[LclId, Arity=1, Str=, Unf=OtherCon []]
$wloop
= \x. joinrec $j (ww1_s9e8 :: Int#)
= case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
False ->
case remInt# ww1_s9e8 2# of {
__DEFAULT -> jump $j (+# ww1_s9e8 1#);
0# ->
Main.Yield1
@ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#))
(GHC.Types.I# ww1_s9e8)
};
True -> Main.Done1 @ Int @ Int
in jump $j x
}}}
Now `$wloop` is non-recursive, so we can inline it at its only call site,
in `$wloop1`:
{{{
joinrec {
$wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
[LclId[JoinId(2)], Arity=2, Str=, Unf=OtherCon []]
$wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
= case (...the body of $wloop...) of {
Done1 -> ww1_s9ef;
Yield1 s'_a497 x_a498 ->
case x_a498 of { GHC.Types.I# y_a66i ->
case s'_a497 of { GHC.Types.I# ww4_X9hA ->
jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA
}
}
}; } in
jump $wloop1 0# 1#
}}}
And now the right fusion will happen.
The crucial bit was the transformation of `$wloop`: we took a tail-
recursive function, introduced a (recursive) join point into it, which
made it non-recursive. Even if nothing further happens, the
implementation of `$wloop` is a bit more efcient because the tail call is
just a branch. But the big thing here is that we can now inline `$wloop`
in `$wloop1`.
So the idea is this: a transformation to turn a tail-recursive function
definition into one that is implemented with a recursive join point. If
we had such a transformation, it'd get us out of the dead end.
Actually, there's a variant of the Static Argument Transformation
([wiki:StaticArgumentTransformation]) at work here. Consider
{{{
f x y = case y of
A -> f x y'
B -> e2
C -> e3
}}}
Here the first argument to `f` is "static"; that is, the same in every
call. So we can transform like this
{{{
f x y = joinrec $j y = case y of
A -> $j y'
B -> e2
C -> e3
in $j y
}}}
Note that `x` isn't passed around in every iteration.
There's a GHC module `SAT.hs` which does the static argument
transformation, but it is not join-point aware. So we should fix that.
One reason we don't currently do SAT all the time is that the results are
a bit ambiguous; see Andre Santos's thesis for more, cited on
[wiki:StaticArgumentTransformation]. BUT I think that some (maybe most)
of the problems with SAT may go away if we restrict SAT to tail-recursive
functions that we can turn into `joinrecs`.
Any volunteers?
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#13966: Skip-less stream fusion -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by mpickering): * owner: (none) => mpickering Comment: I can look at this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: JoinPoints => JoinPoints, StaticArgumentTransformation -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * related: => #14067 Comment: This ticket is confusing, and about a specific use case. I took the liberty of creating a new ticket for the task of implementing a SAT for tail-recursive functions at #14067. Once that is done, we can revisit if it actually fixed the problem at hand. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Hmm, maybe I did that a bit prematurely, without fully understanding what we should be doing. Upon closer reading, there are two tasks: * Transforming a tail-recursive function into a non-recursive function with a `joinrec`. * SAT for tail-recursive functions It seems that the first is independent of the second, and if the first one was done, one can rephrase the second as “SAT for `joinrec`s”. Is that roughly correct? (I am surprised that the former is not done already from the beginning of join points in GHC. But a brief look at the distribution of labor between OccAnal and the Simplifier makes it clear to me that this change is not trivial.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 #14068 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * related: #14067 => #14067 #14068 Comment: See #14068 for the first of these tasks. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 #14068 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes, two tasks. Of these, the most important is the first #14068. The second SAT part is worthwhile, I think, but this particular ticket doesn't really show why. The point is that while SAT is ''sometimes'' good for normal definitions, I think it's probably ''always'' good (or at least not harmful) for `joinrecs`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity
-------------------------------------+-------------------------------------
Reporter: jmspiewak | Owner: mpickering
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1-rc3
Resolution: | Keywords: JoinPoints,
| StaticArgumentTransformation
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: #14067 #14068 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by mpickering):
Implementing this leads to:
{{{
benchmarking Skip-less
time 632.2 ms (532.7 ms .. 835.8 ms)
0.988 R² (0.976 R² .. 1.000 R²)
mean 1.187 s (1.044 s .. 1.308 s)
std dev 191.0 ms (0.0 s .. 207.9 ms)
variance introduced by outliers: 46% (moderately inflated)
benchmarking Skip
time 904.3 ms (904.1 ms .. 904.8 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.230 s (1.130 s .. 1.306 s)
std dev 114.9 ms (0.0 s .. 130.3 ms)
variance introduced by outliers: 22% (moderately inflated)
}}}
And the core looks quite similar:
{{{
chain1
= \ (w_s9yo :: Int) ->
case w_s9yo of { GHC.Types.I# ww1_s9yr ->
joinrec {
$wsat_worker2_s9yn [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]
:: Int# -> Int# -> Int
[LclId[JoinId(2)], Arity=2, Str=m, Unf=OtherCon []]
$wsat_worker2_s9yn (ww2_s9yh :: Int#) (ww3_s9yl :: Int#)
= join {
lvl_s9BD [Dmd=m, Unf=OtherCon []]
$wsat_worker3_s9yc (ww4_s9ya :: Int#)
= case ># ww4_s9ya ww1_s9yr of {
__DEFAULT ->
case remInt# ww4_s9ya 2# of {
__DEFAULT -> jump $wsat_worker3_s9yc (+# ww4_s9ya
1#);
0# ->
jump $wsat_worker2_s9yn (+# ww2_s9yh ww4_s9ya)
(+# ww4_s9ya 1#)
};
1# -> jump lvl_s9BD
}; } in
jump $wsat_worker3_s9yc ww3_s9yl; } in
jump $wsat_worker2_s9yn 0# 1#
}
}}}
{{{
chain2
= \ (w_s9xY :: Int) ->
case w_s9xY of { GHC.Types.I# ww1_s9y1 ->
joinrec {
$wsat_worker2_s9xX [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]
:: Int# -> Int# -> Int
[LclId[JoinId(2)], Arity=2, Str=m, Unf=OtherCon []]
$wsat_worker2_s9xX (ww2_s9xR :: Int#) (ww3_s9xV :: Int#)
= case ># ww3_s9xV ww1_s9y1 of {
__DEFAULT ->
case remInt# ww3_s9xV 2# of {
__DEFAULT -> jump $wsat_worker2_s9xX ww2_s9xR (+#
ww3_s9xV 1#);
0# ->
jump $wsat_worker2_s9xX (+# ww2_s9xR ww3_s9xV) (+#
ww3_s9xV 1#)
};
1# -> GHC.Types.I# ww2_s9xR
}; } in
jump $wsat_worker2_s9xX 0# 1#
}
}}}
Still need to investigate the general impact more.
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:15
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 #14068 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I'm getting lost.
Implementing this leads to
* What exactly is "this"? * Does `chain1` have "this" implemented? Or `chain2`? * Which do you think is most desirable, and why? (`chain2` looks simpler.) * Are you assuming loopification #14068 is done? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 #14068 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): Here is the exact file I am using. {{{ {-# LANGUAGE ExistentialQuantification #-} module Main where import GHC.Prim import Criterion.Main import GHC.Prim main :: IO () main = defaultMain [b1, b2] where b1 = bench "Skip-less" $ whnf chain1 x b2 = bench "Skip" $ whnf chain2 x x = 100000000 -------------------------------------------------------------------------------- data Step1 s a = Done1 | Yield1 s a data Stream1 a = forall s. Stream1 s (s -> Step1 s a) enumFromTo1 :: (Ord a, Num a) => a -> a -> Stream1 a enumFromTo1 start high = Stream1 start f where f i | i > high = Done1 | otherwise = Yield1 (i + 1) i filter1 :: (a -> Bool) -> Stream1 a -> Stream1 a filter1 predicate (Stream1 s0 next) = Stream1 s0 loop where loop s = case next s of Done1 -> Done1 Yield1 s' x | predicate x -> Yield1 s' x | otherwise -> loop s' sum1 :: Num a => Stream1 a -> a sum1 (Stream1 s0 next) = loop 0 s0 where loop total s = case next s of Done1 -> total Yield1 s' x -> loop (total + x) s' chain1 :: Int -> Int chain1 = sum1 . filter1 even . enumFromTo1 1 -------------------------------------------------------------------------------- data Step2 s a = Done2 | Skip2 s | Yield2 s a data Stream2 a = forall s. Stream2 s (s -> Step2 s a) enumFromTo2 :: (Ord a, Num a) => a -> a -> Stream2 a enumFromTo2 start high = Stream2 start f where f i | i > high = Done2 | otherwise = Yield2 (i + 1) i filter2 :: (a -> Bool) -> Stream2 a -> Stream2 a filter2 predicate (Stream2 s0 next) = Stream2 s0 loop where loop s = case next s of Done2 -> Done2 Skip2 s' -> Skip2 s' Yield2 s' x | predicate x -> Yield2 s' x | otherwise -> Skip2 s' sum2 :: Num a => Stream2 a -> a sum2 (Stream2 s0 next) = loop 0 s0 where loop total s = case next s of Done2 -> total Skip2 s' -> loop total s' Yield2 s' x -> loop (total + x) s' chain2 :: Int -> Int chain2 = sum2 . filter2 even . enumFromTo2 1 }}} I modified the SAT pass to ignore information about static arguments, perform the SAT transformation and then check whether we created a join point. If we create a join point then we keep the transformed version, otherwise we leave the code as it was. (This is what you suggested in comment:8) I then compiled the above program with this transformation turned on. `chain2` was unaffected, the core is as before but the core for `chain1` changed quite a bit. It seems from running the benchmarks that `chain1` is better but I didn't look yet why this might be the case. I am building from a recent HEAD (11d9615e9f751d6ed084f1cb20c24ad6b408230e) so whether loopification is in there or not I don't know. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 #14068 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): I read #14068 now which I thought the whole point of this ticket was? What is the difference meant to be? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13966: Skip-less stream fusion: a missed opportunity -------------------------------------+------------------------------------- Reporter: jmspiewak | Owner: mpickering Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1-rc3 Resolution: | Keywords: JoinPoints, | StaticArgumentTransformation Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14067 #14068 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): To summarize (excerpting from comment:8), this ticket consists of two parts, * #14067: Transforming a tail-recursive function into a non-recursive function with a joinrec. * #14068: SAT for tail-recursive functions It might be best to keep specific discussion on those tickets where possible. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13966#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC