[GHC] #9688: Improve the interaction between CSE and the join point transformation

#9688: Improve the interaction between CSE and the join point transformation -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Keywords: CSE | Operating System: Architecture: Unknown/Multiple | Unknown/Multiple Difficulty: Unknown | Type of failure: Runtime Blocked By: | performance bug Related Tickets: | Test Case: | Blocking: | Differential Revisions: -------------------------------------+------------------------------------- It appears that the join point transformation sometimes interferes with CSE when CSE would be much better. Two examples: === digitToIntMaybe === Suppose we define {{{#!hs isHexDigit :: Char -> Bool isHexDigit c = (fromIntegral (ord c - ord '0')::Word) <= 9 || (fromIntegral (ord c - ord 'a')::Word) <= 5 || (fromIntegral (ord c - ord 'A')::Word) <= 5 digitToInt c | (fromIntegral dec::Word) <= 9 = dec | (fromIntegral hexl::Word) <= 5 = hexl + 10 | (fromIntegral hexu::Word) <= 5 = hexu + 10 | otherwise = error ("Char.digitToInt: not a digit " ++ show c) -- sigh where dec = ord c - ord '0' hexl = ord c - ord 'a' hexu = ord c - ord 'A' -- We could also expand this out in cases manually, but it makes no -- difference as far as I can tell. }}} Suppose we then write a naive `digitToIntMaybe` function: {{{#!hs digitToIntMaybe c | isHexDigit c = Just (digitToInt c) | otherwise = Nothing }}} What I would want this to do is "zip" the nested cases and give Core like this: {{{ $wdigitToIntMaybe $wdigitToIntMaybe = \ ww_s2Ag -> let { x#_a2yy x#_a2yy = -# (ord# ww_s2Ag) 48 } in case tagToEnum# (leWord# (int2Word# x#_a2yy) (__word 9)) of _ { False -> let { x#1_X2z7 x#1_X2z7 = -# (ord# ww_s2Ag) 97 } in case tagToEnum# (leWord# (int2Word# x#1_X2z7) (__word 5)) of _ { False -> let { x#2_X2zh x#2_X2zh = -# (ord# ww_s2Ag) 65 } in case tagToEnum# (leWord# (int2Word# x#2_X2zh) (__word 5)) of _ { False -> Nothing; True -> Just (I# (+# x#2_X2zh 10)) }; True -> Just (I# (+# x#1_X2z7 10)) }; True -> Just (I# x#_a2yy) } digitToIntMaybe digitToIntMaybe = \ w_s2Ad -> case w_s2Ad of _ { C# ww1_s2Ag -> $wdigitToIntMaybe ww1_s2Ag } }}} But instead, the join point transformation triggers, and we get this: {{{ digitToIntMaybe1 digitToIntMaybe1 = \ ww_s2Cp -> error (unpackAppendCString# "Char.digitToInt: not a digit "# ($w$cshowsPrec15 ww_s2Cp ([]))) $wdigitToIntMaybe $wdigitToIntMaybe = \ ww_s2Cp -> let { $j_s2Bc $j_s2Bc = \ _ -> Just (let { a_s2B5 a_s2B5 = int2Word# (-# (ord# ww_s2Cp) 48) } in case tagToEnum# (leWord# a_s2B5 (__word 9)) of _ { False -> let { a1_s2B7 a1_s2B7 = int2Word# (-# (ord# ww_s2Cp) 97) } in case tagToEnum# (leWord# a1_s2B7 (__word 5)) of _ { False -> let { a2_s2B9 a2_s2B9 = int2Word# (-# (ord# ww_s2Cp) 65) } in case tagToEnum# (leWord# a2_s2B9 (__word 5)) of _ { False -> digitToIntMaybe1 ww_s2Cp; True -> I# (+# (word2Int# a2_s2B9) 10) }; True -> I# (+# (word2Int# a1_s2B7) 10) }; True -> I# (word2Int# a_s2B5) }) } in case tagToEnum# (leWord# (int2Word# (-# (ord# ww_s2Cp) 48)) (__word 9)) of _ { False -> case tagToEnum# (leWord# (int2Word# (-# (ord# ww_s2Cp) 97)) (__word 5)) of _ { False -> case tagToEnum# (leWord# (int2Word# (-# (ord# ww_s2Cp) 65)) (__word 5)) of _ { False -> Nothing; True -> $j_s2Bc void# }; True -> $j_s2Bc void# }; True -> $j_s2Bc void# } digitToIntMaybe digitToIntMaybe = \ w_s2Cm -> case w_s2Cm of _ { C# ww1_s2Cp -> $wdigitToIntMaybe ww1_s2Cp } }}} We perform the same three tests twice each, and test for an error condition that obviously can't happen. === `quotRem` and `divMod` === If we define {{{#!hs x `quot` y = fst (x `quotRem` y) x `rem` y = snd (x `quotRem` y) }}} and then write something like {{{#!hs f x y | x `rem` y == 0 = x `quot` y | otherwise = 17 }}} then CSE works some magic and we only calculate `quotRem x y` once. Unfortunately, if we do this: {{{#!hs whatever x y = if x `myRem` y == 0 then (x `myQuot` y) + 14 else x `myQuot` y }}} then the join point transformation fires, collecting the `myQuot x y` expressions in the case branches and preventing CSE from recognizing the much better opportunity to eliminate those calculations altogether. The situation with `divMod` is much worse. The join point transformation applied to the cases defining `divMod` prevents CSE from working magic on it in even simple situations, unless one of the arguments is known, making this definition unusable (the resulting Core is too horrifyingly long to paste here). It would probably be possible to improve the `divMod` situation to something close to the `quotRem` one by making `divMod` `NOINLINE` and adding special `divModLit` rules, but I'd much rather see a general solution. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9688 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9688: Improve the interaction between CSE and the join point transformation -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature | Status: new request | Milestone: Priority: normal | Version: 7.9 Component: Compiler | Keywords: CSE Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by simonpj): Indeed! It looks like a classic code-specialisation question to me. In `digitToIntMaybe` you use some code `isHexDigit` that just happens to use the same tests as the RHS, `digitToInt`. Moreover there are three ways for `isHexDigit` to succeed. If you duplicate the RHS into those three outcomes you'll get the result you want; this amounts to inlining `$j_s2BC`. But that entails code duplication -- and might gain nothing whatsoever. Or, to put it another way, if the user wrote the above `Core`, complete with the local function `$j_s2BC`, would you expect it to be optimised? It would be cool if so. But I don't yet see how to achieve that at reasonable cost. To put it another way, if you wrote this in C: {{{ if blah then x = e1 else x = e2 if similar-blah then s1 else s2 }}} you'd usually expect to test `blah`, assign to `x`, and then test `similar-blah` and do `s1` or `s2`. But it might be better to duplicate (i.e. specialise) the first `if` into the branches: {{{ if similar-blah then { if blah then x=e1 else x=e2 s1 } else { if blah then x=e1 else x=e2 s2 } }}} and now the inner `if` might be optimised away entirely. These are not simple choices. By representing them as simple inlining choices ("shall I inline `$j_s2Bc`?") GHC piggy backs on a lot of careful heuristics for inlining. But, I agree, it does not always work well. I have taken time to explain here, in the hope that someone can do better! Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9688#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9688: Improve the interaction between CSE and the join point transformation -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature | Status: new request | Milestone: Priority: normal | Version: 7.9 Component: Compiler | Keywords: CSE Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:1 simonpj]:
Or, to put it another way, if the user wrote the above `Core`, complete with the local function `$j_s2BC`, would you expect it to be optimised?
Probably so, but I don't really know. I think the basic concept is that we should let more information flow from the analysis of the main body into the analysis of `j`. I think these two steps would probably be a good start, although I imagine they could be refined. Of course, you may well know of seven fatal flaws in this approach. 1. The simple one: If the value of a certain expression is always available when calling `j`, check if that expression is used in `j`; if so, pass it in. This is really just CSE, broadened to cross the boundary between the local function and the main body. 2. The less simple one: If the scrutinee of a `case` in `j` has always been scrutinized before `j` is called, "trim" the branches to only the ones that are still possible, and then consider splitting `j` into its remaining branches. In the examples described above, one relevant `case` will be the outermost one in `j`, and that should pretty much always be merged upwards. Doing the same thing a few times will take care of the rest. I think this basically does what would have happened had `j` been inlined, but we actually ''know'' that we're getting something out of it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9688#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9688: Improve the interaction between CSE and the join point transformation -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: CSE, | JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: CSE => CSE, JoinPoints -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9688#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9688: Improve the interaction between CSE and the join point transformation -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.9 Resolution: | Keywords: CSE, | JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * cc: AndreasK (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9688#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC