RE: [Haskell] Expecting more inlining for bit shifting

On Mon, 9 Oct 2006, Simon Peyton-Jones wrote:
Turns out that 'shift' is just too big to be inlined. (It's only called once, but you have exported it too.)
You can see GHC's inlining decisions by saying -ddump-inlinings.
To make GHC keener to inline, use an INLINE pragma, or increase the inlining size threshold e.g. -funfolding-threshold=12
Okay, when I force inlining for shift, (and I even need to do it for shiftR!) then the code is inlined in C. However this isn't the behaviour I want. Ideally the inlining should only happen when/because the second argument of shift is constant and the system knows that it can evaluate the case analysis away and that makes the function small. Am I being too naive on what to expect from my complier or is this reasonable? PS, is there a way to mark an imported instance of a class function (Data.Bits.shift for Data.Word.Word32) to be inlined? -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

On Mon, Oct 09, 2006 at 10:33:47AM -0400, roconnor@theorem.ca wrote:
On Mon, 9 Oct 2006, Simon Peyton-Jones wrote:
Turns out that 'shift' is just too big to be inlined. (It's only called once, but you have exported it too.)
You can see GHC's inlining decisions by saying -ddump-inlinings.
To make GHC keener to inline, use an INLINE pragma, or increase the inlining size threshold e.g. -funfolding-threshold=12
Okay, when I force inlining for shift, (and I even need to do it for shiftR!) then the code is inlined in C. However this isn't the behaviour I want. Ideally the inlining should only happen when/because the second argument of shift is constant and the system knows that it can evaluate the case analysis away and that makes the function small.
Am I being too naive on what to expect from my complier or is this reasonable?
It might be possible, but it sounds tricky. I guess it would have to go something like "try inlining this, run the simplifier, see if it got small enough, if not back out", which could waste a lot of work if it fails in lots of cases.
PS, is there a way to mark an imported instance of a class function (Data.Bits.shift for Data.Word.Word32) to be inlined?
You can use GHC.Exts.inline in 6.6: http://www.haskell.org/ghc/dist/current/docs/users_guide/special-ids.html#id... but note the restriction in the final paragraph. Thanks Ian

Okay, when I force inlining for shift, (and I even need to do it for shiftR!) then the code is inlined in C. However this isn't the behaviour I want. Ideally the inlining should only happen when/because the second argument of shift is constant and the system knows that it can evaluate the case analysis away and that makes the function small.
Am I being too naive on what to expect from my complier or is this reasonable?
It might be possible, but it sounds tricky. I guess it would have to go something like "try inlining this, run the simplifier, see if it got small enough, if not back out", which could waste a lot of work if it fails in lots of cases.
I would have imagined an optimisation step that only activates when a constructor is passed into a function to see if it produces a branch that can be precomputed, and then tries to determine if it is worth making a specialized function with that case eliminated. Or possibly having each function inspected to see if it has branches that could be eliminated if a constructor was passed as an argument. I must say I'm extremely disappointed with this. I believe I was taught in my undergraduate CS program (but perhaps I wasn't) that one ought not to make these sorts of trivial hand optimisations, because compilers are smart enough to figure out these sorts of things by themselves, and they know more about that target platform that you do. In particular the propaganda about side-effect-free functional languages was a promise that they would use the strong types and side-effect-freeness to do all sorts of wonderful optimisations. However, it seems the truth of the matter is that an advanced compiler such as GHC cannot even optimise away the bounds checks occurring when shifting by a constant number of bits. -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

| I would have imagined an optimisation step that only activates when a | constructor is passed into a function to see if it produces a branch that | can be precomputed, and then tries to determine if it is worth making a | specialized function with that case eliminated. Or possibly having each | function inspected to see if it has branches that could be eliminated if a | constructor was passed as an argument. That's precisely what GHC does. My explanation before was too brief, sorry. The algorithm is described in "Secrets of the GHC inliner" http://research.microsoft.com/%7Esimonpj/Papers/inlining/index.htm But it still only makes a specialised copy if the function is "small enough". Obviously a great big function with a tiny specialisation opportunity would be a poor candidate. | I must say I'm extremely disappointed with this. I believe I was taught | in my undergraduate CS program (but perhaps I wasn't) that one ought not | to make these sorts of trivial hand optimisations, because compilers are | smart enough to figure out these sorts of things by themselves, and they | know more about that target platform that you do. In particular the | propaganda about side-effect-free functional languages was a promise that | they would use the strong types and side-effect-freeness to do all sorts | of wonderful optimisations. Well I think if you use -ddump-simpl you'll see a program that often looks pretty different to the one you wrote. I often have difficulty figuring out just how GHC managed to transform the source program into the optimised one. Of course it's far from perfect! Fortunately, GHC is an open-source compiler, so the way lies open for anyone, including you, to improve the inlining decisions. I'm sure there are good opportunities there. Simon PS: you mention "knowing about the target platform". That's one thing that GHC is distinctly poor on. It leaves all target-platform-specific optimisations to the C compiler.

On Tue, 10 Oct 2006, Simon Peyton-Jones wrote:
That's precisely what GHC does. My explanation before was too brief, sorry. The algorithm is described in "Secrets of the GHC inliner" http://research.microsoft.com/%7Esimonpj/Papers/inlining/index.htm
But it still only makes a specialised copy if the function is "small enough". Obviously a great big function with a tiny specialisation opportunity would be a poor candidate.
Do you mean if the resulting function if ``small enough'', or the original function is small enough? Anyhow I will look at the paper.
| I must say I'm extremely disappointed with this. I believe I was taught | in my undergraduate CS program (but perhaps I wasn't) that one ought not | to make these sorts of trivial hand optimisations, because compilers are | smart enough to figure out these sorts of things by themselves, and they | know more about that target platform that you do. In particular the | propaganda about side-effect-free functional languages was a promise that | they would use the strong types and side-effect-freeness to do all sorts | of wonderful optimisations.
Well I think if you use -ddump-simpl you'll see a program that often looks pretty different to the one you wrote. I often have difficulty figuring out just how GHC managed to transform the source program into the optimised one.
I should metion that I feel that GHC does do a great optimisation job considering. It manages to transform a language, largely based on abstract mathematics, into something that a Von Neumann can run in practise. This is, of course, and amasing feat. I am just going to have to learn that the comipler won't do optimisations that I think it ought to figure out. So I will either have to start writing hand tunned non-portable, potentially unsafe code, or figure out how to extend GHC optimizer. I prefer the later. -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

On Mon, Oct 09, 2006 at 03:54:41PM +0100, Ian Lynagh wrote:
It might be possible, but it sounds tricky. I guess it would have to go something like "try inlining this, run the simplifier, see if it got small enough, if not back out", which could waste a lot of work if it fails in lots of cases.
I would think the easiest way to go about this would be to extend the rules pragma. {-# RULES "shift/const-inline" forall x y# . shift x y# = ... #-} where variables ending in # will only match constants known at compile time. or perhaps.. {-# RULES "shift/const-inline" forall x (y::const Int) . shift x y# = ... #-} or something like that. I imagine such a thing would have other uses as well... John -- John Meacham - ⑆repetae.net⑆john⑈

| I would think the easiest way to go about this would be to extend the | rules pragma. | | {-# RULES "shift/const-inline" forall x y# . shift x y# = ... #-} | | where variables ending in # will only match constants known at compile time Interesting idea. GHC can do that *internally* using a "BuiltinRule", and it's internal precisely because there's no obvious way to say "match only a literal". I suppose that you might also want to say "match only a constructor"? To have a rule for 'f' that would fire only when you saw f (Just x) or f Nothing but not f (g y) For that, a # would not really be appropriate. Would this be valuable? If so, think of a nice syntax. It's not trivial to implement, but not hard either. Simon | -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] | On Behalf Of John Meacham | Sent: 18 October 2006 06:37 | To: glasgow-haskell-users@haskell.org | Subject: Re: [Haskell] Expecting more inlining for bit shifting | | On Mon, Oct 09, 2006 at 03:54:41PM +0100, Ian Lynagh wrote: | > It might be possible, but it sounds tricky. I guess it would have to go | > something like "try inlining this, run the simplifier, see if it got | > small enough, if not back out", which could waste a lot of work if it | > fails in lots of cases. | | I would think the easiest way to go about this would be to extend the | rules pragma. | | {-# RULES "shift/const-inline" forall x y# . shift x y# = ... #-} | | where variables ending in # will only match constants known at compile | time. or perhaps.. | | {-# RULES "shift/const-inline" forall x (y::const Int) . shift x y# = ... #-} | | or something like that. | | I imagine such a thing would have other uses as well... | | John | | -- | John Meacham - ⑆repetae.net⑆john⑈ | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On Wed, Oct 18, 2006 at 09:48:31AM +0100, Simon Peyton-Jones wrote:
| I would think the easiest way to go about this would be to extend the | rules pragma. | | {-# RULES "shift/const-inline" forall x y# . shift x y# = ... #-} | | where variables ending in # will only match constants known at compile time
Interesting idea. GHC can do that *internally* using a "BuiltinRule", and it's internal precisely because there's no obvious way to say "match only a literal".
I suppose that you might also want to say "match only a constructor"? To have a rule for 'f' that would fire only when you saw f (Just x) or f Nothing but not f (g y)
For that, a # would not really be appropriate.
Would this be valuable? If so, think of a nice syntax. It's not trivial to implement, but not hard either.
heh. thinking of a nice syntax is what has kept me from exposing this behavior to the user in jhc so far. :) I am thinking something like the following: {-# RULES "shift/const-inline" forall x y | is_constant y . shift x y = ... #-} which uses '|' like in pattern matching, where it specifies a condition the variable has to meet. so 'is_constant' will say whether y is completely known at compile time. the nice thing about this syntax is that it is extendable pretty easily {-# RULES "foo/known" forall x y | is_whnf y . foo x y = ... #-} where is_whnf will test whether y is bound directly to a constructor or a lambda expression. John -- John Meacham - ⑆repetae.net⑆john⑈

On Tue, 17 Oct 2006, John Meacham wrote:
On Mon, Oct 09, 2006 at 03:54:41PM +0100, Ian Lynagh wrote:
It might be possible, but it sounds tricky. I guess it would have to go something like "try inlining this, run the simplifier, see if it got small enough, if not back out", which could waste a lot of work if it fails in lots of cases.
I would think the easiest way to go about this would be to extend the rules pragma.
{-# RULES "shift/const-inline" forall x y# . shift x y# = ... #-}
I'm not sure this approach is best. In my case the ... needs to be the entire body of the shift code. It would be ridiculous to have two copies of the same code. What would be better is a hint pragma that says, ``inline me if the following set of parameters are literals''. -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

On Wed, Oct 18, 2006 at 07:00:18AM -0400, roconnor@theorem.ca wrote:
I'm not sure this approach is best. In my case the ... needs to be the entire body of the shift code. It would be ridiculous to have two copies of the same code. What would be better is a hint pragma that says, ``inline me if the following set of parameters are literals''.
not at all:
{-# RULES "shift/const-inline" forall x y# . shift x y# = inline shift x y# #-}
of course, you would still need to make sure the body of shift were available. (perhaps instances of inline in rules should force the argument to appear in the hi file in full) a hint pragma would be trickier to implement and not as flexible I would think. for instance, what if you want to inline only when one argument is an arbitrary constant, but the other arg is a certain value? John -- John Meacham - ⑆repetae.net⑆john⑈

As an experiment, I tried the following modification of my code module Test where import GHC.Word import GHC.Base import GHC.Prim a `shiftRLT` b | b >=# 32# = int2Word# 0# | otherwise = a `uncheckedShiftRL#` b (W32# x#) `shift` (I# i#) = {- we do an actual case analysis on i# to try to give us a discount -} case i# of {- For some bizzare reason removing the `shiftRLT` 0# makes the inlining fail again -} 0# -> W32# (x# `shiftRLT` 0#) _ -> if i# >=# 0# then W32# (narrow32Word# (x# `shiftL#` i#)) else W32# (x# `shiftRLT` negateInt# i#) x `shiftR` y = x `shift` (-y) shift7 x = x `shiftR` 7 ghc -fglasgow-exts --make -O3 Test.hs && ghc --show-iface Test.hi yields: ... 12 shift7 :: GHC.Word.Word32 -> GHC.Word.Word32 {- Arity: 1 HasNoCafRefs Strictness: U(L)m Unfolding: (\ x :: GHC.Word.Word32 -> case @ GHC.Word.Word32 x of w { W32# ww -> GHC.Word.W32# (GHC.Prim.uncheckedShiftRL# ww 7) }) -} ... so the inline is successful. But removing the 0# case yields: ... 14 shift7 :: GHC.Word.Word32 -> GHC.Word.Word32 {- Arity: 1 HasNoCafRefs Strictness: U(L)m Unfolding: (\ x :: GHC.Word.Word32 -> case @ GHC.Word.Word32 x of w { W32# ww -> case @ GHC.Word.Word32 $wshift ww (-7) of ww1 { DEFAULT -> GHC.Word.W32# ww1 } }) -} ... and the inlining doesn't occur. (BTW, this is so much better than reading the generated C code :) So, my hypothesis is that the inliner doesn't recognise that ``if (x >= 0) then ...'' is effectively a case analysis on x, and thus the argument discount is not fired. So we need to figure out how to extend this criterion for when to apply the argument discount. My best guess is that an argument x should be considered scrutinised by a case when there is a case analysis on an expression without any recursive sub-expressions whose only free variable is x. Perhaps there are some better ideas. (This whole idea of argument discounting seems rather ad hoc. Is it not possible try out an inline, and remove it if in the end it doesn't get reduced in size sufficently?) -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

| So, my hypothesis is that the inliner doesn't recognise that | ``if (x >= 0) then ...'' is effectively a case analysis on x, and thus the | argument discount is not fired. So we need to figure out how to extend | this criterion for when to apply the argument discount. Correct. GHC generates case (x# >=# 0#) of { True -> ...; False -> ... } But the argument discount only applies when we have case y of { ... } So you really want a discount for the args of a primop. The relevant file is coreSyn/CoreUnfold.lhs, and the function is calcUnfoldingGuidance. I see some notes there with primops, namely: PrimOpId op -> primOpSize op (valArgCount args) -- foldr addSize (primOpSize op) (map arg_discount args) -- At one time I tried giving an arg-discount if a primop -- is applied to one of the function's arguments, but it's -- not good. At the moment, any unlifted-type arg gets a -- 'True' for 'yes I'm evald', so we collect the discount even -- if we know nothing about it. And just having it in a primop -- doesn't help at all if we don't know something more. At the call site, the call f x y gets f's arg-discount for x if x is evaluated. But in the case of primitive types we don't just want "evaluated", we want to know the value. So one could refine that. The relevant function is interestingArg in simplCore/SimplUtils. | (This whole idea of argument discounting seems rather ad hoc. Is it not | possible try out an inline, and remove it if in the end it doesn't get | reduced in size sufficently?) Yes, you could try that too. It might result in a lot of wasted work, but it'd be a reasonable thing to try. The relevant code is in simplCore/Simplify.lhs Simon

On Wed, 11 Oct 2006, Simon Peyton-Jones wrote:
Correct. GHC generates case (x# >=# 0#) of { True -> ...; False -> ... } But the argument discount only applies when we have case y of { ... }
So you really want a discount for the args of a primop.
Do you think it should be that general? I was thinking the discount should only apply in the situtation where a case expression contains an expression with one free varaible that is a function argument, and all operations are primitive. So I was thinking the right place to patch is in sizeExpr: size_up (Case (Var v) _ _ alts) | v `elem` top_args = ... And make this branch activate is a wider range of circumstances. SamB is/was working on such a patch. But making sure that all operations are primitive is not quite right, for instance in f :: Int -> ... f x | gcd x 21 > 1 = ... we cannot give x an argument discount because (gcd (5::Int) 21) is not rewritten into 1 (for some strange reason). So, is there a way of deciding if a primitive op will be rewritten if all its arguements are given? -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

| Do you think it should be that general? I was thinking the discount | should only apply in the situtation where a case expression contains an | expression with one free varaible that is a function argument, and all | operations are primitive. Well, if you see x >=# 0 then it'd be good to inline if argument x was bound to a literal, even if the >= is not scrutinised by a case. Why? Because then we can constant-fold it away. But perhaps the discount should be smaller? | So, is there a way of deciding if a primitive op will be rewritten if all | its arguements are given? The constant-folding rules for the primops are all in prelude/PrelRules.lhs in function primOpRules. Please add more rules. For example, I see that x +# 0 = x is not in there! Simon

On Wed, 11 Oct 2006, Simon Peyton-Jones wrote:
The constant-folding rules for the primops are all in prelude/PrelRules.lhs in function primOpRules. Please add more rules. For example, I see that x +# 0 = x is not in there!
It is in libraries/base/GHC/Base.lhs "x# +# 0#" forall x#. x# +# 0# = x# -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

Hello Simon, Wednesday, October 11, 2006, 2:23:59 PM, you wrote:
The constant-folding rules for the primops are all in prelude/PrelRules.lhs in function primOpRules. Please add more rules. For example, I see that x +# 0 = x is not in there!
but GHC.Base contains {-# RULES "x# +# 0#" forall x#. x# +# 0# = x# "0# +# x#" forall x#. 0# +# x# = x# "x# -# 0#" forall x#. x# -# 0# = x# "x# -# x#" forall x#. x# -# x# = 0# "x# *# 0#" forall x#. x# *# 0# = 0# "0# *# x#" forall x#. 0# *# x# = 0# "x# *# 1#" forall x#. x# *# 1# = x# "1# *# x#" forall x#. 1# *# x# = x# #-} is this not enough? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Simon Peyton-Jones
| So, my hypothesis is that the inliner doesn't recognise that | ``if (x >= 0) then ...'' is effectively a case analysis on x, and thus the | argument discount is not fired. So we need to figure out how to extend | this criterion for when to apply the argument discount.
Correct. GHC generates case (x# >=# 0#) of { True -> ...; False -> ... } But the argument discount only applies when we have case y of { ... }
So you really want a discount for the args of a primop.
The relevant file is coreSyn/CoreUnfold.lhs, and the function is calcUnfoldingGuidance.
Actually it is sizeExpr. (Even so, apparantly I've been figuring this out the hard way...) The brach that currently handles these is the size_up (Case e _ _ alts) = nukeScrutDiscount (size_up e) `addSize` foldr (addSize . size_up_alt) sizeZero alts branch. I've got a patch that seems like it ought to do a bettter job, but it doesn't seem to give the $wrotate functions any discount (the $wshift functions having been tagged by the {-# INLINE shift #-} pragmas I added all over). Unfortunately I left it at home and I'm at school right now :-(. It does get run sometimes, but I'm not sure if it is run for rotate or that its results are kept...
I see some notes there with primops, namely:
PrimOpId op -> primOpSize op (valArgCount args) -- foldr addSize (primOpSize op) (map arg_discount args) -- At one time I tried giving an arg-discount if a primop -- is applied to one of the function's arguments, but it's -- not good. At the moment, any unlifted-type arg gets a -- 'True' for 'yes I'm evald', so we collect the discount even -- if we know nothing about it. And just having it in a primop -- doesn't help at all if we don't know something more.
At the call site, the call f x y gets f's arg-discount for x if x is evaluated. But in the case of primitive types we don't just want "evaluated", we want to know the value. So one could refine that. The relevant function is interestingArg in simplCore/SimplUtils.
I might point out that the current code would throw out those discounts (the nukeSrutDiscounts in that case).
| (This whole idea of argument discounting seems rather ad hoc. Is it not | possible try out an inline, and remove it if in the end it doesn't get | reduced in size sufficently?)
Yes, you could try that too. It might result in a lot of wasted work, but it'd be a reasonable thing to try. The relevant code is in simplCore/Simplify.lhs
Simon

On Wed, 11 Oct 2006, Samuel Bronson wrote:
branch. I've got a patch that seems like it ought to do a bettter job, but it doesn't seem to give the $wrotate functions any discount (the $wshift functions having been tagged by the {-# INLINE shift #-} pragmas I added all over). Unfortunately I left it at home and I'm at school right now :-(. It does get run sometimes, but I'm not sure if it is run for rotate or that its results are kept...
The problem with rotate is that it is: (W32# x#) `rotate` (I# i#) | i'# ==# 0# = W32# x# | otherwise = W32# (narrow32Word# ((x# `shiftL#` i'#) `or#` (x# `shiftRL#` (32# -# i'#)))) where i'# = word2Int# (int2Word# i# `and#` int2Word# 31#) The i'# gets inlined, so the case statement isn't actually actually doing an analysis on i#, rather it is doing an analysis on i# `and#` 31#. So this lends support to SPJ's view that we need to check to see if a variable is being used in an application of a primop that can be evaluated, and all the other arguements are literals. -- Russell O'Connor http://r6.ca/ ``All talk about `theft,''' the general counsel of the American Graphophone Company wrote, ``is the merest claptrap, for there exists no property in ideas musical, literary or artistic, except as defined by statute.''

| I might point out that the current code would throw out those discounts (the | nukeSrutDiscounts in that case). Ah yes. I've forgotten why nukeScrutDiscounts is there. If you have f x = let y = case x of ... in ... then the nukeScrutDiscount will avoid giving a discount to x. I'm not sure why... if supplying a value for x made the function very small (by making a big case shrink) then it deserves its discount. Simon
participants (6)
-
Bulat Ziganshin
-
Ian Lynagh
-
John Meacham
-
roconnor@theorem.ca
-
Samuel Bronson
-
Simon Peyton-Jones