{-# INLINE me_harder #-}

So I'm trying to program the GHC term rewriting system again (ie the mighty simplifier) and it's not doing what I want it to do without me using unnecessarily large hammers. The crux is that I have a simple function that I want to be inlined early so that my other rules can match on the thing it expands into, however despite the {-# INLINE #-} pragma, it doesn't get inlined until a much later phase when it's too late for the rule to match. So currently I'm forced to use {-# RULES #-} to achieve the same effect, ie expanding the definition in an early phase. This feel like a hack of course. So either the INLINE heuristics could be tweaked to make these cases work or perhaps we should consider adding some other pragma for when we don't want to go by heuristics but are instead deliberately trying to do term rewriting. Anyway, here's the example. It's binary deserialisation. We have a couple important primitives: read :: Int -> (Ptr Word8 -> IO a) -> Get a and the applicative combinators (<$>) (<*>) for Get: fmapGet :: (a -> b) -> Get a -> Get b apGet :: Get (a -> b) -> Get a -> Get b So that we'll be able to match on these, we delay their inlining: {-# INLINE [0] apGet #-} {-# INLINE [0] fmapGet #-} {-# INLINE [0] read #-} The important rules linking these are: {-# RULES "fmap/read" forall f n a. f `fmapGet` read n a = ... "read/read" forall n m f x. read n f `apGet` read m x = ... #-} and finally we have: word8 :: Get Word8 word8 = read 1 peek {-# INLINE word8 #-} So when we write something like: foo :: Get (Word8,Word8,Word8) foo = (,,) <$> Get.word8 <*> Get.word8 <*> Get.word8 we want to expand word8 into it's definition in terms of read and then have the rules fire (which shares the bounds checks). Looking at the simplifier iterations we get to this step: ==================== Simplifier phase 2, iteration 1 out of 4 ==================== PutTest.foo :: Get.Get (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8) [Exported] [] PutTest.foo = Get.apGet @ GHC.Word.Word8 @ (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8) (Get.apGet @ GHC.Word.Word8 @ (GHC.Word.Word8 -> (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8)) (Get.fmapGet @ GHC.Word.Word8 @ (GHC.Word.Word8 -> GHC.Word.Word8 -> (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8)) (Data.Tuple.(,,) @ GHC.Word.Word8 @ GHC.Word.Word8 @ GHC.Word.Word8) Get.word8) Get.word8) Get.word8 Here is where we really must inline word8 or we're going to miss our opportunity once apGet and fmapGet get inlined. Sadly it doesn't inline it here (in phase 2), or in the next phase (in fact it does nothing in phase 1 at all). We get all the way to phase 0 and then go and inline everything (including word8). The behaviour is the same for 6.6 and 6.7.recentish (one month old). If we hit word8 with a bigger hammer: {-# RULES "inline word8" word8 = read 1 peekWord8 #-} then the whole thing works perfectly and we get really nice STG code at the end. I was under the impression that GHC considered a function more 'interesting' if it was mentioned in the LHS of a rule as is the case here. What is the right thing for me to do in this case? Just use the rule to do the inlining? The full source is here: http://haskell.org/~duncan/binary/ Get.hs and GetTest.hs (In the same dir is an example showing the opposite problem, GHC inlining a function when I asked explicitly for it not to. I reported that problem previously. Though I now realise I never filed a bug in trac for that one. I'll file bugs for both.) On the good news side, I'm getting excellent performance results for the serialisation and judging from the STG code for the deserialisation I expect it'll be great too. Duncan

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jhc has a pragma named "SUPERINLINE" that just means "try even harder to inline this", I think, e.g. it has {-# SUPERINLINE id, const, (.), ($), ($!), flip #-} It just reminded me of the name -- I'm not sure whether it actually makes sense to use that same name for this purpose. Isaac -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGJgoRHgcxvIWYTTURAq2RAKDS9MYsIg1McuMpApXWMQIRI5/NigCfTMAY 8V0FJURxAAQTj+mubjqe4Uw= =Uw9p -----END PGP SIGNATURE-----

On Wed, Apr 18, 2007 at 08:07:45AM -0400, Isaac Dupree wrote:
Jhc has a pragma named "SUPERINLINE" that just means "try even harder to inline this", I think, e.g. it has {-# SUPERINLINE id, const, (.), ($), ($!), flip #-}
It just reminded me of the name -- I'm not sure whether it actually makes sense to use that same name for this purpose.
Actually, SUPERINLINE basically means 'treat this as if it were a term-rewriting macro', so, it really always does inline no matter what, fairly early in the compilation process. Even if the call is unsaturated and it will end up being let-floated out anyway. I originally added it so some early flow-insensitive optimizations wouldn't end up doing silly things like unifying all uses of 'id' program-wide. The compiler is smarter now, so it is less needed, but it still exists. John -- John Meacham - ⑆repetae.net⑆john⑈

Duncan I've been meaning to reply to this. It's very difficult to get inlining right all the time. Even for a function marked INLINE, there's really no point in inlining in some contexts. E.g. map f xs (don't inline f). Furthermore, for parameter-less things like 'word8' GHC has to worry about losing sharing. Because inlining is already a tricky area, I'm reluctant to make it more complicated still. In your case, though, you really, really want your function inlined. For that, the RULE approach seems quite reasonable. But I can see you don't want to write out the RHS of 'word8' twice, once in open code and once in the RULE. That suggests a pragma, SUPERINLINE or something, which is a bit like SPECIALISE: - it generates a RULE - you don't have to write out the RHS of the RULE yourself - the RULE is generated by the desugarer It's a bit like "specialise for every single call site"! If you want to have a look, check DsBinds line 186 or so, where dsSpec generates RULES for specialisation. Right nearby the INLINE pragmas do stuff. If you are motivated, we could discuss the design a bit more and then you could go ahead and implement it. Simon | -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On | Behalf Of Duncan Coutts | Sent: 18 April 2007 08:22 | To: GHC Users Mailing List | Subject: {-# INLINE me_harder #-} | | So I'm trying to program the GHC term rewriting system again (ie the | mighty simplifier) and it's not doing what I want it to do without me | using unnecessarily large hammers. | | The crux is that I have a simple function that I want to be inlined | early so that my other rules can match on the thing it expands into, | however despite the {-# INLINE #-} pragma, it doesn't get inlined until | a much later phase when it's too late for the rule to match. | | So currently I'm forced to use {-# RULES #-} to achieve the same effect, | ie expanding the definition in an early phase. This feel like a hack of | course. So either the INLINE heuristics could be tweaked to make these | cases work or perhaps we should consider adding some other pragma for | when we don't want to go by heuristics but are instead deliberately | trying to do term rewriting. | | Anyway, here's the example. It's binary deserialisation. | | We have a couple important primitives: | | read :: Int -> (Ptr Word8 -> IO a) -> Get a | | and the applicative combinators (<$>) (<*>) for Get: | | fmapGet :: (a -> b) -> Get a -> Get b | apGet :: Get (a -> b) -> Get a -> Get b | | So that we'll be able to match on these, we delay their inlining: | {-# INLINE [0] apGet #-} | {-# INLINE [0] fmapGet #-} | {-# INLINE [0] read #-} | | The important rules linking these are: | | {-# RULES | | "fmap/read" forall f n a. | f `fmapGet` read n a = ... | | "read/read" forall n m f x. | read n f `apGet` read m x = ... | | #-} | | and finally we have: | | word8 :: Get Word8 | word8 = read 1 peek | {-# INLINE word8 #-} | | So when we write something like: | | foo :: Get (Word8,Word8,Word8) | foo = (,,) <$> Get.word8 <*> Get.word8 <*> Get.word8 | | we want to expand word8 into it's definition in terms of read and then | have the rules fire (which shares the bounds checks). | | Looking at the simplifier iterations we get to this step: | | ==================== Simplifier phase 2, iteration 1 out of 4 ==================== | PutTest.foo :: Get.Get (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8) | [Exported] | [] | PutTest.foo = | Get.apGet | @ GHC.Word.Word8 | @ (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8) | (Get.apGet | @ GHC.Word.Word8 | @ (GHC.Word.Word8 -> (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8)) | (Get.fmapGet | @ GHC.Word.Word8 | @ (GHC.Word.Word8 | -> GHC.Word.Word8 | -> (GHC.Word.Word8, GHC.Word.Word8, GHC.Word.Word8)) | (Data.Tuple.(,,) @ GHC.Word.Word8 @ GHC.Word.Word8 @ GHC.Word.Word8) | Get.word8) | Get.word8) | Get.word8 | | Here is where we really must inline word8 or we're going to miss our | opportunity once apGet and fmapGet get inlined. | | Sadly it doesn't inline it here (in phase 2), or in the next phase (in | fact it does nothing in phase 1 at all). We get all the way to phase 0 | and then go and inline everything (including word8). | | The behaviour is the same for 6.6 and 6.7.recentish (one month old). | | If we hit word8 with a bigger hammer: | {-# RULES "inline word8" word8 = read 1 peekWord8 #-} | then the whole thing works perfectly and we get really nice STG code at | the end. | | I was under the impression that GHC considered a function more | 'interesting' if it was mentioned in the LHS of a rule as is the case | here. What is the right thing for me to do in this case? Just use the | rule to do the inlining? | | The full source is here: | http://haskell.org/~duncan/binary/ | | Get.hs and GetTest.hs | | (In the same dir is an example showing the opposite problem, GHC | inlining a function when I asked explicitly for it not to. I reported | that problem previously. Though I now realise I never filed a bug in | trac for that one. I'll file bugs for both.) | | On the good news side, I'm getting excellent performance results for the | serialisation and judging from the STG code for the deserialisation I | expect it'll be great too. | | Duncan | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Simon Peyton-Jones wrote:
It's very difficult to get inlining right all the time. Even for a function marked INLINE, there's really no point in inlining in some contexts. E.g. map f xs (don't inline f).
Would it make sense to tentatively inline anyway, and in a later stage, if that bit of code is still equivalent to a call to f (i.e., no optimizations have taken advantage of it), replace it with a reference to f? Isaac -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGRMymHgcxvIWYTTURAijeAJ9xRHqy0PphOP/FdMeuLNfNH50LVACdHd4p nV5wPk9h7nmOsp30PgCJLiE= =At6a -----END PGP SIGNATURE-----

On Fri, 2007-05-11 at 16:05 -0400, Isaac Dupree wrote:
Simon Peyton-Jones wrote:
It's very difficult to get inlining right all the time. Even for a function marked INLINE, there's really no point in inlining in some contexts. E.g. map f xs (don't inline f).
Would it make sense to tentatively inline anyway, and in a later stage, if that bit of code is still equivalent to a call to f (i.e., no optimizations have taken advantage of it), replace it with a reference to f?
Not really because, so long as it remains in the form of map f xs we can always choose to inline map whenever it later looks like it might be beneficial (eg when we know something about f or xs). Also, uninlining is nigh on impossible. Duncan

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Duncan Coutts wrote:
Also, uninlining is nigh on impossible.
I would say that's the critical problem with my notion... Why is it so difficult? Is it because it's too easy for some minor "optimization"/change to be made in the Core representation, that's not actually very useful, before we would want to run the "un-inliner"? Or is it too hard to remember or match for equivalence with whatever we'd want to try to turn it into? Or because it might reduce performance a little in some undesirable way? (some combination of those, I guess) Isaac -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGRcdoHgcxvIWYTTURAugtAJwKcZ6Z9Mr55JS6s5wWySDEt+mCKACcDoBK 0kA2wlNMr9Grem10evDEovQ= =Jl6G -----END PGP SIGNATURE-----

On Sat, 2007-05-12 at 09:55 -0400, Isaac Dupree wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Duncan Coutts wrote:
Also, uninlining is nigh on impossible.
I would say that's the critical problem with my notion... Why is it so difficult? Is it because it's too easy for some minor "optimization"/change to be made in the Core representation, that's not actually very useful, before we would want to run the "un-inliner"? Or is it too hard to remember or match for equivalence with whatever we'd want to try to turn it into? Or because it might reduce performance a little in some undesirable way? (some combination of those, I guess)
Mainly the first issue. You can try and match the pattern of what your function expanded into but it's very likely that it will not actually match because it'll have been changed slightly after it was expanded. If it has been changed by some successful optimisation then that's fine, but there are plenty of ways it could be slightly rearranged that are not necessarily optimisations. It's not an impossible approach. The build/forld and stream fusion systems works this way. In an early optimisation phase of they rewrite the list functions into their fusible form, then later if the functions did not fuse then the get rewritten back to the unfusible form. This is very much like inlining and uninlining. What makes this feasible is that fusible form is already carefully defined and fairly simple. We have to go to some effort to keep things in this fusible form (eg by preventing premature inlineing). So this allows us to successfully pattern match the fusible form and rewrite/uninline it back to the standard form. Doing something like this for arbitrary function definition would be much harder. Duncan
participants (4)
-
Duncan Coutts
-
Isaac Dupree
-
John Meacham
-
Simon Peyton-Jones