Fusing loops by specializing on functions with SpecConstr?

Hi all,

I have recently been toying with FRP, and I’ve noticed that traditional formulations generate a lot of tiny loops that GHC does a very poor job optimizing. Here’s a simplified example:

newtype SF a b = SF { runSF :: a -> (b, SF a b) }

add1_snd :: SF (String, Int) (String, Int)
add1_snd = second add1
  where
    add1 = SF $ \a -> let !b = a + 1 in (b, add1)
    second f = SF $ \(a, b) -> let !(c, f') = runSF f b in ((a, c), second f')

Here, `add1_snd` is defined in terms of two recursive bindings, `add1` and `second`. Because they’re both recursive, GHC doesn’t know what to do with them, and the optimized program still has two separate recursive knots. But this is a missed optimization, as `add1_snd` is equivalent to the following definition, which fuses the two loops together and consequently has just one recursive knot:

add1_snd_fused :: SF (String, Int) (String, Int)
add1_snd_fused = SF $ \(a, b) -> let !c = b + 1 in ((a, c), add1_snd_fused)

How could GHC get from `add1_snd` to `add1_snd_fused`? In theory, SpecConstr could do it! Suppose we specialize `second` at the call pattern `second add1`:

{-# RULE "second/add1" second add1 = second_add1 #-}
second_add1 = SF $ \(a, b) -> let !(c, f') = runSF add1 b in ((a, c), second f')

This doesn’t immediately look like an improvement, but we’re actually almost there. If we unroll `add1` once on the RHS of `second_add1`, the simplifier will get us the rest of the way. We’ll end up with

let !b1 = b + 1
    !(c, f') = (b1, add1)
in ((a, c), second f')

and after substituting f' to get `second add1`, the RULE will tie the knot for us.

This may look like small potatoes in isolation, but real programs can generate hundreds of these tiny, tiny loops, and fusing them together would be a big win. The only problem is SpecConstr doesn’t currently specialize on functions! The original paper, “Call-pattern Specialisation for Haskell Programs,” mentions this as a possibility in Section 6.2, but it points out that actually doing this in practice would be pretty tricky:
Specialising for function arguments is more slippery than for constructor arguments. In the example above the argument was a simple variable, but what if it was instead a lambda term? [...]
The trouble is that lambda abstractions are much more fragile than constructor applications, in the sense that simple transformations may make two abstractions look different although they have the same value.
Still, the difference this could make in a program of mine is so large that I am interested in exploring it anyway. I am wondering if anyone has investigated this possibility any further since the paper was published, or if anyone knows of other use cases that would benefit from this capability.

Thanks,
Alexis
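For readers who want to reproduce the missed optimisation above, the snippets in this message assemble into roughly the following minimal module (a consolidation, not part of the original email); compiling it with `ghc -O2 -ddump-simpl` shows the two separate recursive knots in the optimised Core:

{-# LANGUAGE BangPatterns #-}
module AddOneSnd where

newtype SF a b = SF { runSF :: a -> (b, SF a b) }

-- Two entangled recursive bindings: the version GHC fails to fuse.
add1_snd :: SF (String, Int) (String, Int)
add1_snd = second add1
  where
    add1 = SF $ \a -> let !b = a + 1 in (b, add1)
    second f = SF $ \(a, b) -> let !(c, f') = runSF f b in ((a, c), second f')

-- The hand-fused version with a single recursive knot.
add1_snd_fused :: SF (String, Int) (String, Int)
add1_snd_fused = SF $ \(a, b) -> let !c = b + 1 in ((a, c), add1_snd_fused)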

Hi Alexis,

I've been wondering the same things and have worked on it on and off. See my progress in https://gitlab.haskell.org/ghc/ghc/issues/855#note_149482 and https://gitlab.haskell.org/ghc/ghc/issues/915#note_241520.

The big problem with solving the higher-order specialisation problem through SpecConstr (which is what I did in my reports in #855) is indeed that it's hard to

1. Anticipate what the rewritten program looks like without doing a Simplifier pass after each specialisation, so that we can see and exploit new specialisation opportunities. SpecConstr does use the simple Core optimiser, but that often is not enough IIRC (think of ArgOccs from recursive calls). In particular, it will not do RULE rewrites. Interleaving SpecConstr with the Simplifier, apart from being nigh impossible conceptually, is computationally intractable and would quickly drift off into the Partial Evaluation swamp.

2. Make the RULE engine match and rewrite call sites in all call patterns they can apply to. E.g., `f (\x -> Just (x + 1))` calls its argument with one argument and scrutinises the resulting Maybe (that's what is described by the argument's `ArgOcc`), so we want to specialise to a call pattern `f (\x -> Just <some expression using x>)`, giving rise to the specialisation `$sf ctx`, where `ctx x` describes the `<some expression using x>` part. In an ideal world, we want a (higher-order pattern unification) RULE `forall f ctx. f (\x -> Just (ctx x)) ==> $sf ctx`. But from what I remember, GHC's RULE engine works quite differently from that and isn't concerned with finding unifiers at all; it only matches concrete call sites (without meta variables) against RULEs (with meta variables).

Note that matching on specific Ids binding functions is just an approximation using representational equality (on the Id's Unique) rather than some sort of more semantic equality. My latest endeavour into the matter in #915 from December was using types as the representational entity and type class specialisation. I think I ultimately got blocked on https://gitlab.haskell.org/ghc/ghc/issues/17548, but apparently I didn't document the problematic program.

Maybe my failure so far is that I want it to apply to and optimise all cases, even complex stream pipelines, rather than just doing a better best-effort job.

Hope that helps. Anyway, I'm also really keen on nailing this! It's one of my high-risk, high-reward research topics. So if you need someone to collaborate/exchange ideas with, I'm happy to help!

All the best,
Sebastian

Wow – tricky stuff! I would never have thought of trying to optimise that program, but it’s fascinating that you get lots and lots of them from FRP.
* Don’t lose this thread! Make a ticket, or a wiki page. If the former, put the main payload (including Alexis’s examples) into the Description, not deep in the discussion.
* I wonder whether it’d be possible to adjust the FRP library to generate easier-to-optimise code. Probably not, but worth asking.
* Alexis’s proposed solution relies on
* Specialising on a function argument. Clearly this must be possible, and it’d be very beneficial.
* Unrolling one layer of a recursive function. That seems harder: how do we know to *stop* unrolling as we successively simplify? One idea: do one layer of unrolling by hand, perhaps even in FRP source code:
add1rec = SF (\a -> let !b = a+1 in (b,add1rec))
add1 = SF (\a -> let !b = a+1 in (b,add1rec))
Simon

We can formulate SF as a classic Stream that needs an `a` to produce its next element of type `b`, like this (SF2 below):

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE GADTs #-}
module Lib where

newtype SF a b = SF { runSF :: a -> (b, SF a b) }

inc1 :: SF Int Int
inc1 = SF $ \a -> let !b = a+1 in (b, inc1)

data Step s a = Yield !s a

data SF2 a b where
  SF2 :: !(a -> s -> Step s b) -> !s -> SF2 a b

inc2 :: SF2 Int Int
inc2 = SF2 go ()
  where
    go a _ = let !b = a+1 in Yield () b

runSF2 :: SF2 a b -> a -> (b, SF2 a b)
runSF2 (SF2 f s) a = case f a s of
  Yield s' b -> (b, SF2 f s')

Note the absence of recursion in inc2. This resolves the tension around having to specialise for a function argument that is recursive and having to do the unrolling. I bet that, similar to stream fusion, we can arrange that only the consumer has to be explicitly recursive. Indeed, I think this will help you inline mapping combinators such as `second`, because it won't be recursive itself anymore.

Now we "only" have to solve the same problems as with good old stream fusion. The tricky case (after realising that we need to add `Skip` to `Step` for `filterSF2`) is when we want to optimise a signal of signals, e.g. something like `concatMapSF2 :: (b -> SF2 a c) -> SF2 a b -> SF2 a c` or some such. And here we are again in #855/#915.

Also, if you need convincing that we can embed any SF into SF2, look at this:

embed :: SF Int Int -> SF2 Int Int
embed origSF = SF2 go origSF
  where
    go a sf = case runSF sf a of
      (b, sf') -> Yield sf' b

Please do open a ticket about this, though. It's an interesting data point!

Cheers,
Sebastian
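To make the point about non-recursive mapping combinators concrete, here is a small sketch of what a `second`-style combinator could look like under the SF2 encoding above. It is not from the original email, and the name `secondSF2` is made up; it reuses the `SF2`, `Step`, and `Yield` definitions from the message:

-- A non-recursive `second` for SF2 (sketch): the only recursion is in
-- whichever consumer eventually drives the stream, so the Simplifier can
-- inline and specialise this without SpecConstr.
secondSF2 :: SF2 a b -> SF2 (c, a) (c, b)
secondSF2 (SF2 f s0) = SF2 go s0
  where
    go (c, a) s = case f a s of
      Yield s' b -> Yield s' (c, b)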

Sebastian and Simon,

Thank you both for your responses—they are all quite helpful! I agree with both of you that figuring out how to do this kind of specialization without any guidance from the programmer seems rather intractable. It’s too hard to divine where it would actually be beneficial, and even if you could, it seems likely that other optimizations would get in the way of it actually working out.

I’ve been trying to figure out if it would be possible to help the optimizer out by annotating the program with special combinators like the existing ones provided by GHC.Magic. However, I haven’t been able to come up with anything yet that seems like it would actually work.
On Mar 31, 2020, at 06:12, Simon Peyton Jones wrote:

Wow – tricky stuff! I would never have thought of trying to optimise that program, but it’s fascinating that you get lots and lots of them from FRP.
For context, the reason you get all these tiny loops is that arrowized FRP uses the Arrow and ArrowChoice interfaces to build its programs, and those interfaces use tiny combinator functions like these:

first :: Arrow a => a b c -> a (b, d) (c, d)
(***) :: Arrow a => a b d -> a c e -> a (b, c) (d, e)
(|||) :: ArrowChoice a => a b d -> a c d -> a (Either b c) d

This means you end up with programs built out of dozens or hundreds of uses of these tiny combinators. You get code that looks like

first (left (arr f >>> g ||| right h) *** second i)

and this is a textbook situation where you want to specialize and inline all the combinators! For arrows without this tricky recursion, doing that works as intended, GHC’s simplifier does what it’s supposed to, and you get fast code.

But with FRP, each of these combinators is recursive. This means you often get really awful code that looks like this:

arr (\case { Nothing -> Left (); Just x -> Right x }) >>> (f ||| g)

This converts a Maybe to an Either, then branches on it. It’s analogous to writing something like this in direct-style code:

let y = case x of { Nothing -> Left (); Just x -> Right x }
in case y of { Left () -> f; Right x -> g x }

We really want the optimizer to eliminate the intermediate Either and just branch on it directly, and if GHC could fuse these tiny recursive loops, it could! But without that, all this pointless shuffling of values around remains in the optimized program.
I wonder whether it’d be possible to adjust the FRP library to generate easier-to-optimise code. Probably not, but worth asking.
I think it’s entirely possible to somehow annotate these combinators to communicate this information to the optimizer, but I don’t know what the annotations ought to look like. (That’s the research part!) But I’m not very optimistic about getting the library to generate easier-to-optimize code with the tools available today. Sebastian’s example of SF2 and stream fusion sort of works, but in my experience, something like that doesn’t handle enough cases well enough to work on real arrow programs.
Unrolling one layer of a recursive function. That seems harder: how do we know to *stop* unrolling as we successively simplify? One idea: do one layer of unrolling by hand, perhaps even in FRP source code:

add1rec = SF (\a -> let !b = a+1 in (b,add1rec))
add1 = SF (\a -> let !b = a+1 in (b,add1rec))
Yes, I was playing with the idea at one point of some kind of RULE that inserts GHC.Magic.inline on the specialized RHS. That way the programmer could ask for the unrolling explicitly, as otherwise it seems unreasonable to ask the compiler to figure it out.
On Mar 31, 2020, at 08:08, Sebastian Graf wrote:

We can formulate SF as a classic Stream that needs an `a` to produce its next element of type `b` like this (SF2 below)
This is a neat trick, though I’ve had trouble getting it to work reliably in my experiments (even though I was using GHC.Types.SPEC). That said, I also feel like I don’t understand the subtleties of SpecConstr very well, so it could have been my fault.

The more fundamental problem I’ve found with that approach is that it doesn’t do very well for arrow combinators like (***) and (|||), which come up very often in arrow programs but rarely in streaming. Fusing long chains of first/second/left/right is actually pretty easy with ordinary RULEs, but (***) and (|||) are much harder, since they have multiple continuations.

It seems at first appealing to rewrite `f *** g` into `first f >>> second g`, which solves the immediate problem, but this is actually a lot less efficient after repeated rewritings. You end up rewriting `(f ||| g) *** h` into `first (left f) >>> first (right g) >>> second h`, turning two distinct branches into four, and larger programs have much worse exponential blowups.

So that’s where I’ve gotten stuck! I’ve been toying with the idea of thinking about expression “shells”, so if you have something like

first (a ||| b) >>> c *** second (d ||| e) >>> f

then you have a “shell” of the shape

first (● ||| ●) >>> ● *** second (● ||| ●) >>> ●

which theoretically serves as a key for the specialization. You can then generate a specialization and a rule:

$s a b c d e f = ...
{-# RULE forall a b c d e f. first (a ||| b) >>> c *** second (d ||| e) >>> f = $s a b c d e f #-}

The question then becomes: how do you specify what these shells are, and how do you specify how to transform the shell into a specialized function? I don’t know, but it’s something a Core plugin could theoretically do. Maybe it makes sense for this domain-specific optimization to be a Core pass that runs before the simplifier, like the typeclass specializer currently is, but I haven’t explored that yet.

Alexis
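As an illustration of the "ordinary RULEs" mentioned above for fusing chains of first/second/left/right, here is a minimal sketch (not from the thread). It assumes the library exposes concrete, monomorphic combinators for the SF newtype from the start of the thread; the names `composeSF` and `secondSF` are made up, and writing the rule against concrete functions rather than overloaded class methods matters, because RULES on class methods rarely get a chance to fire:

composeSF :: SF b c -> SF a b -> SF a c
composeSF g f = SF $ \a ->
  case runSF f a of
    (b, f') -> case runSF g b of
      (c, g') -> (c, composeSF g' f')

secondSF :: SF a b -> SF (c, a) (c, b)
secondSF f = SF $ \(c, a) ->
  case runSF f a of
    (b, f') -> ((c, b), secondSF f')

-- The arrow law `second g . second f = second (g . f)`, expressed as a
-- rewrite rule so adjacent `secondSF`s fuse into one traversal. In a real
-- library, composeSF/secondSF would need INLINE phase annotations so the
-- rule can fire before they are inlined.
{-# RULES
"secondSF/secondSF" forall f g.
  composeSF (secondSF g) (secondSF f) = secondSF (composeSF g f)
  #-}

Combinators such as (***) and (|||) admit no such single rule, which is exactly the multiple-continuation problem described in the message above.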

This is a neat trick, though I’ve had trouble getting it to work reliably in my experiments (even though I was using GHC.Types.SPEC). That said, I also feel like I don’t understand the subtleties of SpecConstr very well, so it could have been my fault.
Yeah, SPEC is quite unreliable, because IIRC at some point it's either consumed or irrelevant. But none of the combinators you mentioned should rely on SpecConstr! They are all non-recursive, so the Simplifier will take care of "specialisation". And it works just fine, I just tried it: https://gist.github.com/sgraf812/d15cd3ee9cc9bd2e72704f90567ef35b

`test` there is optimised reasonably well. The problem is that we don't have the concrete a..f, so we can't cancel away all allocations. If you give me a closed program where we fail to optimise away every bit of allocation (and it isn't due to size concerns), then I would be surprised. There might be a bug in how I encoded the streams, though; maybe we can be a bit stricter here or there if need be. `test2 = (double &&& inc) >>> arr (uncurry (+)) :: SF Int Int` is such a function that we optimise down to (the equivalent of) `arr (\n -> 3*n+1)`.

Maybe you can give a medium-sized program where you think GHC does a poor job at optimising?

On Mar 31, 2020, at 17:05, Sebastian Graf wrote:

Yeah, SPEC is quite unreliable, because IIRC at some point it's either consumed or irrelevant. But none of the combinators you mentioned should rely on SpecConstr! They are all non-recursive, so the Simplifier will take care of "specialisation". And it works just fine, I just tried it
Ah! You are right, I did not read carefully enough and misinterpreted. That approach is clever, indeed. I had tried something similar with a CPS encoding, but the piece I was missing was using the existential to tie the final knot.

I have tried it out on some of my experiments. It’s definitely a significant improvement, but it isn’t perfect. Here’s a small example:

mapMaybeSF :: SF a b -> SF (Maybe a) (Maybe b)
mapMaybeSF f = proc v -> case v of
  Just x -> do
    y <- f -< x
    returnA -< Just y
  Nothing -> returnA -< Nothing

Looking at the optimized core, it’s true that the conversion of Maybe to Either and back again gets eliminated, which is wonderful! But what’s less wonderful is the value passed around through `s`:

mapMaybeSF
  = \ (@ a) (@ b) (f :: SF a b) ->
      case f of { SF @ s f2 s2 ->
      SF
        (\ (a1 :: Maybe a) (ds2 :: ((), ((), (((), (((), (((), s), ())), ((), ((), ())))), ((), ()))))) ->

Yikes! GHC has no obvious way to clean this type up, so it will just grow indefinitely, and we end up doing a dozen pattern-matches in the body followed by another dozen allocations, just wrapping and unwrapping tuples.

Getting rid of that seems probably a lot more tractable than fusing the recursive loops, but I’m still not immediately certain how to do it. GHC would have to somehow deduce that `s` is existentially-bound, so it can rewrite something like

SF (\a ((), x) -> ... Yield ((), y) b ...) ((), s)

to

SF (\a x -> ... Yield y b) s

by parametricity. Is that an unreasonable ask? I don’t know!

Another subtlety I considered involves recursive arrows, where I currently depend on laziness in (|||). Here’s one example:

mapSF :: SF a b -> SF [a] [b]
mapSF f = proc xs -> case xs of
  x:xs -> do
    y <- f -< x
    ys <- mapSF f -< xs
    returnA -< (y:ys)
  [] -> returnA -< []

Currently, GHC will just compile this to `mapSF f = mapSF f` under your implementation, since (|||) and (>>>) are both strict. However, I think this is not totally intractable—we can easily introduce an explicit `lazy` combinator to rein in strictness:

lazy :: SF a b -> SF a b
lazy sf0 = SF g (Unit sf0) where
  g a (Unit sf1) = case runSF sf1 a of
    (b, sf2) -> Yield (Unit sf2) b

And now we can write `lazy (mapSF f)` at the point of the recursive call to avoid the infinite loop. This defeats some optimizations, of course, but `mapSF` is fundamentally recursive, so there’s only so much we can really expect.

So perhaps my needs here are less ambitious, after all! Getting rid of all those redundant tuples is my next question, but that’s rather unrelated to what we’ve been talking about so far.

Alexis
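One small gap in the snippet above: the `Unit` wrapper used by `lazy` is not defined anywhere in the thread. Presumably it is just a lazy single-field box, so the strict state field of the stream constructor forces only the box and not the recursively defined signal function inside it. A guessed definition:

-- Hypothetical definition of the `Unit` box assumed by `lazy`: a lazy
-- wrapper that shields its contents from the strictness of the state field.
data Unit a = Unit a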

Looking at the optimized core, it’s true that the conversion of Maybe to Either and back again gets eliminated, which is wonderful! But what’s less wonderful is the value passed around through `s`:
mapMaybeSF
  = \ (@ a) (@ b) (f :: SF a b) ->
      case f of { SF @ s f2 s2 ->
      SF
        (\ (a1 :: Maybe a) (ds2 :: ((), ((), (((), (((), (((), s), ())), ((), ((), ())))), ((), ()))))) ->
That is indeed true. But note that as long as you manage to inline `mapMaybeSF`, the final `runSF` will only allocate once on the "edge" of each iteration; all intermediate allocations will have been fused away. But the allocation of these nonsense records seems unfortunate.
Optimisation-wise, I see two problems here:
1. `mapMaybeSF` is already too huge to inline without an INLINE pragma. That is because its lambda isn't floated out to the top level, which is because of the existential @s (that shouldn't be a problem), but also because of its mention of f2. The fact that f2 occurs free rather than as an argument makes the simplifier specialise `mapMaybeSF` for it, so if it were floated out (thereby necessarily lambda-lifted) to the top level, then we'd lose the ability to specialise without SpecConstr (which currently only applies to recursive functions anyway).

2. The lambda isn't let-bound (which is probably a consequence of the previous point), so it isn't strictness-analysed and we have no W/W split. If we had one, I imagine we would have a worker of type `s -> ...` here. W/W is unnecessary if we manage to inline the function anyway, but I'm pretty certain we won't inline for larger programs (as is already the case for `mapMaybeSF`), in which case every failure to inline leaves behind such a residue of records.
So this already seems quite brittle. Maybe a very targeted optimisation that gets rid of the boring ((), _) wrappers could be worthwhile, given that a potential caller is never able to construct such a thing themselves. But that very much hinges on being able to prove that in fact every such ((), _) constructed in the function itself terminates.
There are a few ways I can think of in which we as the programmer could have been smarter, though:
- Simply specialising `SF` for the `()` case:

  data SF a b where
    SFState   :: !(a -> s -> Step s b) -> !s -> SF a b
    SFNoState :: !(a -> Step () b) -> SF a b

  and then implementing every action 2^n times, where n is the number of `SF` arguments. That undoubtedly leads to even more code bloat.
- An alternative that I'm a little uncertain would play out:

  data SMaybe a = SNothing | SJust !a

  data SF a b where
    SF :: !(SMaybe (s :~: ())) -> !(a -> s -> Step s b) -> !s -> SF a b

  and then try to match on the proof wherever needed to justify, e.g. in `(.)`, storing only s1 instead of (s1, s2). Basically, do some type algebra in the implementation.
- An even simpler thing would be to somehow use `Void#` (which should have been named `Unit#`), but I think that doesn't work due to runtime rep polymorphism restrictions.
I think there is lots that can be done to tune this idea.

On Apr 1, 2020, at 03:21, Sebastian Graf wrote:

That is indeed true. But note that as long as you manage to inline `mapMaybeSF`, the final `runSF` will only allocate once on the "edge" of each iteration; all intermediate allocations will have been fused away. But the allocation of these nonsense records seems unfortunate.
Yes, that is technically true, but note that even if we inline mapMaybeSF, those nonsense records don’t go away, they just bubble up to the “fringe” of the enclosing computation. And consider how tiny mapMaybeSF is: I shudder to think how enormous that “fringe” would be for a large program written in SF! (And of course, nothing prevents the runSF itself from appearing in a loop—quite probable, in fact, given its use in the hypothetical `lazy` combinator.)
So this already seems quite brittle. Maybe a very targeted optimisation that gets rid of the boring ((), _) wrappers could be worthwhile, given that a potential caller is never able to construct such a thing themselves. But that very much hinges on being able to prove that in fact every such ((), _) constructed in the function itself terminates.
Yes, that is a good point. I concede that seems much less tractable than I had initially hoped. Still, as you suggest, it does seem plausible that a different encoding could avoid this problem. I will experiment with a few different things and get back to you if I find anything interesting (assuming you don’t beat me to it first!).

I fiddled with alternative representations for a while and didn’t make any progress—it was too easy to end up with code explosion in the presence of any unknown calls—but I seem to have found a RULES-based approach that works very well on the examples I’ve tried. It’s quite simple, which makes it especially appealing!

I started by defining a wrapper around the `SF` constructor to attach rules to:

mkSF :: (a -> s -> Step s b) -> s -> SF a b
mkSF = SF
{-# INLINE CONLIKE [1] mkSF #-}

I then changed the definitions of (.), (***), (&&&), (+++), and (|||) to use `mkSF` instead of `SF`, but I left the other methods alone, so they just use `SF` directly. Then I defined two rewrite rules:

{-# RULES
"mkSF @((), _)" forall f s. mkSF f ((), s) =
  SF (\a s1 -> case f a ((), s1) of Step ((), s2) b -> Step s2 b) s
"mkSF @(_, ())" forall f s. mkSF f (s, ()) =
  SF (\a s1 -> case f a (s1, ()) of Step (s2, ()) b -> Step s2 b) s
#-}

That’s it. These two rules alone are enough to eliminate the redundant tupling. Now the optimized version of `mapMaybeSF` is beautiful!

mapMaybeSF = \ @ a @ b f -> case f of { SF @ s f2 s2 ->
  SF (\ a1 s1 -> case a1 of {
        Nothing -> case s1 of dt { __DEFAULT -> Step dt Nothing }
        Just x -> case f2 x s1 of {
          Step s2' c1 -> Step s2' (Just c1) }})
     s2 }

So unless this breaks down in some larger situation I’m not aware of, I think this solves my problem without the need for any fancy SpecConstr shenanigans. Many thanks to you, Sebastian, for pointing me in the right direction!

Alexis
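To show where the `((), _)` states that these rules strip actually come from, here is a hypothetical sketch (not from the message above) of how `(.)` might be written in terms of `mkSF`. It assumes `SF` here is the SF2-style stream representation with a constructor `SF :: (a -> s -> Step s b) -> s -> SF a b` and `data Step s b = Step !s b`; composing with a stateless arrow, whose state is `()`, then produces exactly the unit-tagged state that the rules peel off:

import Prelude hiding (id, (.))
import Control.Category

instance Category SF where
  -- Stateless arrows such as `id` (and `arr`) carry a trivial () state.
  id = SF (\a s -> Step s a) ()
  -- Composition pairs the two states, so composing with a stateless arrow
  -- yields a state of the shape ((), s) or (s, ()).
  SF g sg0 . SF f sf0 = mkSF h (sf0, sg0)
    where
      h a (sf, sg) = case f a sf of
        Step sf' b -> case g b sg of
          Step sg' c -> Step (sf', sg') c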

That’s it. These two rules alone are enough to eliminate the redundant tupling. Now the optimized version of `mapMaybeSF` is beautiful!
Beautiful indeed! That's wonderful to hear. Good luck messing about with your FRP framework!
Sebastian

Cool -- but please do write a blog post or something to distil what you have learned. I have not followed this thread in detail, and I bet others haven't either. But it'd be a pity for your learning not to be shared somehow!
Thanks
Simon

Joachim: this conversation is triggering some hind-brain neurons related to exitification, or something like that. I recall that we discovered we could get some surprising fusion of recursive functions expressed as join points. Something like

f . g . h

where h loops for a while and returns, and same for g and f. Then the call to g landed up in the return branch of h, and same for f.
But I can’t find anything in writing. The Exitify module doesn’t say much. I thought we had a wiki page but I can’t find it. Can you remember?
Thanks
Simon

Hi,

I think most of the docs about exitification are the notes in the Exitify module, and then there is the original ticket at https://gitlab.haskell.org/ghc/ghc/issues/14152

I don’t immediately see the connection to SpecConstr on function values, though, so I don't really know what’s tickling your neurons right now.

Cheers,
Joachim

Thanks. Perhaps I was thinking of Section 5 of the join-point paper
https://www.microsoft.com/en-us/research/publication/compiling-without-conti...
That's about compositions of tiny tail recursive loops. Alexis, just conceivably this might be relevant to your thinking on FRP ... but I'm waving my arms here so might be wide of the mark.
Simon

I have started a wiki page for join points here
https://gitlab.haskell.org/ghc/ghc/-/wikis/Join-points-in-GHC
Do add to it
Simon

On Wed, 1 Apr 2020 at 02:49, Alexis King wrote:
I’ve been trying to figure out if it would be possible to help the optimizer out by annotating the program with special combinators like the existing ones provided by GHC.Magic. However, I haven’t been able to come up with anything yet that seems like it would actually work.
You may want to take a look at https://github.com/composewell/fusion-plugin which uses annotations to help GHC fuse. It's not specifically what you want, but it might be relevant to your work.

https://github.com/composewell/streamly relies heavily on case-of-case and SpecConstr for stream fusion. There are several cases that GHC is unable to fuse currently. We use a "Fuse" annotation to tell GHC that any function involving the annotated type must be inlined so that fusion can occur reliably. With the help of fusion-plugin we have been able to fuse almost every known case in streamly so far.

-harendra
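For readers unfamiliar with fusion-plugin, its use looks roughly like the sketch below (reconstructed from memory of the project's README, so the exact module and option names should be checked against the repository linked above): you annotate the intermediate stream type with `Fuse`, and the plugin then forces inlining of bindings whose types mention it.

{-# OPTIONS_GHC -fplugin Fusion.Plugin #-}
module Demo where

import Fusion.Plugin.Types (Fuse (..))

-- Mark the intermediate type that must not survive into the final program;
-- the plugin aggressively inlines anything that involves it.
{-# ANN type Step Fuse #-}
data Step s a = Yield !s a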
participants (5): Alexis King, Harendra Kumar, Joachim Breitner, Sebastian Graf, Simon Peyton Jones