Re: Fusing loops by specializing on functions with SpecConstr?

1 Apr 2020

      ...
Looking at the optimized core, it’s true that the conversion of Maybe to
Either and back again gets eliminated, which is wonderful! But what’s less
wonderful is the value passed around through `s`:
mapMaybeSF
      = \ (@ a) (@ b) (f :: SF a b) ->
          case f of { SF @ s f2 s2 ->
          SF
            (\ (a1 :: Maybe a) (ds2 :: ((), ((), (((), (((), (((), s),
())), ((), ((), ())))), ((), ()))))) ->
That is indeed true. But note that as long as you manage to inline
`mapMaybeSF`, the final `runSF` will only allocate once on the "edge" of
each iteration, all intermediate allocations will have been fused away. But
the allocation of these non-sense records seems unfortunate.

Optimisation-wise, I see two problems here:

   1. `mapMaybeSF` is already too huge to inline without INLINE. That is
   because its lambda isn't floated out to the top-level, which is because of
   the existential @s (that shouldn't be a problem), but also its mention of
   f2. The fact that f2 occurs free rather than as an argument makes the
   simplifier specialise `mapMaybeSF` for it, so if it were floated out
   (thereby necessarily lambda-lifted) to top-level, then we'd lose the
   ability to specialise without SpecConstr (which currently only applies to
   recursive functions anyway).
   2. The lambda isn't let-bound (which is probably a consequence of the
   previous point), so it isn't strictness analysed and we have no W/W split.
   If we had, I imagine we would have a worker of type `s -> ...` here. W/W is
   unnecessary if we manage to inline the function anyway, but I'm pretty
   certain we won't inline for larger programs (like `mapMaybeSF` already), in
   which case every failure to inline leaves behind such a residue of records.

So this already seems quite brittle. Maybe a very targeted optimisation
that gets rid of the boring ((), _) wrappers could be worthwhile, given
that a potential caller is never able to construct such a thing themselves.
But that very much hinges on being able to prove that in fact every such
((), _) constructed in the function itself terminates.

There are a few ways I can think of in which we as the programmer could
have been smarter, though:

   - Simply by specialising `SF` for the `()` case:

   data SF a b where
     SFState :: !(a -> s -> Step s b) -> !s -> SF a b
     SFNoState :: !(a -> Step () b) -> SF a b

   And then implementing every action 2^n times, where n is the number of
   `SF` arguments. That undoubtly leads to even more code bloat.
   - An alternative that I'm a little uncertain would play out would be

   data SMaybe a = SNothing | SJust !a
   data SF a b where
     SF :: !(SMaybe (s :~: ()) ->  !(a -> s -> Step s b) -> !s -> SF a b

   and try match on the proof everywhere needed to justify e.g. in `(.)`
   only storing e.g. s1 instead of (s1, s2). Basically do some type algebra in
   the implementation.
   - An even simpler thing would be to somehow use `Void#` (which should
   have been named `Unit#`), but I think that doesn't work due to runtime rep
   polymorphism restrictions.

I think there is lots that can be done to tune this idea.

Am Mi., 1. Apr. 2020 um 01:16 Uhr schrieb Alexis King 
...
:
...
...
On Mar 31, 2020, at 17:05, Sebastian Graf  wrote:
Yeah, SPEC is quite unreliable, because IIRC at some point it's either
consumed or irrelevant. But none of the combinators you mentioned should
rely on SpecConstr! They are all non-recursive, so the Simplifier will take
care of "specialisation". And it works just fine, I just tried it
Ah! You are right, I did not read carefully enough and misinterpreted.
That approach is clever, indeed. I had tried something similar with a CPS
encoding, but the piece I was missing was using the existential to tie the
final knot.
I have tried it out on some of my experiments. It’s definitely a
significant improvement, but it isn’t perfect. Here’s a small example:
mapMaybeSF :: SF a b -> SF (Maybe a) (Maybe b)
    mapMaybeSF f = proc v -> case v of
      Just x -> do
        y <- f -< x
        returnA -< Just y
      Nothing -> returnA -< Nothing
Looking at the optimized core, it’s true that the conversion of Maybe to
Either and back again gets eliminated, which is wonderful! But what’s less
wonderful is the value passed around through `s`:
mapMaybeSF
      = \ (@ a) (@ b) (f :: SF a b) ->
          case f of { SF @ s f2 s2 ->
          SF
            (\ (a1 :: Maybe a) (ds2 :: ((), ((), (((), (((), (((), s),
())), ((), ((), ())))), ((), ()))))) ->
Yikes! GHC has no obvious way to clean this type up, so it will just grow
indefinitely, and we end up doing a dozen pattern-matches in the body
followed by another dozen allocations, just wrapping and unwrapping tuples.
Getting rid of that seems probably a lot more tractable than fusing the
recursive loops, but I’m still not immediately certain how to do it. GHC
would have to somehow deduce that `s` is existentially-bound, so it can
rewrite something like
SF (\a ((), x) -> ... Yield ((), y) b ...) ((), s)
to
SF (\a x -> ... Yield y b) s
by parametricity. Is that an unreasonable ask? I don’t know!
Another subtlety I considered involves recursive arrows, where I currently
depend on laziness in (|||). Here’s one example:
mapSF :: SF a b -> SF [a] [b]
    mapSF f = proc xs -> case xs of
      x:xs -> do
        y <- f -< x
        ys <- mapSF f -< xs
        returnA -< (y:ys)
      [] -> returnA -< []
Currently, GHC will just compile this to `mapSF f = mapSF f` under your
implementation, since (|||) and (>>>) are both strict. However, I think
this is not totally intractable—we can easily introduce an explicit `lazy`
combinator to rein in strictness:
lazy :: SF a b -> SF a b
    lazy sf0 = SF g (Unit sf0) where
      g a (Unit sf1) = case runSF sf1 a of
        (b, sf2) -> Yield (Unit sf2) b
And now we can write `lazy (mapSF f)` at the point of the recursive call
to avoid the infinite loop. This defeats some optimizations, of course, but
`mapSF` is fundamentally recursive, so there’s only so much we can really
expect.
So perhaps my needs here are less ambitious, after all! Getting rid of all
those redundant tuples is my next question, but that’s rather unrelated
from what we’ve been talking about so far.
Alexis