
#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  nfrisby
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.8.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |   Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #8763             |  Differential Rev(s):
       Wiki Page:  LateLamLift       |
-------------------------------------+-------------------------------------

Comment (by sgraf):

Replying to [comment:16 simonpj]:
Thoughts
* There are a handful of spectacular reductions in allocation (queens, n-body). It'd be good to understand and explain them. Perhaps we can more closely target LLF on those cases.
It's hard to tell for `n-body`: the `-fno-llf` variant shows the same
reduction in allocations, which means the reduction must happen somewhere
in the base libraries rather than in `n-body` itself.
`queens` has its `go1` function within `gen` lifted (arity 1 with 3 free
vars). Non-lifted Core after CorePrep:
{{{
let {
  n_s5S3 [Occ=OnceL*] :: [[GHC.Types.Int]]
  [LclId]
  n_s5S3 = go_s5RY ys_s5S2 } in
case GHC.Prim.># 1# ww1_s5RX of {
  __DEFAULT ->
    letrec {
      go1_s5S5 [Occ=LoopBreaker]
        :: GHC.Prim.Int# -> [[GHC.Types.Int]]
      [LclId, Arity=1, Str=, Unf=OtherCon []]
      go1_s5S5
        = \ (x1_s5S6 :: GHC.Prim.Int#) ->
            case Main.main_$ssafe y_s5S1 1# x1_s5S6 of {
              GHC.Types.False ->
                case GHC.Prim.==# x1_s5S6 ww1_s5RX of {
                  __DEFAULT ->
                    case GHC.Prim.+# x1_s5S6 1#
                    of sat_s5S9 [Occ=Once]
                    { __DEFAULT ->
                      go1_s5S5 sat_s5S9
                    };
                  1# -> n_s5S3
                };
              GHC.Types.True ->
                let {
                  sat_s5Se [Occ=Once] :: [[GHC.Types.Int]]
                  [LclId]
                  sat_s5Se
                    = case GHC.Prim.==# x1_s5S6 ww1_s5RX
                      of {
                        __DEFAULT ->
                          case GHC.Prim.+# x1_s5S6 1#
                          of sat_s5Sd [Occ=Once]
                          { __DEFAULT ->
                            go1_s5S5 sat_s5Sd
                          };
                        1# -> n_s5S3
                      } } in
                let {
                  sat_s5Sa [Occ=Once] :: GHC.Types.Int
                  [LclId]
                  sat_s5Sa = GHC.Types.I# x1_s5S6 } in
                let {
                  sat_s5Sb [Occ=Once] :: [GHC.Types.Int]
                  [LclId]
                  sat_s5Sb
                    = GHC.Types.:
                        @ GHC.Types.Int sat_s5Sa y_s5S1 } in
                GHC.Types.:
                  @ [GHC.Types.Int] sat_s5Sb sat_s5Se
            }; } in
    go1_s5S5 1#;
  1# -> n_s5S3
}
}}}
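In source-level terms, the lift can be sketched on a made-up example (hypothetical names, not the actual `queens` code): the local loop closes over its free variables, and lifting turns those into extra parameters, so no closure has to be allocated on entry.

```haskell
-- Hand-written sketch of lambda lifting (hypothetical names, not the
-- actual queens Core).

-- Before: 'go' is a local recursive function that closes over the
-- free variables 'lim' and 'acc'; a closure is allocated for it.
sumToLocal :: Int -> [Int] -> Int
sumToLocal lim acc = go 1
  where
    go x
      | x > lim   = sum acc
      | otherwise = x + go (x + 1)

-- After: the free variables become explicit parameters and 'goLifted'
-- is a top-level function, so no closure needs to be allocated.
goLifted :: Int -> [Int] -> Int -> Int
goLifted lim acc x
  | x > lim   = sum acc
  | otherwise = x + goLifted lim acc (x + 1)

sumToLifted :: Int -> [Int] -> Int
sumToLifted lim acc = goLifted lim acc 1
```

The price of the lift is visible in the recursive call: every iteration now passes `lim` and `acc` again, which is why the profitability heuristics matter.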
And when `go1` is lifted to top-level, the CorePrep'd call site changes to
{{{
case GHC.Prim.># 1# ww1_s5Ts of {
  __DEFAULT ->
    let {
      sat_s5Tz [Occ=Once, Dmd=
}}}
* I don't think we should float join points at all, recursive or non-recursive. Think of them like labels in a control-flow graph.

This was also what I thought, but there can be synergetic effects with the inliner if we manage to "outline" (that's what the transformation would be called in an imperative language) huge non-recursive join points, where the call overhead is negligible. At least that's what the wiki page mentions. But that brings me to the next point...

* I think of LLF as a code-generation strategy, that we do once all other transformations are done. (Lambda-lifting ''can'' affect earlier optimisations. It can make a big function into a small one (by floating out its guts), and thereby let it be inlined. But that is subtle and difficult to get consistent gains for. Let's not complicate LLF by thinking about this.)

Yes, I wasn't sure if having it play nice with the simplifier was intended here. I take comfort in your confirmation seeing it as a code-gen strategy.

* Given that it's a code-gen strategy, doing it on STG makes perfect sense to me. You've outlined the pros and cons well. Definitely worth a try.

Will do.
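The "labels, not functions" intuition is easy to see on a tiny hand-written example (hypothetical names, not from the ticket): when every use of a local binding is a saturated tail call in the same context, GHC compiles it as a join point.

```haskell
-- Both uses of 'j' are saturated tail calls in the same case/if
-- context, so GHC turns 'j' into a join point: a labelled block of
-- code that is jumped to, with no closure allocated and no extra
-- stack frame.  Lifting 'j' to top level would turn each jump back
-- into a genuine out-of-line function call.
f :: Int -> Int
f x =
  let j y = y * 2 + 1
  in if x > 0
       then j x
       else j (negate x)
```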
I'm not sure what you meant by "It's not enough to look at Core alone to gauge allocation" as a disadvantage.
Maybe it's just me only slowly understanding every layer of compilation in GHC, but I felt like I could have this mental model of roughly where and how much things allocate after CorePrep, but that's probably an illusion considering things like let-no-escapes (that story got better with join points, I suppose). Having another pass transforming heap allocation into stack allocation *after* CorePrep isn't exactly helping that intuition. Or were you thrown off by me using words I maybe mistranslated ("gauge" for "estimate", roughly)?
When you say "Much less involved analysis that doesn't need to stay in sync with CorePrep", I think it would be v helpful to lay out "the analysis". I have vague memories, but I don't know what this "stay in sync" stuff is about.
There is [https://github.com/sgraf812/ghc/blob/f5c160c98830fdba83faa9a0634eeab38dbe04d... this note] that explains why it's necessary to approximate CorePrep. What follows is [https://github.com/sgraf812/ghc/blob/f5c160c98830fdba83faa9a0634eeab38dbe04d... this new analysis], which interleaves a free-variable analysis with computing use information for the ids that occur free in the floating bindings.
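The free-variable half of that analysis is the easy part. On a toy lambda calculus (my own sketch, far simpler than the linked analysis, which additionally has to thread the use information through) it looks like:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- A toy lambda calculus, standing in for Core/STG.
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr
  | Let String Expr Expr   -- non-recursive: let x = rhs in body

-- The input to any lifting decision: which variables occur free in a
-- binding?  Lifting abstracts over exactly this set.
freeVars :: Expr -> Set String
freeVars (Var x)       = Set.singleton x
freeVars (App f a)     = freeVars f `Set.union` freeVars a
freeVars (Lam x b)     = Set.delete x (freeVars b)
freeVars (Let x rhs b) =
  freeVars rhs `Set.union` Set.delete x (freeVars b)
```

The real analysis does this bottom-up over the syntax tree anyway, which is why interleaving the extra use/occurrence bookkeeping into the same pass is cheap.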
If you do it in STG you don't need to explain "stay in sync", but explaining the analysis would be excellent.
Yes! I'm not really sure I understand all of it yet. (Re-)implementing it on STG should help me figure things out.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:17>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler