
Thanks. Reading what you write below, I can see two possible motivations. 1. Reduce stack sizes. 2. Eliminate memory moves For (1) do we have any data to show that the non-overlap of areas was giving rise to unacceptably big stacks? For (2) that is indeed clever, but it's pretty serendipitous: it relies on the overlap being just so, so that coincidentally y gets stored in the same place as it was loaded from. I imagine that you don't plan the stack layout to cause that to happen; it's just a coincidence. Do we have any data to show that the coincidence happens with any frequency? Also, as you note, we lose the opportunity for certain sorts of code motion, perhaps increasing register pressure a lot. So there is a downside too. You seldom do things without a very good reason, so I feel I must be missing something. Simon | -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 10 January 2014 17:00 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | So stack areas are still a great abstraction, the only change is that | they now overlap. It's not just about stack getting too big, I've | copied the notes I made about it below (which I will paste into the code | in due course). The nice property that we can generate well-defined Cmm | without knowing explicit stack offsets is intact. | | What is different is that there used to be an intermediate state where | live variables were saved to abstract stack areas across calls, but Sp | was still not manifest. This intermediate state doesn't exist any more, | the stack layout algorithm does it all in one pass. To me this was far | simpler, and I think it ended up being fewer lines of code than the old | multi-phase stack layout algorithm (it's also much faster). | | Of course you can always change this. My goal was to get code that was | at least as good as the old code generator and in a reasonable amount of | time, and this was the shortest path I could find to that goal. | | Cheers, | Simon | | e.g. if we had | | x = Sp[old + 8] | y = Sp[old + 16] | | Sp[young(L) + 8] = L | Sp[young(L) + 16] = y | Sp[young(L) + 24] = x | call f() returns to L | | if areas semantically do not overlap, then we might optimise this to | | Sp[young(L) + 8] = L | Sp[young(L) + 16] = Sp[old + 8] | Sp[young(L) + 24] = Sp[old + 16] | call f() returns to L | | and now young(L) cannot be allocated at the same place as old, and we | are doomed to use more stack. | | - old+8 conflicts with young(L)+8 | - old+16 conflicts with young(L)+16 and young(L)+8 | | so young(L)+8 == old+24 and we get | | Sp[-8] = L | Sp[-16] = Sp[8] | Sp[-24] = Sp[0] | Sp -= 24 | call f() returns to L | | However, if areas are defined to be "possibly overlapping" in the | semantics, then we cannot commute any loads/stores of old with young(L), | and we will be able to re-use both old+8 and old+16 for young(L). | | x = Sp[8] | y = Sp[0] | | Sp[8] = L | Sp[0] = y | Sp[-8] = x | Sp = Sp - 8 | call f() returns to L | | Now, the assignments of y go away, | | x = Sp[8] | Sp[8] = L | Sp[-8] = x | Sp = Sp - 8 | call f() returns to L | | | Conclusion: | | - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M | - T[old+N] aliases with U[old+M] only if the areas actually overlap | | this ensures that we will not commute any accesses to old with | young(L) or young(L) with young(L'), and the stack allocator will get | the maximum opportunity to overlap these areas, keeping the stack use to | a minimum and possibly avoiding some assignments. | | | | On 10/01/2014 16:35, Simon Peyton Jones wrote: | > Oh, ok. Alas, a good chunk of my model of Cmm has just gone out of | the window. I thought that areas were such a lovely, well-behaved | abstraction. I was thrilled when we came up with them, and I'm very | sorry to see them go. | > | > There are no many things that I no longer understand. I now have no | idea how we save live variables over a call, or how multiple returned | values from one call (returned on the stack) stay right where they are | if they are live across the next call. | > | > What was the actual problem? That functions used too much stack, so | the stack was getting too big? But a one slot area corresponds exactly | to a live variable, so I don't see how the area abstraction could | possibly increase stack size. And is stack size a crucial issue anyway? | > | > Apart from anything else, areas would have given a lovely solution to | the problem this thread started with! | > | > I guess we can talk about this when you next visit? But some | documentation would be welcome. | > | > Simon | > | > | -----Original Message----- | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | Sent: 10 January 2014 16:24 | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | Cc: ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | There are no one-slot areas any more, I ditched those when I rewrote | > | the stack allocator. There is only ever one live area: either the | > | old area or the young area for a call we are about to make or have | just made. | > | (see the data type: I removed the one-slot areas) | > | | > | I struggled for a long time with this. The problem is that with the | > | semantics of non-overlapping areas, code motion optimisations would | > | tend to increase the stack requirements of the function by | > | overlapping the live ranges of the areas. I concluded that actually | > | what we wanted was areas that really do overlap, and optimisations | > | that respect that, so that we get more efficient stack usage. | > | | > | Cheers, | > | Simon | > | | > | On 10/01/2014 15:22, Simon Peyton Jones wrote: | > | > That documentation would be good, yes! I don't know what it means | > | > to | > | say "we don't really have a general concept of areas any more". We | > | did before, and I didn't know that it had gone away. Urk! We can | > | have lots of live areas, notably the old area (for the current | > | call/return parameters, the call area for a call we are preparing, | > | and the one-slot areas for variables we are saving on the stack. | > | > | > | > Here's he current story | > | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAre | > | > as | > | > | > | > I agree that we have no concrete syntax for talking about areas, | > | > but | > | that is something we could fix. But I'm worried that they may not | > | mean what they used to mean. | > | > | > | > Simon | > | > | > | > | -----Original Message----- | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | Sent: 09 January 2014 08:39 | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | Cc: ghc-devs@haskell.org | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | | > | > | On 08/01/2014 10:07, Simon Peyton Jones wrote: | > | > | > | > Can't we just allocate a Cmm "area"? The address of an | > | > | > | > area is a | > | > | > | perfectly well-defined Cmm value. | > | > | > | > | > | > What about this idea? | > | > | | > | > | We don't really have a general concept of areas (any more), and | > | > | areas aren't exposed in the concrete Cmm syntax at all. The | > | > | current semantics is that areas may overlap with each other, so | > | > | there should only be one active area at any point. I found that | > | > | this was important to ensure that we could generate good code | > | > | from the stack layout algorithm, otherwise it had to make | > | > | pessimistic assumptions | > | and use too much stack. | > | > | | > | > | You're going to ask me where this is documented, and I think I | > | > | have to admit to slacking off, sorry :-) We did discuss it at | > | > | the time, and I made copious notes, but I didn't transfer those | to the code. | > | > | I'll add a Note. | > | > | | > | > | Cheers, | > | > | Simon | > | > | | > | > | | > | > | > Simon | > | > | > | > | > | > | -----Original Message----- | > | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | > | Sent: 08 January 2014 09:26 | > | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | > | Cc: ghc-devs@haskell.org | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | | > | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | > | > | > | Yes, this is technically wrong but luckily works. I'd | > | > | > | > | very much like to have a better solution, preferably one | > | > | > | > | that doesn't add any extra overhead. | > | > | > | > | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch | > | > | > | > | the Haskell stack. | > | > | > | > | > | > | > | > This all seems terribly fragile to me. At least it ought | > | > | > | > to be | > | > | > | surrounded with massive comments pointing out how terribly | > | > | > | fragile it is, breaking all the rules that we carefully | > | > | > | document | > | elsewhere. | > | > | > | > | > | > | > | > Can't we just allocate a Cmm "area"? The address of an | > | > | > | > area is a | > | > | > | perfectly well-defined Cmm value. | > | > | > | | > | > | > | It is fragile, yes. We can't use static memory because it | > | > | > | needs to be thread-local. This particular hack has gone | > | > | > | through several iterations over the years: first we had | > | > | > | static memory, which broke when we did the parallel runtime, | > | > | > | then we had special storage in the Capability, which we gave | > | > | > | up when GMP was split out into a separate library, because | > | > | > | it didn't seem right to have magic fields in the Capability | for one library. | > | > | > | | > | > | > | I'm looking into whether we can do temporary allocation on | > | > | > | the heap for this instead. | > | > | > | | > | > | > | Cheers, | > | > | > | Simon | > | > | > | | > | > | > | | > | > | > | > Simon | > | > | > | > | > | > | > | > | -----Original Message----- | > | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On | > | > | > | > | Behalf Of Simon Marlow | > | > | > | > | Sent: 07 January 2014 16:05 | > | > | > | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | > | | > | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > | > | > | > Hello, | > | > | > | > | > | > | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | > | > | > | > | > | > | There are two ways to write .cmm code: | > | > | > | > | > | | > | > | > | > | > | (1) High-level Cmm code delegates the stack | > | > | > | > | > | handling to GHC, | > | > | > | and | > | > | > | > | > | never explicitly mentions Sp or registers. | > | > | > | > | > | | > | > | > | > | > | (2) Low-level Cmm manages the stack itself, and | > | > | > | > | > | must know | > | > | about | > | > | > | > | > | calling conventions. | > | > | > | > | > | | > | > | > | > | > | Whether you want high-level or low-level Cmm is | > | > | > | > | > | indicated by the presence of an argument list on a | > | procedure. | > | > | > | > | > | > | > | > | > | > However, while working on integer-gmp I've been | > | > | > | > | > noticing in integer-gmp/cbits/gmp-wrappers.cmm that | > | > | > | > | > even though all Cmm | > | > | > | > | procedures | > | > | > | > | > have been converted to high-level Cmm, they still | > | > | > | > | > reference the | > | > | > | 'Sp' | > | > | > | > | > register, e.g. | > | > | > | > | > | > | > | > | > | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > | > | > | > name (W_ ws1, P_ d1) \ | > | > | > | > | > { \ | > | > | > | > | > W_ mp_tmp1; \ | > | > | > | > | > W_ mp_result1; \ | > | > | > | > | > \ | > | > | > | > | > again: \ | > | > | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > | > | > | > MAYBE_GC(again); \ | > | > | > | > | > \ | > | > | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > | > | > | > ... \ | > | > | > | > | > | > | > | > | > | > | > | > | > | > | > So is this valid high-level Cmm code? What's the | > | > | > | > | > proper way to | > | > | > | > | allocate | > | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | > | > | > | | > | > | > | > | Yes, this is technically wrong but luckily works. I'd | > | > | > | > | very much like to have a better solution, preferably one | > | > | > | > | that doesn't add any extra overhead. | > | > | > | > | | > | > | > | > | The problem here is that we need to allocate a couple of | > | > | > | > | temporary words and take their address; that's an | > | > | > | > | unusual thing to do in Cmm, so it only occurs in a few | > | > | > | > | places (mainly | > | > | interacting with gmp). | > | > | > | > | Usually if you want some temporary storage you can use | > | > | > | > | local variables or some heap-allocated memory. | > | > | > | > | | > | > | > | > | Cheers, | > | > | > | > | Simon | > | > | > | > | _______________________________________________ | > | > | > | > | ghc-devs mailing list | > | > | > | > | ghc-devs@haskell.org | > | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs | > | > | > | > | > | > | > | > | > | >