
Hello, According to Note [Syntax of .cmm files], | There are two ways to write .cmm code: | | (1) High-level Cmm code delegates the stack handling to GHC, and | never explicitly mentions Sp or registers. | | (2) Low-level Cmm manages the stack itself, and must know about | calling conventions. | | Whether you want high-level or low-level Cmm is indicated by the | presence of an argument list on a procedure. However, while working on integer-gmp I've been noticing in integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm procedures have been converted to high-level Cmm, they still reference the 'Sp' register, e.g. #define GMP_TAKE1_RET1(name,mp_fun) \ name (W_ ws1, P_ d1) \ { \ W_ mp_tmp1; \ W_ mp_result1; \ \ again: \ STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ MAYBE_GC(again); \ \ mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ ... \ So is this valid high-level Cmm code? What's the proper way to allocate Stack (and/or Heap) memory from high-level Cmm code? Cheers, hvr

hey Herbert,
I generally start with looking at the primops.cmm file for examples
https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L572-L588
otoh, the comments in cmmparse.y indicate that's not quite "kosher"? or
maybe the comments are a lie?
https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L24-L28
On Sat, Jan 4, 2014 at 6:26 PM, Herbert Valerio Riedel
Hello,
According to Note [Syntax of .cmm files],
| There are two ways to write .cmm code: | | (1) High-level Cmm code delegates the stack handling to GHC, and | never explicitly mentions Sp or registers. | | (2) Low-level Cmm manages the stack itself, and must know about | calling conventions. | | Whether you want high-level or low-level Cmm is indicated by the | presence of an argument list on a procedure.
However, while working on integer-gmp I've been noticing in integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm procedures have been converted to high-level Cmm, they still reference the 'Sp' register, e.g.
#define GMP_TAKE1_RET1(name,mp_fun) \ name (W_ ws1, P_ d1) \ { \ W_ mp_tmp1; \ W_ mp_result1; \ \ again: \ STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ MAYBE_GC(again); \ \ mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ ... \
So is this valid high-level Cmm code? What's the proper way to allocate Stack (and/or Heap) memory from high-level Cmm code?
Cheers, hvr _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 2014-01-05 at 01:15:53 +0100, Carter Schonwald wrote:
hey Herbert, I generally start with looking at the primops.cmm file for examples https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L572-L588
stg_decodeFloatzuIntzh ( F_ arg ) { W_ p, mp_tmp1, W_ mp_tmp_w; STK_CHK_GEN_N (WDS(2)); mp_tmp1 = Sp - WDS(1); mp_tmp_w = Sp - WDS(2); ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg); return (W_[mp_tmp1], W_[mp_tmp_w]); } that function in particular is compiled to [stg_decodeFloatzuIntzh() // [F1] { info_tbl: [] stack_info: arg_space: 8 updfr_space: Just 8 } {offset cc: _c0::F32 = F1; goto c4; c4: if ((old + 0) - 2 * 8 < SpLim) goto c6; else goto c7; c6: I64[(young<c5> + 8)] = c5; call stg_gc_noregs() returns to c5, args: 8, res: 8, upd: 8; c5: goto c4; c7: _c2::I64 = Sp - 1 * 8; _c3::I64 = Sp - 2 * 8; _c8::I64 = __decodeFloat_Int; _c9::I64 = _c2::I64; _ca::I64 = _c3::I64; _cb::F32 = _c0::F32; call "ccall" arg hints: [PtrHint, PtrHint,] result hints: [] (_c8::I64)(_c9::I64, _ca::I64, _cb::F32); R2 = I64[_c3::I64]; R1 = I64[_c2::I64]; call (P64[(old + 8)])(R2, R1) args: 8, res: 0, upd: 8; } }] But I see no effort to adjust Sp (i.e. `Sp = Sp - 16`) right before the call to __decodeFloat_Int; how is it ensured that __decodeFloat_Int doesn't use the locations Sp-8 and Sp-16 for as its local stack?
otoh, the comments in cmmparse.y indicate that's not quite "kosher"? or maybe the comments are a lie? https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L24-L28

On 05/01/2014 11:46, Herbert Valerio Riedel wrote:
On 2014-01-05 at 01:15:53 +0100, Carter Schonwald wrote:
hey Herbert, I generally start with looking at the primops.cmm file for examples https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L572-L588
stg_decodeFloatzuIntzh ( F_ arg ) { W_ p, mp_tmp1, W_ mp_tmp_w;
STK_CHK_GEN_N (WDS(2));
mp_tmp1 = Sp - WDS(1); mp_tmp_w = Sp - WDS(2);
ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg);
return (W_[mp_tmp1], W_[mp_tmp_w]); }
that function in particular is compiled to
[stg_decodeFloatzuIntzh() // [F1] { info_tbl: [] stack_info: arg_space: 8 updfr_space: Just 8 } {offset cc: _c0::F32 = F1; goto c4; c4: if ((old + 0) - 2 * 8 < SpLim) goto c6; else goto c7; c6: I64[(young<c5> + 8)] = c5; call stg_gc_noregs() returns to c5, args: 8, res: 8, upd: 8; c5: goto c4; c7: _c2::I64 = Sp - 1 * 8; _c3::I64 = Sp - 2 * 8; _c8::I64 = __decodeFloat_Int; _c9::I64 = _c2::I64; _ca::I64 = _c3::I64; _cb::F32 = _c0::F32; call "ccall" arg hints: [PtrHint, PtrHint,] result hints: [] (_c8::I64)(_c9::I64, _ca::I64, _cb::F32); R2 = I64[_c3::I64]; R1 = I64[_c2::I64]; call (P64[(old + 8)])(R2, R1) args: 8, res: 0, upd: 8; } }]
But I see no effort to adjust Sp (i.e. `Sp = Sp - 16`) right before the call to __decodeFloat_Int; how is it ensured that __decodeFloat_Int doesn't use the locations Sp-8 and Sp-16 for as its local stack?
__decodeFloat_Int is a C function, so it will not touch the Haskell stack. Cheers, Simon

On 2014-01-05 at 00:26:52 +0100, Herbert Valerio Riedel wrote: [...]
So is this valid high-level Cmm code? What's the proper way to allocate Stack (and/or Heap) memory from high-level Cmm code?
PS: ...are function calls supposed to work as advertised in https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L76 ? I've tried using `(ret1,ret2) = call stg_fun (arg1,arg2);` in a Cmm file, but I get a parser error on `call` with GHC HEAD; only when leave out the return value assignement, i.e. when I use only `call stg_fun (arg1,arg2);`, it gets parsed succesfully. Is this a bug in the CmmParser? Cheers, hvr

i'm inclined to assume that its a parser error.
instead of (v)= call fun(args...argn);, did you try v = call fun(args1...n)
; ?
On Sat, Jan 4, 2014 at 7:27 PM, Herbert Valerio Riedel
On 2014-01-05 at 00:26:52 +0100, Herbert Valerio Riedel wrote:
[...]
So is this valid high-level Cmm code? What's the proper way to allocate Stack (and/or Heap) memory from high-level Cmm code?
PS: ...are function calls supposed to work as advertised in
https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L76
?
I've tried using `(ret1,ret2) = call stg_fun (arg1,arg2);` in a Cmm file, but I get a parser error on `call` with GHC HEAD; only when leave out the return value assignement, i.e. when I use only `call stg_fun (arg1,arg2);`, it gets parsed succesfully. Is this a bug in the CmmParser?
Cheers, hvr _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 2014-01-05 at 01:32:07 +0100, Carter Schonwald wrote:
i'm inclined to assume that its a parser error. instead of (v)= call fun(args...argn);, did you try v = call fun(args1...n) ; ?
I've looked more closely at the parser, and the relevant productions... | 'call' expr '(' exprs0 ')' ';' { doCall $2 [] $4 } | '(' formals ')' '=' 'call' expr '(' exprs0 ')' ';' { doCall $6 $2 $8 } ...actually require the return values to be newly declared registers, therefore the following works: foo() { W_ arg1, arg2; arg1 = 1; arg2 = 2; (W_ ret1, W_ ret2) = call stg_fun (arg1,arg2); return (ret2, ret1); } Cheers, hvr

On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
Hello,
According to Note [Syntax of .cmm files],
| There are two ways to write .cmm code: | | (1) High-level Cmm code delegates the stack handling to GHC, and | never explicitly mentions Sp or registers. | | (2) Low-level Cmm manages the stack itself, and must know about | calling conventions. | | Whether you want high-level or low-level Cmm is indicated by the | presence of an argument list on a procedure.
However, while working on integer-gmp I've been noticing in integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm procedures have been converted to high-level Cmm, they still reference the 'Sp' register, e.g.
#define GMP_TAKE1_RET1(name,mp_fun) \ name (W_ ws1, P_ d1) \ { \ W_ mp_tmp1; \ W_ mp_result1; \ \ again: \ STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ MAYBE_GC(again); \ \ mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ ... \
So is this valid high-level Cmm code? What's the proper way to allocate Stack (and/or Heap) memory from high-level Cmm code?
Yes, this is technically wrong but luckily works. I'd very much like to have a better solution, preferably one that doesn't add any extra overhead. The problem here is that we need to allocate a couple of temporary words and take their address; that's an unusual thing to do in Cmm, so it only occurs in a few places (mainly interacting with gmp). Usually if you want some temporary storage you can use local variables or some heap-allocated memory. Cheers, Simon

Hello Simon, On 2014-01-07 at 17:04:52 +0100, Simon Marlow wrote: [...]
Yes, this is technically wrong but luckily works.
...but only as long as the code-generator doesn't try to push something on the stack, like e.g. when performing native 'call's which need to push the return-location on the stack...?
I'd very much like to have a better solution, preferably one that doesn't add any extra overhead.
I see... I've noticed there's a 'push() { ... }' construct that allows to push items on the stack; couldn't we have generalized version of that, taking a size-argument, declaring that specified amount of stack-space is user-allocated/controlled within the '{ ... }' scope? Greetings, hvr

On 07/01/2014 16:14, Herbert Valerio Riedel wrote:
Hello Simon,
On 2014-01-07 at 17:04:52 +0100, Simon Marlow wrote:
[...]
Yes, this is technically wrong but luckily works.
...but only as long as the code-generator doesn't try to push something on the stack, like e.g. when performing native 'call's which need to push the return-location on the stack...?
Right - in principle the code generator is in control of the stack so it can move the stack pointer whenever it likes, but in practice we know it only does this in certain places, like when making native calls, so these naughty functions just avoid doing that.
I'd very much like to have a better solution, preferably one that doesn't add any extra overhead.
I see... I've noticed there's a 'push() { ... }' construct that allows to push items on the stack; couldn't we have generalized version of that, taking a size-argument, declaring that specified amount of stack-space is user-allocated/controlled within the '{ ... }' scope?
We could push a stack frame, like we do for an update frame, but the problem is that we need a way to take the address of those stack locations. Taking the address of stack locations is also dodgy, because stacks move (say, during a native call). So it would still be unsafe. Also pushing a stack frame would incur an extra memory write for the info pointer, which is annoying. Cheers, Simon

| Yes, this is technically wrong but luckily works. I'd very much like | to | have a better solution, preferably one that doesn't add any extra | overhead. | __decodeFloat_Int is a C function, so it will not touch the Haskell | stack. This all seems terribly fragile to me. At least it ought to be surrounded with massive comments pointing out how terribly fragile it is, breaking all the rules that we carefully document elsewhere. Can't we just allocate a Cmm "area"? The address of an area is a perfectly well-defined Cmm value. Simon | -----Original Message----- | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Simon | Marlow | Sent: 07 January 2014 16:05 | To: Herbert Valerio Riedel; ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > Hello, | > | > According to Note [Syntax of .cmm files], | > | > | There are two ways to write .cmm code: | > | | > | (1) High-level Cmm code delegates the stack handling to GHC, and | > | never explicitly mentions Sp or registers. | > | | > | (2) Low-level Cmm manages the stack itself, and must know about | > | calling conventions. | > | | > | Whether you want high-level or low-level Cmm is indicated by the | > | presence of an argument list on a procedure. | > | > However, while working on integer-gmp I've been noticing in | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm | procedures | > have been converted to high-level Cmm, they still reference the 'Sp' | > register, e.g. | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > name (W_ ws1, P_ d1) \ | > { \ | > W_ mp_tmp1; \ | > W_ mp_result1; \ | > \ | > again: \ | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > MAYBE_GC(again); \ | > \ | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > ... \ | > | > | > So is this valid high-level Cmm code? What's the proper way to | allocate | > Stack (and/or Heap) memory from high-level Cmm code? | | Yes, this is technically wrong but luckily works. I'd very much like | to | have a better solution, preferably one that doesn't add any extra | overhead. | | The problem here is that we need to allocate a couple of temporary | words | and take their address; that's an unusual thing to do in Cmm, so it | only | occurs in a few places (mainly interacting with gmp). Usually if you | want some temporary storage you can use local variables or some | heap-allocated memory. | | Cheers, | Simon | _______________________________________________ | ghc-devs mailing list | ghc-devs@haskell.org | http://www.haskell.org/mailman/listinfo/ghc-devs

On 07/01/14 22:53, Simon Peyton Jones wrote:
| Yes, this is technically wrong but luckily works. I'd very much like | to | have a better solution, preferably one that doesn't add any extra | overhead.
| __decodeFloat_Int is a C function, so it will not touch the Haskell | stack.
This all seems terribly fragile to me. At least it ought to be surrounded with massive comments pointing out how terribly fragile it is, breaking all the rules that we carefully document elsewhere.
Can't we just allocate a Cmm "area"? The address of an area is a perfectly well-defined Cmm value.
It is fragile, yes. We can't use static memory because it needs to be thread-local. This particular hack has gone through several iterations over the years: first we had static memory, which broke when we did the parallel runtime, then we had special storage in the Capability, which we gave up when GMP was split out into a separate library, because it didn't seem right to have magic fields in the Capability for one library. I'm looking into whether we can do temporary allocation on the heap for this instead. Cheers, Simon
Simon
| -----Original Message----- | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Simon | Marlow | Sent: 07 January 2014 16:05 | To: Herbert Valerio Riedel; ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > Hello, | > | > According to Note [Syntax of .cmm files], | > | > | There are two ways to write .cmm code: | > | | > | (1) High-level Cmm code delegates the stack handling to GHC, and | > | never explicitly mentions Sp or registers. | > | | > | (2) Low-level Cmm manages the stack itself, and must know about | > | calling conventions. | > | | > | Whether you want high-level or low-level Cmm is indicated by the | > | presence of an argument list on a procedure. | > | > However, while working on integer-gmp I've been noticing in | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm | procedures | > have been converted to high-level Cmm, they still reference the 'Sp' | > register, e.g. | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > name (W_ ws1, P_ d1) \ | > { \ | > W_ mp_tmp1; \ | > W_ mp_result1; \ | > \ | > again: \ | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > MAYBE_GC(again); \ | > \ | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > ... \ | > | > | > So is this valid high-level Cmm code? What's the proper way to | allocate | > Stack (and/or Heap) memory from high-level Cmm code? | | Yes, this is technically wrong but luckily works. I'd very much like | to | have a better solution, preferably one that doesn't add any extra | overhead. | | The problem here is that we need to allocate a couple of temporary | words | and take their address; that's an unusual thing to do in Cmm, so it | only | occurs in a few places (mainly interacting with gmp). Usually if you | want some temporary storage you can use local variables or some | heap-allocated memory. | | Cheers, | Simon | _______________________________________________ | ghc-devs mailing list | ghc-devs@haskell.org | http://www.haskell.org/mailman/listinfo/ghc-devs

| > Can't we just allocate a Cmm "area"? The address of an area is a | perfectly well-defined Cmm value. What about this idea? Simon | -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 08 January 2014 09:26 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | Yes, this is technically wrong but luckily works. I'd very much | > | like to have a better solution, preferably one that doesn't add any | > | extra overhead. | > | > | __decodeFloat_Int is a C function, so it will not touch the Haskell | > | stack. | > | > This all seems terribly fragile to me. At least it ought to be | surrounded with massive comments pointing out how terribly fragile it | is, breaking all the rules that we carefully document elsewhere. | > | > Can't we just allocate a Cmm "area"? The address of an area is a | perfectly well-defined Cmm value. | | It is fragile, yes. We can't use static memory because it needs to be | thread-local. This particular hack has gone through several iterations | over the years: first we had static memory, which broke when we did the | parallel runtime, then we had special storage in the Capability, which | we gave up when GMP was split out into a separate library, because it | didn't seem right to have magic fields in the Capability for one | library. | | I'm looking into whether we can do temporary allocation on the heap for | this instead. | | Cheers, | Simon | | | > Simon | > | > | -----Original Message----- | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of | > | Simon Marlow | > | Sent: 07 January 2014 16:05 | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > Hello, | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | There are two ways to write .cmm code: | > | > | | > | > | (1) High-level Cmm code delegates the stack handling to GHC, | and | > | > | never explicitly mentions Sp or registers. | > | > | | > | > | (2) Low-level Cmm manages the stack itself, and must know about | > | > | calling conventions. | > | > | | > | > | Whether you want high-level or low-level Cmm is indicated by the | > | > | presence of an argument list on a procedure. | > | > | > | > However, while working on integer-gmp I've been noticing in | > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm | > | procedures | > | > have been converted to high-level Cmm, they still reference the | 'Sp' | > | > register, e.g. | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > name (W_ ws1, P_ d1) \ | > | > { \ | > | > W_ mp_tmp1; \ | > | > W_ mp_result1; \ | > | > \ | > | > again: \ | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > MAYBE_GC(again); \ | > | > \ | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > ... \ | > | > | > | > | > | > So is this valid high-level Cmm code? What's the proper way to | > | allocate | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | | > | Yes, this is technically wrong but luckily works. I'd very much | > | like to have a better solution, preferably one that doesn't add any | > | extra overhead. | > | | > | The problem here is that we need to allocate a couple of temporary | > | words and take their address; that's an unusual thing to do in Cmm, | > | so it only occurs in a few places (mainly interacting with gmp). | > | Usually if you want some temporary storage you can use local | > | variables or some heap-allocated memory. | > | | > | Cheers, | > | Simon | > | _______________________________________________ | > | ghc-devs mailing list | > | ghc-devs@haskell.org | > | http://www.haskell.org/mailman/listinfo/ghc-devs | >

On 08/01/2014 10:07, Simon Peyton Jones wrote:
| > Can't we just allocate a Cmm "area"? The address of an area is a | perfectly well-defined Cmm value.
What about this idea?
We don't really have a general concept of areas (any more), and areas aren't exposed in the concrete Cmm syntax at all. The current semantics is that areas may overlap with each other, so there should only be one active area at any point. I found that this was important to ensure that we could generate good code from the stack layout algorithm, otherwise it had to make pessimistic assumptions and use too much stack. You're going to ask me where this is documented, and I think I have to admit to slacking off, sorry :-) We did discuss it at the time, and I made copious notes, but I didn't transfer those to the code. I'll add a Note. Cheers, Simon
Simon
| -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 08 January 2014 09:26 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | Yes, this is technically wrong but luckily works. I'd very much | > | like to have a better solution, preferably one that doesn't add any | > | extra overhead. | > | > | __decodeFloat_Int is a C function, so it will not touch the Haskell | > | stack. | > | > This all seems terribly fragile to me. At least it ought to be | surrounded with massive comments pointing out how terribly fragile it | is, breaking all the rules that we carefully document elsewhere. | > | > Can't we just allocate a Cmm "area"? The address of an area is a | perfectly well-defined Cmm value. | | It is fragile, yes. We can't use static memory because it needs to be | thread-local. This particular hack has gone through several iterations | over the years: first we had static memory, which broke when we did the | parallel runtime, then we had special storage in the Capability, which | we gave up when GMP was split out into a separate library, because it | didn't seem right to have magic fields in the Capability for one | library. | | I'm looking into whether we can do temporary allocation on the heap for | this instead. | | Cheers, | Simon | | | > Simon | > | > | -----Original Message----- | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of | > | Simon Marlow | > | Sent: 07 January 2014 16:05 | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > Hello, | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | There are two ways to write .cmm code: | > | > | | > | > | (1) High-level Cmm code delegates the stack handling to GHC, | and | > | > | never explicitly mentions Sp or registers. | > | > | | > | > | (2) Low-level Cmm manages the stack itself, and must know about | > | > | calling conventions. | > | > | | > | > | Whether you want high-level or low-level Cmm is indicated by the | > | > | presence of an argument list on a procedure. | > | > | > | > However, while working on integer-gmp I've been noticing in | > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm | > | procedures | > | > have been converted to high-level Cmm, they still reference the | 'Sp' | > | > register, e.g. | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > name (W_ ws1, P_ d1) \ | > | > { \ | > | > W_ mp_tmp1; \ | > | > W_ mp_result1; \ | > | > \ | > | > again: \ | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > MAYBE_GC(again); \ | > | > \ | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > ... \ | > | > | > | > | > | > So is this valid high-level Cmm code? What's the proper way to | > | allocate | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | | > | Yes, this is technically wrong but luckily works. I'd very much | > | like to have a better solution, preferably one that doesn't add any | > | extra overhead. | > | | > | The problem here is that we need to allocate a couple of temporary | > | words and take their address; that's an unusual thing to do in Cmm, | > | so it only occurs in a few places (mainly interacting with gmp). | > | Usually if you want some temporary storage you can use local | > | variables or some heap-allocated memory. | > | | > | Cheers, | > | Simon | > | _______________________________________________ | > | ghc-devs mailing list | > | ghc-devs@haskell.org | > | http://www.haskell.org/mailman/listinfo/ghc-devs | >

That documentation would be good, yes! I don't know what it means to say "we don't really have a general concept of areas any more". We did before, and I didn't know that it had gone away. Urk! We can have lots of live areas, notably the old area (for the current call/return parameters, the call area for a call we are preparing, and the one-slot areas for variables we are saving on the stack. Here's he current story https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas I agree that we have no concrete syntax for talking about areas, but that is something we could fix. But I'm worried that they may not mean what they used to mean. Simon | -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 09 January 2014 08:39 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | On 08/01/2014 10:07, Simon Peyton Jones wrote: | > | > Can't we just allocate a Cmm "area"? The address of an area is a | > | perfectly well-defined Cmm value. | > | > What about this idea? | | We don't really have a general concept of areas (any more), and areas | aren't exposed in the concrete Cmm syntax at all. The current semantics | is that areas may overlap with each other, so there should only be one | active area at any point. I found that this was important to ensure | that we could generate good code from the stack layout algorithm, | otherwise it had to make pessimistic assumptions and use too much stack. | | You're going to ask me where this is documented, and I think I have to | admit to slacking off, sorry :-) We did discuss it at the time, and I | made copious notes, but I didn't transfer those to the code. I'll add a | Note. | | Cheers, | Simon | | | > Simon | > | > | -----Original Message----- | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | Sent: 08 January 2014 09:26 | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | Cc: ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | > | Yes, this is technically wrong but luckily works. I'd very much | > | > | like to have a better solution, preferably one that doesn't add | > | > | any extra overhead. | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch the | > | > | Haskell stack. | > | > | > | > This all seems terribly fragile to me. At least it ought to be | > | surrounded with massive comments pointing out how terribly fragile | > | it is, breaking all the rules that we carefully document elsewhere. | > | > | > | > Can't we just allocate a Cmm "area"? The address of an area is a | > | perfectly well-defined Cmm value. | > | | > | It is fragile, yes. We can't use static memory because it needs to | > | be thread-local. This particular hack has gone through several | > | iterations over the years: first we had static memory, which broke | > | when we did the parallel runtime, then we had special storage in the | > | Capability, which we gave up when GMP was split out into a separate | > | library, because it didn't seem right to have magic fields in the | > | Capability for one library. | > | | > | I'm looking into whether we can do temporary allocation on the heap | > | for this instead. | > | | > | Cheers, | > | Simon | > | | > | | > | > Simon | > | > | > | > | -----Original Message----- | > | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf | > | > | Of Simon Marlow | > | > | Sent: 07 January 2014 16:05 | > | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > | > Hello, | > | > | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | > | > | There are two ways to write .cmm code: | > | > | > | | > | > | > | (1) High-level Cmm code delegates the stack handling to | > | > | > | GHC, | > | and | > | > | > | never explicitly mentions Sp or registers. | > | > | > | | > | > | > | (2) Low-level Cmm manages the stack itself, and must know | about | > | > | > | calling conventions. | > | > | > | | > | > | > | Whether you want high-level or low-level Cmm is indicated by | > | > | > | the presence of an argument list on a procedure. | > | > | > | > | > | > However, while working on integer-gmp I've been noticing in | > | > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm | > | > | procedures | > | > | > have been converted to high-level Cmm, they still reference | > | > | > the | > | 'Sp' | > | > | > register, e.g. | > | > | > | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > | > name (W_ ws1, P_ d1) \ | > | > | > { \ | > | > | > W_ mp_tmp1; \ | > | > | > W_ mp_result1; \ | > | > | > \ | > | > | > again: \ | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > | > MAYBE_GC(again); \ | > | > | > \ | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > | > ... \ | > | > | > | > | > | > | > | > | > So is this valid high-level Cmm code? What's the proper way to | > | > | allocate | > | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | > | | > | > | Yes, this is technically wrong but luckily works. I'd very much | > | > | like to have a better solution, preferably one that doesn't add | > | > | any extra overhead. | > | > | | > | > | The problem here is that we need to allocate a couple of | > | > | temporary words and take their address; that's an unusual thing | > | > | to do in Cmm, so it only occurs in a few places (mainly | interacting with gmp). | > | > | Usually if you want some temporary storage you can use local | > | > | variables or some heap-allocated memory. | > | > | | > | > | Cheers, | > | > | Simon | > | > | _______________________________________________ | > | > | ghc-devs mailing list | > | > | ghc-devs@haskell.org | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs | > | > | >

There are no one-slot areas any more, I ditched those when I rewrote the stack allocator. There is only ever one live area: either the old area or the young area for a call we are about to make or have just made. (see the data type: I removed the one-slot areas) I struggled for a long time with this. The problem is that with the semantics of non-overlapping areas, code motion optimisations would tend to increase the stack requirements of the function by overlapping the live ranges of the areas. I concluded that actually what we wanted was areas that really do overlap, and optimisations that respect that, so that we get more efficient stack usage. Cheers, Simon On 10/01/2014 15:22, Simon Peyton Jones wrote:
That documentation would be good, yes! I don't know what it means to say "we don't really have a general concept of areas any more". We did before, and I didn't know that it had gone away. Urk! We can have lots of live areas, notably the old area (for the current call/return parameters, the call area for a call we are preparing, and the one-slot areas for variables we are saving on the stack.
Here's he current story https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas
I agree that we have no concrete syntax for talking about areas, but that is something we could fix. But I'm worried that they may not mean what they used to mean.
Simon
| -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 09 January 2014 08:39 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | On 08/01/2014 10:07, Simon Peyton Jones wrote: | > | > Can't we just allocate a Cmm "area"? The address of an area is a | > | perfectly well-defined Cmm value. | > | > What about this idea? | | We don't really have a general concept of areas (any more), and areas | aren't exposed in the concrete Cmm syntax at all. The current semantics | is that areas may overlap with each other, so there should only be one | active area at any point. I found that this was important to ensure | that we could generate good code from the stack layout algorithm, | otherwise it had to make pessimistic assumptions and use too much stack. | | You're going to ask me where this is documented, and I think I have to | admit to slacking off, sorry :-) We did discuss it at the time, and I | made copious notes, but I didn't transfer those to the code. I'll add a | Note. | | Cheers, | Simon | | | > Simon | > | > | -----Original Message----- | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | Sent: 08 January 2014 09:26 | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | Cc: ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | > | Yes, this is technically wrong but luckily works. I'd very much | > | > | like to have a better solution, preferably one that doesn't add | > | > | any extra overhead. | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch the | > | > | Haskell stack. | > | > | > | > This all seems terribly fragile to me. At least it ought to be | > | surrounded with massive comments pointing out how terribly fragile | > | it is, breaking all the rules that we carefully document elsewhere. | > | > | > | > Can't we just allocate a Cmm "area"? The address of an area is a | > | perfectly well-defined Cmm value. | > | | > | It is fragile, yes. We can't use static memory because it needs to | > | be thread-local. This particular hack has gone through several | > | iterations over the years: first we had static memory, which broke | > | when we did the parallel runtime, then we had special storage in the | > | Capability, which we gave up when GMP was split out into a separate | > | library, because it didn't seem right to have magic fields in the | > | Capability for one library. | > | | > | I'm looking into whether we can do temporary allocation on the heap | > | for this instead. | > | | > | Cheers, | > | Simon | > | | > | | > | > Simon | > | > | > | > | -----Original Message----- | > | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf | > | > | Of Simon Marlow | > | > | Sent: 07 January 2014 16:05 | > | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > | > Hello, | > | > | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | > | > | There are two ways to write .cmm code: | > | > | > | | > | > | > | (1) High-level Cmm code delegates the stack handling to | > | > | > | GHC, | > | and | > | > | > | never explicitly mentions Sp or registers. | > | > | > | | > | > | > | (2) Low-level Cmm manages the stack itself, and must know | about | > | > | > | calling conventions. | > | > | > | | > | > | > | Whether you want high-level or low-level Cmm is indicated by | > | > | > | the presence of an argument list on a procedure. | > | > | > | > | > | > However, while working on integer-gmp I've been noticing in | > | > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm | > | > | procedures | > | > | > have been converted to high-level Cmm, they still reference | > | > | > the | > | 'Sp' | > | > | > register, e.g. | > | > | > | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > | > name (W_ ws1, P_ d1) \ | > | > | > { \ | > | > | > W_ mp_tmp1; \ | > | > | > W_ mp_result1; \ | > | > | > \ | > | > | > again: \ | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > | > MAYBE_GC(again); \ | > | > | > \ | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > | > ... \ | > | > | > | > | > | > | > | > | > So is this valid high-level Cmm code? What's the proper way to | > | > | allocate | > | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | > | | > | > | Yes, this is technically wrong but luckily works. I'd very much | > | > | like to have a better solution, preferably one that doesn't add | > | > | any extra overhead. | > | > | | > | > | The problem here is that we need to allocate a couple of | > | > | temporary words and take their address; that's an unusual thing | > | > | to do in Cmm, so it only occurs in a few places (mainly | interacting with gmp). | > | > | Usually if you want some temporary storage you can use local | > | > | variables or some heap-allocated memory. | > | > | | > | > | Cheers, | > | > | Simon | > | > | _______________________________________________ | > | > | ghc-devs mailing list | > | > | ghc-devs@haskell.org | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs | > | > | >

Oh, ok. Alas, a good chunk of my model of Cmm has just gone out of the window. I thought that areas were such a lovely, well-behaved abstraction. I was thrilled when we came up with them, and I'm very sorry to see them go. There are no many things that I no longer understand. I now have no idea how we save live variables over a call, or how multiple returned values from one call (returned on the stack) stay right where they are if they are live across the next call. What was the actual problem? That functions used too much stack, so the stack was getting too big? But a one slot area corresponds exactly to a live variable, so I don't see how the area abstraction could possibly increase stack size. And is stack size a crucial issue anyway? Apart from anything else, areas would have given a lovely solution to the problem this thread started with! I guess we can talk about this when you next visit? But some documentation would be welcome. Simon | -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 10 January 2014 16:24 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | There are no one-slot areas any more, I ditched those when I rewrote the | stack allocator. There is only ever one live area: either the old area | or the young area for a call we are about to make or have just made. | (see the data type: I removed the one-slot areas) | | I struggled for a long time with this. The problem is that with the | semantics of non-overlapping areas, code motion optimisations would tend | to increase the stack requirements of the function by overlapping the | live ranges of the areas. I concluded that actually what we wanted was | areas that really do overlap, and optimisations that respect that, so | that we get more efficient stack usage. | | Cheers, | Simon | | On 10/01/2014 15:22, Simon Peyton Jones wrote: | > That documentation would be good, yes! I don't know what it means to | say "we don't really have a general concept of areas any more". We did | before, and I didn't know that it had gone away. Urk! We can have lots | of live areas, notably the old area (for the current call/return | parameters, the call area for a call we are preparing, and the one-slot | areas for variables we are saving on the stack. | > | > Here's he current story | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas | > | > I agree that we have no concrete syntax for talking about areas, but | that is something we could fix. But I'm worried that they may not mean | what they used to mean. | > | > Simon | > | > | -----Original Message----- | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | Sent: 09 January 2014 08:39 | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | Cc: ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | On 08/01/2014 10:07, Simon Peyton Jones wrote: | > | > | > Can't we just allocate a Cmm "area"? The address of an area is | > | > | > a | > | > | perfectly well-defined Cmm value. | > | > | > | > What about this idea? | > | | > | We don't really have a general concept of areas (any more), and | > | areas aren't exposed in the concrete Cmm syntax at all. The current | > | semantics is that areas may overlap with each other, so there should | > | only be one active area at any point. I found that this was | > | important to ensure that we could generate good code from the stack | > | layout algorithm, otherwise it had to make pessimistic assumptions | and use too much stack. | > | | > | You're going to ask me where this is documented, and I think I have | > | to admit to slacking off, sorry :-) We did discuss it at the time, | > | and I made copious notes, but I didn't transfer those to the code. | > | I'll add a Note. | > | | > | Cheers, | > | Simon | > | | > | | > | > Simon | > | > | > | > | -----Original Message----- | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | Sent: 08 January 2014 09:26 | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | Cc: ghc-devs@haskell.org | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | > | > | Yes, this is technically wrong but luckily works. I'd very | > | > | > | much like to have a better solution, preferably one that | > | > | > | doesn't add any extra overhead. | > | > | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch the | > | > | > | Haskell stack. | > | > | > | > | > | > This all seems terribly fragile to me. At least it ought to | > | > | > be | > | > | surrounded with massive comments pointing out how terribly | > | > | fragile it is, breaking all the rules that we carefully document | elsewhere. | > | > | > | > | > | > Can't we just allocate a Cmm "area"? The address of an area is | > | > | > a | > | > | perfectly well-defined Cmm value. | > | > | | > | > | It is fragile, yes. We can't use static memory because it needs | > | > | to be thread-local. This particular hack has gone through | > | > | several iterations over the years: first we had static memory, | > | > | which broke when we did the parallel runtime, then we had | > | > | special storage in the Capability, which we gave up when GMP was | > | > | split out into a separate library, because it didn't seem right | > | > | to have magic fields in the Capability for one library. | > | > | | > | > | I'm looking into whether we can do temporary allocation on the | > | > | heap for this instead. | > | > | | > | > | Cheers, | > | > | Simon | > | > | | > | > | | > | > | > Simon | > | > | > | > | > | > | -----Original Message----- | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On | > | > | > | Behalf Of Simon Marlow | > | > | > | Sent: 07 January 2014 16:05 | > | > | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > | > | > Hello, | > | > | > | > | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | > | > | > | > | There are two ways to write .cmm code: | > | > | > | > | | > | > | > | > | (1) High-level Cmm code delegates the stack handling to | > | > | > | > | GHC, | > | > | and | > | > | > | > | never explicitly mentions Sp or registers. | > | > | > | > | | > | > | > | > | (2) Low-level Cmm manages the stack itself, and must | > | > | > | > | know | > | about | > | > | > | > | calling conventions. | > | > | > | > | | > | > | > | > | Whether you want high-level or low-level Cmm is | > | > | > | > | indicated by the presence of an argument list on a | procedure. | > | > | > | > | > | > | > | > However, while working on integer-gmp I've been noticing | > | > | > | > in integer-gmp/cbits/gmp-wrappers.cmm that even though all | > | > | > | > Cmm | > | > | > | procedures | > | > | > | > have been converted to high-level Cmm, they still | > | > | > | > reference the | > | > | 'Sp' | > | > | > | > register, e.g. | > | > | > | > | > | > | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > | > | > name (W_ ws1, P_ d1) \ | > | > | > | > { \ | > | > | > | > W_ mp_tmp1; \ | > | > | > | > W_ mp_result1; \ | > | > | > | > \ | > | > | > | > again: \ | > | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > | > | > MAYBE_GC(again); \ | > | > | > | > \ | > | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > | > | > ... \ | > | > | > | > | > | > | > | > | > | > | > | > So is this valid high-level Cmm code? What's the proper | > | > | > | > way to | > | > | > | allocate | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | > | > | | > | > | > | Yes, this is technically wrong but luckily works. I'd very | > | > | > | much like to have a better solution, preferably one that | > | > | > | doesn't add any extra overhead. | > | > | > | | > | > | > | The problem here is that we need to allocate a couple of | > | > | > | temporary words and take their address; that's an unusual | > | > | > | thing to do in Cmm, so it only occurs in a few places | > | > | > | (mainly | > | interacting with gmp). | > | > | > | Usually if you want some temporary storage you can use local | > | > | > | variables or some heap-allocated memory. | > | > | > | | > | > | > | Cheers, | > | > | > | Simon | > | > | > | _______________________________________________ | > | > | > | ghc-devs mailing list | > | > | > | ghc-devs@haskell.org | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs | > | > | > | > | > | >

So stack areas are still a great abstraction, the only change is that they now overlap. It's not just about stack getting too big, I've copied the notes I made about it below (which I will paste into the code in due course). The nice property that we can generate well-defined Cmm without knowing explicit stack offsets is intact. What is different is that there used to be an intermediate state where live variables were saved to abstract stack areas across calls, but Sp was still not manifest. This intermediate state doesn't exist any more, the stack layout algorithm does it all in one pass. To me this was far simpler, and I think it ended up being fewer lines of code than the old multi-phase stack layout algorithm (it's also much faster). Of course you can always change this. My goal was to get code that was at least as good as the old code generator and in a reasonable amount of time, and this was the shortest path I could find to that goal. Cheers, Simon e.g. if we had x = Sp[old + 8] y = Sp[old + 16] Sp[young(L) + 8] = L Sp[young(L) + 16] = y Sp[young(L) + 24] = x call f() returns to L if areas semantically do not overlap, then we might optimise this to Sp[young(L) + 8] = L Sp[young(L) + 16] = Sp[old + 8] Sp[young(L) + 24] = Sp[old + 16] call f() returns to L and now young(L) cannot be allocated at the same place as old, and we are doomed to use more stack. - old+8 conflicts with young(L)+8 - old+16 conflicts with young(L)+16 and young(L)+8 so young(L)+8 == old+24 and we get Sp[-8] = L Sp[-16] = Sp[8] Sp[-24] = Sp[0] Sp -= 24 call f() returns to L However, if areas are defined to be "possibly overlapping" in the semantics, then we cannot commute any loads/stores of old with young(L), and we will be able to re-use both old+8 and old+16 for young(L). x = Sp[8] y = Sp[0] Sp[8] = L Sp[0] = y Sp[-8] = x Sp = Sp - 8 call f() returns to L Now, the assignments of y go away, x = Sp[8] Sp[8] = L Sp[-8] = x Sp = Sp - 8 call f() returns to L Conclusion: - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M - T[old+N] aliases with U[old+M] only if the areas actually overlap this ensures that we will not commute any accesses to old with young(L) or young(L) with young(L'), and the stack allocator will get the maximum opportunity to overlap these areas, keeping the stack use to a minimum and possibly avoiding some assignments. On 10/01/2014 16:35, Simon Peyton Jones wrote:
Oh, ok. Alas, a good chunk of my model of Cmm has just gone out of the window. I thought that areas were such a lovely, well-behaved abstraction. I was thrilled when we came up with them, and I'm very sorry to see them go.
There are no many things that I no longer understand. I now have no idea how we save live variables over a call, or how multiple returned values from one call (returned on the stack) stay right where they are if they are live across the next call.
What was the actual problem? That functions used too much stack, so the stack was getting too big? But a one slot area corresponds exactly to a live variable, so I don't see how the area abstraction could possibly increase stack size. And is stack size a crucial issue anyway?
Apart from anything else, areas would have given a lovely solution to the problem this thread started with!
I guess we can talk about this when you next visit? But some documentation would be welcome.
Simon
| -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 10 January 2014 16:24 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | There are no one-slot areas any more, I ditched those when I rewrote the | stack allocator. There is only ever one live area: either the old area | or the young area for a call we are about to make or have just made. | (see the data type: I removed the one-slot areas) | | I struggled for a long time with this. The problem is that with the | semantics of non-overlapping areas, code motion optimisations would tend | to increase the stack requirements of the function by overlapping the | live ranges of the areas. I concluded that actually what we wanted was | areas that really do overlap, and optimisations that respect that, so | that we get more efficient stack usage. | | Cheers, | Simon | | On 10/01/2014 15:22, Simon Peyton Jones wrote: | > That documentation would be good, yes! I don't know what it means to | say "we don't really have a general concept of areas any more". We did | before, and I didn't know that it had gone away. Urk! We can have lots | of live areas, notably the old area (for the current call/return | parameters, the call area for a call we are preparing, and the one-slot | areas for variables we are saving on the stack. | > | > Here's he current story | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas | > | > I agree that we have no concrete syntax for talking about areas, but | that is something we could fix. But I'm worried that they may not mean | what they used to mean. | > | > Simon | > | > | -----Original Message----- | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | Sent: 09 January 2014 08:39 | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | Cc: ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | On 08/01/2014 10:07, Simon Peyton Jones wrote: | > | > | > Can't we just allocate a Cmm "area"? The address of an area is | > | > | > a | > | > | perfectly well-defined Cmm value. | > | > | > | > What about this idea? | > | | > | We don't really have a general concept of areas (any more), and | > | areas aren't exposed in the concrete Cmm syntax at all. The current | > | semantics is that areas may overlap with each other, so there should | > | only be one active area at any point. I found that this was | > | important to ensure that we could generate good code from the stack | > | layout algorithm, otherwise it had to make pessimistic assumptions | and use too much stack. | > | | > | You're going to ask me where this is documented, and I think I have | > | to admit to slacking off, sorry :-) We did discuss it at the time, | > | and I made copious notes, but I didn't transfer those to the code. | > | I'll add a Note. | > | | > | Cheers, | > | Simon | > | | > | | > | > Simon | > | > | > | > | -----Original Message----- | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | Sent: 08 January 2014 09:26 | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | Cc: ghc-devs@haskell.org | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | > | > | Yes, this is technically wrong but luckily works. I'd very | > | > | > | much like to have a better solution, preferably one that | > | > | > | doesn't add any extra overhead. | > | > | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch the | > | > | > | Haskell stack. | > | > | > | > | > | > This all seems terribly fragile to me. At least it ought to | > | > | > be | > | > | surrounded with massive comments pointing out how terribly | > | > | fragile it is, breaking all the rules that we carefully document | elsewhere. | > | > | > | > | > | > Can't we just allocate a Cmm "area"? The address of an area is | > | > | > a | > | > | perfectly well-defined Cmm value. | > | > | | > | > | It is fragile, yes. We can't use static memory because it needs | > | > | to be thread-local. This particular hack has gone through | > | > | several iterations over the years: first we had static memory, | > | > | which broke when we did the parallel runtime, then we had | > | > | special storage in the Capability, which we gave up when GMP was | > | > | split out into a separate library, because it didn't seem right | > | > | to have magic fields in the Capability for one library. | > | > | | > | > | I'm looking into whether we can do temporary allocation on the | > | > | heap for this instead. | > | > | | > | > | Cheers, | > | > | Simon | > | > | | > | > | | > | > | > Simon | > | > | > | > | > | > | -----Original Message----- | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On | > | > | > | Behalf Of Simon Marlow | > | > | > | Sent: 07 January 2014 16:05 | > | > | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > | > | > Hello, | > | > | > | > | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | > | > | > | > | There are two ways to write .cmm code: | > | > | > | > | | > | > | > | > | (1) High-level Cmm code delegates the stack handling to | > | > | > | > | GHC, | > | > | and | > | > | > | > | never explicitly mentions Sp or registers. | > | > | > | > | | > | > | > | > | (2) Low-level Cmm manages the stack itself, and must | > | > | > | > | know | > | about | > | > | > | > | calling conventions. | > | > | > | > | | > | > | > | > | Whether you want high-level or low-level Cmm is | > | > | > | > | indicated by the presence of an argument list on a | procedure. | > | > | > | > | > | > | > | > However, while working on integer-gmp I've been noticing | > | > | > | > in integer-gmp/cbits/gmp-wrappers.cmm that even though all | > | > | > | > Cmm | > | > | > | procedures | > | > | > | > have been converted to high-level Cmm, they still | > | > | > | > reference the | > | > | 'Sp' | > | > | > | > register, e.g. | > | > | > | > | > | > | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > | > | > name (W_ ws1, P_ d1) \ | > | > | > | > { \ | > | > | > | > W_ mp_tmp1; \ | > | > | > | > W_ mp_result1; \ | > | > | > | > \ | > | > | > | > again: \ | > | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > | > | > MAYBE_GC(again); \ | > | > | > | > \ | > | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > | > | > ... \ | > | > | > | > | > | > | > | > | > | > | > | > So is this valid high-level Cmm code? What's the proper | > | > | > | > way to | > | > | > | allocate | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | > | > | | > | > | > | Yes, this is technically wrong but luckily works. I'd very | > | > | > | much like to have a better solution, preferably one that | > | > | > | doesn't add any extra overhead. | > | > | > | | > | > | > | The problem here is that we need to allocate a couple of | > | > | > | temporary words and take their address; that's an unusual | > | > | > | thing to do in Cmm, so it only occurs in a few places | > | > | > | (mainly | > | interacting with gmp). | > | > | > | Usually if you want some temporary storage you can use local | > | > | > | variables or some heap-allocated memory. | > | > | > | | > | > | > | Cheers, | > | > | > | Simon | > | > | > | _______________________________________________ | > | > | > | ghc-devs mailing list | > | > | > | ghc-devs@haskell.org | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs | > | > | > | > | > | >

Thanks. Reading what you write below, I can see two possible motivations. 1. Reduce stack sizes. 2. Eliminate memory moves For (1) do we have any data to show that the non-overlap of areas was giving rise to unacceptably big stacks? For (2) that is indeed clever, but it's pretty serendipitous: it relies on the overlap being just so, so that coincidentally y gets stored in the same place as it was loaded from. I imagine that you don't plan the stack layout to cause that to happen; it's just a coincidence. Do we have any data to show that the coincidence happens with any frequency? Also, as you note, we lose the opportunity for certain sorts of code motion, perhaps increasing register pressure a lot. So there is a downside too. You seldom do things without a very good reason, so I feel I must be missing something. Simon | -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 10 January 2014 17:00 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | So stack areas are still a great abstraction, the only change is that | they now overlap. It's not just about stack getting too big, I've | copied the notes I made about it below (which I will paste into the code | in due course). The nice property that we can generate well-defined Cmm | without knowing explicit stack offsets is intact. | | What is different is that there used to be an intermediate state where | live variables were saved to abstract stack areas across calls, but Sp | was still not manifest. This intermediate state doesn't exist any more, | the stack layout algorithm does it all in one pass. To me this was far | simpler, and I think it ended up being fewer lines of code than the old | multi-phase stack layout algorithm (it's also much faster). | | Of course you can always change this. My goal was to get code that was | at least as good as the old code generator and in a reasonable amount of | time, and this was the shortest path I could find to that goal. | | Cheers, | Simon | | e.g. if we had | | x = Sp[old + 8] | y = Sp[old + 16] | | Sp[young(L) + 8] = L | Sp[young(L) + 16] = y | Sp[young(L) + 24] = x | call f() returns to L | | if areas semantically do not overlap, then we might optimise this to | | Sp[young(L) + 8] = L | Sp[young(L) + 16] = Sp[old + 8] | Sp[young(L) + 24] = Sp[old + 16] | call f() returns to L | | and now young(L) cannot be allocated at the same place as old, and we | are doomed to use more stack. | | - old+8 conflicts with young(L)+8 | - old+16 conflicts with young(L)+16 and young(L)+8 | | so young(L)+8 == old+24 and we get | | Sp[-8] = L | Sp[-16] = Sp[8] | Sp[-24] = Sp[0] | Sp -= 24 | call f() returns to L | | However, if areas are defined to be "possibly overlapping" in the | semantics, then we cannot commute any loads/stores of old with young(L), | and we will be able to re-use both old+8 and old+16 for young(L). | | x = Sp[8] | y = Sp[0] | | Sp[8] = L | Sp[0] = y | Sp[-8] = x | Sp = Sp - 8 | call f() returns to L | | Now, the assignments of y go away, | | x = Sp[8] | Sp[8] = L | Sp[-8] = x | Sp = Sp - 8 | call f() returns to L | | | Conclusion: | | - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M | - T[old+N] aliases with U[old+M] only if the areas actually overlap | | this ensures that we will not commute any accesses to old with | young(L) or young(L) with young(L'), and the stack allocator will get | the maximum opportunity to overlap these areas, keeping the stack use to | a minimum and possibly avoiding some assignments. | | | | On 10/01/2014 16:35, Simon Peyton Jones wrote: | > Oh, ok. Alas, a good chunk of my model of Cmm has just gone out of | the window. I thought that areas were such a lovely, well-behaved | abstraction. I was thrilled when we came up with them, and I'm very | sorry to see them go. | > | > There are no many things that I no longer understand. I now have no | idea how we save live variables over a call, or how multiple returned | values from one call (returned on the stack) stay right where they are | if they are live across the next call. | > | > What was the actual problem? That functions used too much stack, so | the stack was getting too big? But a one slot area corresponds exactly | to a live variable, so I don't see how the area abstraction could | possibly increase stack size. And is stack size a crucial issue anyway? | > | > Apart from anything else, areas would have given a lovely solution to | the problem this thread started with! | > | > I guess we can talk about this when you next visit? But some | documentation would be welcome. | > | > Simon | > | > | -----Original Message----- | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | Sent: 10 January 2014 16:24 | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | Cc: ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | There are no one-slot areas any more, I ditched those when I rewrote | > | the stack allocator. There is only ever one live area: either the | > | old area or the young area for a call we are about to make or have | just made. | > | (see the data type: I removed the one-slot areas) | > | | > | I struggled for a long time with this. The problem is that with the | > | semantics of non-overlapping areas, code motion optimisations would | > | tend to increase the stack requirements of the function by | > | overlapping the live ranges of the areas. I concluded that actually | > | what we wanted was areas that really do overlap, and optimisations | > | that respect that, so that we get more efficient stack usage. | > | | > | Cheers, | > | Simon | > | | > | On 10/01/2014 15:22, Simon Peyton Jones wrote: | > | > That documentation would be good, yes! I don't know what it means | > | > to | > | say "we don't really have a general concept of areas any more". We | > | did before, and I didn't know that it had gone away. Urk! We can | > | have lots of live areas, notably the old area (for the current | > | call/return parameters, the call area for a call we are preparing, | > | and the one-slot areas for variables we are saving on the stack. | > | > | > | > Here's he current story | > | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAre | > | > as | > | > | > | > I agree that we have no concrete syntax for talking about areas, | > | > but | > | that is something we could fix. But I'm worried that they may not | > | mean what they used to mean. | > | > | > | > Simon | > | > | > | > | -----Original Message----- | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | Sent: 09 January 2014 08:39 | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | Cc: ghc-devs@haskell.org | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | | > | > | On 08/01/2014 10:07, Simon Peyton Jones wrote: | > | > | > | > Can't we just allocate a Cmm "area"? The address of an | > | > | > | > area is a | > | > | > | perfectly well-defined Cmm value. | > | > | > | > | > | > What about this idea? | > | > | | > | > | We don't really have a general concept of areas (any more), and | > | > | areas aren't exposed in the concrete Cmm syntax at all. The | > | > | current semantics is that areas may overlap with each other, so | > | > | there should only be one active area at any point. I found that | > | > | this was important to ensure that we could generate good code | > | > | from the stack layout algorithm, otherwise it had to make | > | > | pessimistic assumptions | > | and use too much stack. | > | > | | > | > | You're going to ask me where this is documented, and I think I | > | > | have to admit to slacking off, sorry :-) We did discuss it at | > | > | the time, and I made copious notes, but I didn't transfer those | to the code. | > | > | I'll add a Note. | > | > | | > | > | Cheers, | > | > | Simon | > | > | | > | > | | > | > | > Simon | > | > | > | > | > | > | -----Original Message----- | > | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | > | Sent: 08 January 2014 09:26 | > | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | > | Cc: ghc-devs@haskell.org | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | | > | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | > | > | > | Yes, this is technically wrong but luckily works. I'd | > | > | > | > | very much like to have a better solution, preferably one | > | > | > | > | that doesn't add any extra overhead. | > | > | > | > | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch | > | > | > | > | the Haskell stack. | > | > | > | > | > | > | > | > This all seems terribly fragile to me. At least it ought | > | > | > | > to be | > | > | > | surrounded with massive comments pointing out how terribly | > | > | > | fragile it is, breaking all the rules that we carefully | > | > | > | document | > | elsewhere. | > | > | > | > | > | > | > | > Can't we just allocate a Cmm "area"? The address of an | > | > | > | > area is a | > | > | > | perfectly well-defined Cmm value. | > | > | > | | > | > | > | It is fragile, yes. We can't use static memory because it | > | > | > | needs to be thread-local. This particular hack has gone | > | > | > | through several iterations over the years: first we had | > | > | > | static memory, which broke when we did the parallel runtime, | > | > | > | then we had special storage in the Capability, which we gave | > | > | > | up when GMP was split out into a separate library, because | > | > | > | it didn't seem right to have magic fields in the Capability | for one library. | > | > | > | | > | > | > | I'm looking into whether we can do temporary allocation on | > | > | > | the heap for this instead. | > | > | > | | > | > | > | Cheers, | > | > | > | Simon | > | > | > | | > | > | > | | > | > | > | > Simon | > | > | > | > | > | > | > | > | -----Original Message----- | > | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On | > | > | > | > | Behalf Of Simon Marlow | > | > | > | > | Sent: 07 January 2014 16:05 | > | > | > | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | > | | > | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > | > | > | > Hello, | > | > | > | > | > | > | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | > | > | > | > | > | > | There are two ways to write .cmm code: | > | > | > | > | > | | > | > | > | > | > | (1) High-level Cmm code delegates the stack | > | > | > | > | > | handling to GHC, | > | > | > | and | > | > | > | > | > | never explicitly mentions Sp or registers. | > | > | > | > | > | | > | > | > | > | > | (2) Low-level Cmm manages the stack itself, and | > | > | > | > | > | must know | > | > | about | > | > | > | > | > | calling conventions. | > | > | > | > | > | | > | > | > | > | > | Whether you want high-level or low-level Cmm is | > | > | > | > | > | indicated by the presence of an argument list on a | > | procedure. | > | > | > | > | > | > | > | > | > | > However, while working on integer-gmp I've been | > | > | > | > | > noticing in integer-gmp/cbits/gmp-wrappers.cmm that | > | > | > | > | > even though all Cmm | > | > | > | > | procedures | > | > | > | > | > have been converted to high-level Cmm, they still | > | > | > | > | > reference the | > | > | > | 'Sp' | > | > | > | > | > register, e.g. | > | > | > | > | > | > | > | > | > | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > | > | > | > name (W_ ws1, P_ d1) \ | > | > | > | > | > { \ | > | > | > | > | > W_ mp_tmp1; \ | > | > | > | > | > W_ mp_result1; \ | > | > | > | > | > \ | > | > | > | > | > again: \ | > | > | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > | > | > | > MAYBE_GC(again); \ | > | > | > | > | > \ | > | > | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > | > | > | > ... \ | > | > | > | > | > | > | > | > | > | > | > | > | > | > | > So is this valid high-level Cmm code? What's the | > | > | > | > | > proper way to | > | > | > | > | allocate | > | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | > | > | > | | > | > | > | > | Yes, this is technically wrong but luckily works. I'd | > | > | > | > | very much like to have a better solution, preferably one | > | > | > | > | that doesn't add any extra overhead. | > | > | > | > | | > | > | > | > | The problem here is that we need to allocate a couple of | > | > | > | > | temporary words and take their address; that's an | > | > | > | > | unusual thing to do in Cmm, so it only occurs in a few | > | > | > | > | places (mainly | > | > | interacting with gmp). | > | > | > | > | Usually if you want some temporary storage you can use | > | > | > | > | local variables or some heap-allocated memory. | > | > | > | > | | > | > | > | > | Cheers, | > | > | > | > | Simon | > | > | > | > | _______________________________________________ | > | > | > | > | ghc-devs mailing list | > | > | > | > | ghc-devs@haskell.org | > | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs | > | > | > | > | > | > | > | > | > | >

Using more stack generally (but not always) implies extra memory traffic. I noticed it happening a lot, but I didn't make measurements - we never had a way to generate code with just this one thing changed, because the new code generator had lots of issues with bad code, and this was just one. We can think of stack allocation as a black box: it takes Cmm in which (a) variables live across calls and (b) stack references are to [Old+n] or [Young+n], and returns Cmm in which (a) variables do not live across calls, and (b) all stack references are explicit offsets from Sp. The internals of this box are what has changed. Most users of Cmm don't need to care, because you can write optimisations on both the pre-stack-allocated Cmm and the post-stack-allocated Cmm without knowing anything about how stack allocation works. Indeed CmmSink (now) works on both forms. The stack area idea exposed some of the internals of this box; I don't think that's necessarily a good thing. There was *another* form of Cmm, in which (a) variables do not live across calls, and (b) stack references are to [Old+n], [Young+n] or [Sp(var)]. There was a (beautifully simple) spill pass using Hoopl that inserted spills at the definition site; unfortunately to generate good code you often have to move the spills somewhere else. And that's really hard, because code motion interacts in complex ways with stack layout: making a bad code motion decision can increase your stack requirements. This is a pretty good summary of what I was finding difficult here. It was not possible to generate good code without doing some optimisation on this intermediate stage, yet by doing stack allocation in a different way it was much easier to get good code. So the new stack allocator just walks through the code spilling, reloading, and allocating stack frames as it goes and making intelligent decisions about not spilling things if they're already on the stack. This does a really good job, and it was easy to add a couple of important special cases for common things. There's plenty of room to do something better. However, what we have now generates good code from the kind of things that the code generator generates (since that's what I tuned it for, by peering at lots of Cmm and tweaking things), so any improvements won't see much benefit for typical Haskell code. I have some more docs for the stack layout code that I'll push shortly. Cheers, Simon On 13/01/2014 15:20, Simon Peyton Jones wrote:
Thanks. Reading what you write below, I can see two possible motivations.
1. Reduce stack sizes. 2. Eliminate memory moves
For (1) do we have any data to show that the non-overlap of areas was giving rise to unacceptably big stacks?
For (2) that is indeed clever, but it's pretty serendipitous: it relies on the overlap being just so, so that coincidentally y gets stored in the same place as it was loaded from. I imagine that you don't plan the stack layout to cause that to happen; it's just a coincidence. Do we have any data to show that the coincidence happens with any frequency?
Also, as you note, we lose the opportunity for certain sorts of code motion, perhaps increasing register pressure a lot. So there is a downside too.
You seldom do things without a very good reason, so I feel I must be missing something.
Simon
| -----Original Message----- | From: Simon Marlow [mailto:marlowsd@gmail.com] | Sent: 10 January 2014 17:00 | To: Simon Peyton Jones; Herbert Valerio Riedel | Cc: ghc-devs@haskell.org | Subject: Re: High-level Cmm code and stack allocation | | So stack areas are still a great abstraction, the only change is that | they now overlap. It's not just about stack getting too big, I've | copied the notes I made about it below (which I will paste into the code | in due course). The nice property that we can generate well-defined Cmm | without knowing explicit stack offsets is intact. | | What is different is that there used to be an intermediate state where | live variables were saved to abstract stack areas across calls, but Sp | was still not manifest. This intermediate state doesn't exist any more, | the stack layout algorithm does it all in one pass. To me this was far | simpler, and I think it ended up being fewer lines of code than the old | multi-phase stack layout algorithm (it's also much faster). | | Of course you can always change this. My goal was to get code that was | at least as good as the old code generator and in a reasonable amount of | time, and this was the shortest path I could find to that goal. | | Cheers, | Simon | | e.g. if we had | | x = Sp[old + 8] | y = Sp[old + 16] | | Sp[young(L) + 8] = L | Sp[young(L) + 16] = y | Sp[young(L) + 24] = x | call f() returns to L | | if areas semantically do not overlap, then we might optimise this to | | Sp[young(L) + 8] = L | Sp[young(L) + 16] = Sp[old + 8] | Sp[young(L) + 24] = Sp[old + 16] | call f() returns to L | | and now young(L) cannot be allocated at the same place as old, and we | are doomed to use more stack. | | - old+8 conflicts with young(L)+8 | - old+16 conflicts with young(L)+16 and young(L)+8 | | so young(L)+8 == old+24 and we get | | Sp[-8] = L | Sp[-16] = Sp[8] | Sp[-24] = Sp[0] | Sp -= 24 | call f() returns to L | | However, if areas are defined to be "possibly overlapping" in the | semantics, then we cannot commute any loads/stores of old with young(L), | and we will be able to re-use both old+8 and old+16 for young(L). | | x = Sp[8] | y = Sp[0] | | Sp[8] = L | Sp[0] = y | Sp[-8] = x | Sp = Sp - 8 | call f() returns to L | | Now, the assignments of y go away, | | x = Sp[8] | Sp[8] = L | Sp[-8] = x | Sp = Sp - 8 | call f() returns to L | | | Conclusion: | | - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M | - T[old+N] aliases with U[old+M] only if the areas actually overlap | | this ensures that we will not commute any accesses to old with | young(L) or young(L) with young(L'), and the stack allocator will get | the maximum opportunity to overlap these areas, keeping the stack use to | a minimum and possibly avoiding some assignments. | | | | On 10/01/2014 16:35, Simon Peyton Jones wrote: | > Oh, ok. Alas, a good chunk of my model of Cmm has just gone out of | the window. I thought that areas were such a lovely, well-behaved | abstraction. I was thrilled when we came up with them, and I'm very | sorry to see them go. | > | > There are no many things that I no longer understand. I now have no | idea how we save live variables over a call, or how multiple returned | values from one call (returned on the stack) stay right where they are | if they are live across the next call. | > | > What was the actual problem? That functions used too much stack, so | the stack was getting too big? But a one slot area corresponds exactly | to a live variable, so I don't see how the area abstraction could | possibly increase stack size. And is stack size a crucial issue anyway? | > | > Apart from anything else, areas would have given a lovely solution to | the problem this thread started with! | > | > I guess we can talk about this when you next visit? But some | documentation would be welcome. | > | > Simon | > | > | -----Original Message----- | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | Sent: 10 January 2014 16:24 | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | Cc: ghc-devs@haskell.org | > | Subject: Re: High-level Cmm code and stack allocation | > | | > | There are no one-slot areas any more, I ditched those when I rewrote | > | the stack allocator. There is only ever one live area: either the | > | old area or the young area for a call we are about to make or have | just made. | > | (see the data type: I removed the one-slot areas) | > | | > | I struggled for a long time with this. The problem is that with the | > | semantics of non-overlapping areas, code motion optimisations would | > | tend to increase the stack requirements of the function by | > | overlapping the live ranges of the areas. I concluded that actually | > | what we wanted was areas that really do overlap, and optimisations | > | that respect that, so that we get more efficient stack usage. | > | | > | Cheers, | > | Simon | > | | > | On 10/01/2014 15:22, Simon Peyton Jones wrote: | > | > That documentation would be good, yes! I don't know what it means | > | > to | > | say "we don't really have a general concept of areas any more". We | > | did before, and I didn't know that it had gone away. Urk! We can | > | have lots of live areas, notably the old area (for the current | > | call/return parameters, the call area for a call we are preparing, | > | and the one-slot areas for variables we are saving on the stack. | > | > | > | > Here's he current story | > | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAre | > | > as | > | > | > | > I agree that we have no concrete syntax for talking about areas, | > | > but | > | that is something we could fix. But I'm worried that they may not | > | mean what they used to mean. | > | > | > | > Simon | > | > | > | > | -----Original Message----- | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | Sent: 09 January 2014 08:39 | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | Cc: ghc-devs@haskell.org | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | | > | > | On 08/01/2014 10:07, Simon Peyton Jones wrote: | > | > | > | > Can't we just allocate a Cmm "area"? The address of an | > | > | > | > area is a | > | > | > | perfectly well-defined Cmm value. | > | > | > | > | > | > What about this idea? | > | > | | > | > | We don't really have a general concept of areas (any more), and | > | > | areas aren't exposed in the concrete Cmm syntax at all. The | > | > | current semantics is that areas may overlap with each other, so | > | > | there should only be one active area at any point. I found that | > | > | this was important to ensure that we could generate good code | > | > | from the stack layout algorithm, otherwise it had to make | > | > | pessimistic assumptions | > | and use too much stack. | > | > | | > | > | You're going to ask me where this is documented, and I think I | > | > | have to admit to slacking off, sorry :-) We did discuss it at | > | > | the time, and I made copious notes, but I didn't transfer those | to the code. | > | > | I'll add a Note. | > | > | | > | > | Cheers, | > | > | Simon | > | > | | > | > | | > | > | > Simon | > | > | > | > | > | > | -----Original Message----- | > | > | > | From: Simon Marlow [mailto:marlowsd@gmail.com] | > | > | > | Sent: 08 January 2014 09:26 | > | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel | > | > | > | Cc: ghc-devs@haskell.org | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | | > | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote: | > | > | > | > | Yes, this is technically wrong but luckily works. I'd | > | > | > | > | very much like to have a better solution, preferably one | > | > | > | > | that doesn't add any extra overhead. | > | > | > | > | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch | > | > | > | > | the Haskell stack. | > | > | > | > | > | > | > | > This all seems terribly fragile to me. At least it ought | > | > | > | > to be | > | > | > | surrounded with massive comments pointing out how terribly | > | > | > | fragile it is, breaking all the rules that we carefully | > | > | > | document | > | elsewhere. | > | > | > | > | > | > | > | > Can't we just allocate a Cmm "area"? The address of an | > | > | > | > area is a | > | > | > | perfectly well-defined Cmm value. | > | > | > | | > | > | > | It is fragile, yes. We can't use static memory because it | > | > | > | needs to be thread-local. This particular hack has gone | > | > | > | through several iterations over the years: first we had | > | > | > | static memory, which broke when we did the parallel runtime, | > | > | > | then we had special storage in the Capability, which we gave | > | > | > | up when GMP was split out into a separate library, because | > | > | > | it didn't seem right to have magic fields in the Capability | for one library. | > | > | > | | > | > | > | I'm looking into whether we can do temporary allocation on | > | > | > | the heap for this instead. | > | > | > | | > | > | > | Cheers, | > | > | > | Simon | > | > | > | | > | > | > | | > | > | > | > Simon | > | > | > | > | > | > | > | > | -----Original Message----- | > | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On | > | > | > | > | Behalf Of Simon Marlow | > | > | > | > | Sent: 07 January 2014 16:05 | > | > | > | > | To: Herbert Valerio Riedel; ghc-devs@haskell.org | > | > | > | > | Subject: Re: High-level Cmm code and stack allocation | > | > | > | > | | > | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote: | > | > | > | > | > Hello, | > | > | > | > | > | > | > | > | > | > According to Note [Syntax of .cmm files], | > | > | > | > | > | > | > | > | > | > | There are two ways to write .cmm code: | > | > | > | > | > | | > | > | > | > | > | (1) High-level Cmm code delegates the stack | > | > | > | > | > | handling to GHC, | > | > | > | and | > | > | > | > | > | never explicitly mentions Sp or registers. | > | > | > | > | > | | > | > | > | > | > | (2) Low-level Cmm manages the stack itself, and | > | > | > | > | > | must know | > | > | about | > | > | > | > | > | calling conventions. | > | > | > | > | > | | > | > | > | > | > | Whether you want high-level or low-level Cmm is | > | > | > | > | > | indicated by the presence of an argument list on a | > | procedure. | > | > | > | > | > | > | > | > | > | > However, while working on integer-gmp I've been | > | > | > | > | > noticing in integer-gmp/cbits/gmp-wrappers.cmm that | > | > | > | > | > even though all Cmm | > | > | > | > | procedures | > | > | > | > | > have been converted to high-level Cmm, they still | > | > | > | > | > reference the | > | > | > | 'Sp' | > | > | > | > | > register, e.g. | > | > | > | > | > | > | > | > | > | > | > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \ | > | > | > | > | > name (W_ ws1, P_ d1) \ | > | > | > | > | > { \ | > | > | > | > | > W_ mp_tmp1; \ | > | > | > | > | > W_ mp_result1; \ | > | > | > | > | > \ | > | > | > | > | > again: \ | > | > | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \ | > | > | > | > | > MAYBE_GC(again); \ | > | > | > | > | > \ | > | > | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \ | > | > | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \ | > | > | > | > | > ... \ | > | > | > | > | > | > | > | > | > | > | > | > | > | > | > So is this valid high-level Cmm code? What's the | > | > | > | > | > proper way to | > | > | > | > | allocate | > | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code? | > | > | > | > | | > | > | > | > | Yes, this is technically wrong but luckily works. I'd | > | > | > | > | very much like to have a better solution, preferably one | > | > | > | > | that doesn't add any extra overhead. | > | > | > | > | | > | > | > | > | The problem here is that we need to allocate a couple of | > | > | > | > | temporary words and take their address; that's an | > | > | > | > | unusual thing to do in Cmm, so it only occurs in a few | > | > | > | > | places (mainly | > | > | interacting with gmp). | > | > | > | > | Usually if you want some temporary storage you can use | > | > | > | > | local variables or some heap-allocated memory. | > | > | > | > | | > | > | > | > | Cheers, | > | > | > | > | Simon | > | > | > | > | _______________________________________________ | > | > | > | > | ghc-devs mailing list | > | > | > | > | ghc-devs@haskell.org | > | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs | > | > | > | > | > | > | > | > | > | >
participants (5)
-
Carter Schonwald
-
Herbert Valerio Riedel
-
Herbert Valerio Riedel
-
Simon Marlow
-
Simon Peyton Jones