Re: Questions concerning the LLVM backend - i.e. 'proc point splitting'

Simon
| Could you explain the need of further info-tables for 'inner' | proc-points (those, which are not the entry-block of a function) to | me, please?
The main purpose of inner info tables is this. Given (case e of blah) we push a return address (for 'blah') on the stack and evaluate 'e'. During evaluation of 'e', garbage collection may take place. So the garbage collector needs to walk the stack to follow live pointers. Who knows what pointers are live? Answer: the code for 'blah' clearly knows what pointers are alive in the current stack frame, because it is going to access them. So the return address gets an info table that makes that implicit knowledge explicit, as a bitmap describing the liveness/pointer-hood of the stack frame.
Addresses that are only jumped to don’t need info tables.
It would be great if, as you learn about this stuff, you could add to the Commentary https://ghc.haskell.org/trac/ghc/wiki/Commentary to capture what you have learned.
Thank you very much! In cases I find the current description imprecise or incomplete I can add what I've learned to the Commentary of course.
| The documentation states, that proc-point splitting is only required | by the LLVM backend (see [2]). Does the cmm-pipeline perform proc- | point splitting even when using the native code generator?
I'm not certain, but I don't think so.
| As stated above, the major problem of LLVM seems to be inferring | control flow when it comes to call proc-points. As it's hard to avoid | proc-point splitting, we leave the cmm-pipeline 'as-is'. We modify the
I regard the whole proc-point thing as a major wart. I think we should work hard to eliminate it. So: another line of thought might be: how could we influence LLVM so that we didn't need proc-point splitting?
I totally agree with you here. However, determining how LLVM needs to be influenced to eliminate proc-point splitting is a thing I won't be capable of doing mid term because of my current knowledge and lack of time.
| To make control flow more apparent to LLVM, they came up with the | following idea. Remark before showing the idea: I know that it greatly
I'm sorry to say that I didn't really understand the idea. Maybe someone else did? David Terei?
My apologies for being imprecise. I hope to find the time to work out a little demonstration during the upcoming holidays. I'm currently fully occupied by university because of upcoming exams this week.
If you are seriously about investing effort in the back end, I'd be happy to help; in that case a Skype call is probably the best way.
That is a great idea. Due to the above mentioned time constraints I propose that we follow up in January. Regards, David
Simon
| interferes with CPS. It leaves parts of the continuation-handling to | the implicit LLVM call-stack. Further, this does not solve the problem | of proc-point splitting at all. It is more a workaround than a | solution. | But it would greatly increase LLVM's possibilities to optimize code | later on. That's why I found this idea worth mentioning. Especially | because after reading your answer, Ben, I'm somewhat convinced that | solving the problem of proc-point splitting is nearly impossible with | the capabilities of LLVM IR available at the moment. Now back to the | idea. | | As stated above, the major problem of LLVM seems to be inferring | control flow when it comes to call proc-points. As it's hard to avoid | proc-point splitting, we leave the cmm-pipeline 'as-is'. We modify the | code, llvmGen produces when it comes to call proc-points instead. All | other types of proc-points remain unchanged. | | Initial situation in pseudo cmm: | | foo() { | ... | call bar() returns to cont // bar defined externally | | cont: | ... | } | | After proc-point splitting, this is converted into LLVM IR similar to | the following: | | @foo() { | ... | store @cont(), %Sp ; push continuation on the stack tail call @bar() ; | @bar defined externally } | | @cont() { | ... | } | | To fix the above issue, we introduce a method, which restores the | pointers Sp, Hp and the register R1 (among everything else, what has | to be restored) and 'fakes' the continuation. Note, that '@restore' | returns without calling into the continuation. This way, the call into | the continuation can be made direct. The above code would be | transformed into something similar to the following: | | @foo() { | ... | store @restore(), %Sp ; 'fake' continuation call @bar() ; br label | restore | | restore: | %Sp = load (getelementptr @environment 0, 0) ; restore what has to be | ... ; restored tail call @cont() } | | @cont() { | ... | } | | @restore(%Sp, %Hp, %R1, ...) { | store %Sp, (getelementptr @environment 0, 0) ; store everything into | ... ; global variable ret void } | | @environment ; structure of type similar to {i64*, i64*, i64*, i64, | i64, ..} | | As an alternative, the above transformation could be done by a LLVM | pass. llvmGen could then annotate the continuation of calls call cite. | This would greatly simplify the detection of the correct continuation. | | What do you think about this idea? Especially when thinking about the | interference with CPS? | | Regards, | | David | | | | | [1] | https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCod | e#Importantconceptsinthemachine | | [2] | https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CodeGen#Seco | ndstage:theCmmpipeline | | | | | On 11/21/2015 12:00 AM, Ben Gamari wrote: | > Simon Peyton Jones
writes: | > | >> David | >> | >> All this stuff is terribly paged-out for me. But I'd love someone | to | >> pay attention to some of this back-end stuff, so if you'd like to | >> work there, I'd like to help you. | >> | >> David Terei was also thinking of getting rid of proc point | splitting | >> for LLVM (see attached email and the notes attached to it) | >> | >> Simon Marlow also was thinking about this; see his attached email. | >> | >> The main *reason* for establishing a proc-point is so that, when | >> generating C code (which we don't do any more) you can take its | >> address. E.g. | >> | >> foo() { ... Push &bar onto the stack (effectively a return address) | >> Jump to thingumpy } | >> | >> bar() { ... } | >> | >> Really bar is part of foo, but we have make it a separate top level | >> thing so that we can take the address of bar and push it on the | stack. | >> | >> The trouble is that if there is a label inside bar that foo wants | to | >> jump to (i.e. without pushing &bar etc) then we have to make that | >> label too into a proc-point, so that both foo and bar can see it. | >> Urgh. | >> | >> In LLVM this probably is not true; presumably you can take the | >> address of any label? | >> | > This is true. While labels themselves have function-local scope in | > LLVM, there is an expression, `blockaddress`, which allows you to | take | > an address to a label in another function [1]. That being said, in | > reading through the documentation it's not at all clear to me that | it | > would be safe to jump to such an address. In fact, given that the | > instruction that this expression is supposed to be used with, | > `indirectbr`, can only be used for local blocks, I suspect it is | not. | > More information about this feature can be found here [2]. | > | > The jump issue aside, I don't know how you would deal with | > tables-next-to-code. The prefix data support that currently | available | > in LLVM is attached to functions and I unfortunately don't see that | > changing any time soon. | > | > Ultimately it seems that trying to refer to labels defined in other | > functions is using LLVM against the way it was intended. One | > alternative would be to teach llvmGen to merge mutually recusive | > top-level functions into a single LLVM function during code | > generation. Otherwise I'm afraid you are stuck with either the | status | > quo or attempting to improve on LLVM's own cross-procedure | optimization ability. | > | > That being said, it sounds as though eliminating proc-point | splitting | > would make for quite a nice project in the native code generator. | > | > Cheers, | > | > - Ben | > | > | > [1] | > | https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.o | > rg%2fdocs%2fLangRef.html%23addresses-of-basic- | blocks&data=01%7c01%7csi | > | monpj%40064d.mgd.microsoft.com%7c7fdbfda0b4514a098c4d08d2f74448f1%7c72 | > | f988bf86f141af91ab2d7cd011db47%7c1&sdata=nKVLgmZezUdEYL%2fma1exMMPLbcT | > RkaE1JrvVaR1EesY%3d [2] | > | https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fblog.l | > lvm.org%2f2010%2f01%2faddress-of-label-and-indirect- | branches.html&data | > | =01%7c01%7csimonpj%40064d.mgd.microsoft.com%7c7fdbfda0b4514a098c4d08d2 | > | f74448f1%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=BTMdZESwYS4tZa5W | > sOAqyFutoI5xNNFDe0b%2bdKC3ODs%3d [3] | > | https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.o | > rg%2fdocs%2fLangRef.html%23prefix- | data&data=01%7c01%7csimonpj%40064d.m | > | gd.microsoft.com%7c7fdbfda0b4514a098c4d08d2f74448f1%7c72f988bf86f141af | > | 91ab2d7cd011db47%7c1&sdata=rdo4ZLjjE%2bWxNykVL%2bSuToWioY6nzQpflSk296E | > 8W70%3d | >

Hello everyone! First of all, my apologies for letting you wait that long. Especially, I want to let you know that I really feel sorry for not following up your offer back in January, Simon. I somewhat underestimated the amount of time I had to invest in university during the last semester. Unfortunately, I couldn't find the time to work out the demonstration I mentioned in my last mail back in December. At the moment, I'm making up for this showcase. In doing so, I stumbled across another question concerning the cmm-pipeline: Given a CmmGraph, is there a possibility to annotate information to a single node within this graph? I.e. I want to annotate to certain CmmCalls that they where introduced by 'proc point splitting'. I would like to slightly modifiy the generation of LLVM IR for such Calls later on. Thanks for your help in advance! Regards, David On 12/14/2015 06:16 PM, David Spitzenberg wrote:
Simon
| Could you explain the need of further info-tables for 'inner' | proc-points (those, which are not the entry-block of a function) to | me, please?
The main purpose of inner info tables is this. Given (case e of blah) we push a return address (for 'blah') on the stack and evaluate 'e'. During evaluation of 'e', garbage collection may take place. So the garbage collector needs to walk the stack to follow live pointers. Who knows what pointers are live? Answer: the code for 'blah' clearly knows what pointers are alive in the current stack frame, because it is going to access them. So the return address gets an info table that makes that implicit knowledge explicit, as a bitmap describing the liveness/pointer-hood of the stack frame.
Addresses that are only jumped to don’t need info tables.
It would be great if, as you learn about this stuff, you could add to the Commentary https://ghc.haskell.org/trac/ghc/wiki/Commentary to capture what you have learned.
Thank you very much! In cases I find the current description imprecise or incomplete I can add what I've learned to the Commentary of course.
| The documentation states, that proc-point splitting is only required | by the LLVM backend (see [2]). Does the cmm-pipeline perform proc- | point splitting even when using the native code generator?
I'm not certain, but I don't think so.
| As stated above, the major problem of LLVM seems to be inferring | control flow when it comes to call proc-points. As it's hard to avoid | proc-point splitting, we leave the cmm-pipeline 'as-is'. We modify the
I regard the whole proc-point thing as a major wart. I think we should work hard to eliminate it. So: another line of thought might be: how could we influence LLVM so that we didn't need proc-point splitting?
I totally agree with you here. However, determining how LLVM needs to be influenced to eliminate proc-point splitting is a thing I won't be capable of doing mid term because of my current knowledge and lack of time.
| To make control flow more apparent to LLVM, they came up with the | following idea. Remark before showing the idea: I know that it greatly
I'm sorry to say that I didn't really understand the idea. Maybe someone else did? David Terei?
My apologies for being imprecise. I hope to find the time to work out a little demonstration during the upcoming holidays. I'm currently fully occupied by university because of upcoming exams this week.
If you are seriously about investing effort in the back end, I'd be happy to help; in that case a Skype call is probably the best way.
That is a great idea. Due to the above mentioned time constraints I propose that we follow up in January.
Regards,
David
Simon
| interferes with CPS. It leaves parts of the continuation-handling to | the implicit LLVM call-stack. Further, this does not solve the problem | of proc-point splitting at all. It is more a workaround than a | solution. | But it would greatly increase LLVM's possibilities to optimize code | later on. That's why I found this idea worth mentioning. Especially | because after reading your answer, Ben, I'm somewhat convinced that | solving the problem of proc-point splitting is nearly impossible with | the capabilities of LLVM IR available at the moment. Now back to the | idea. | | As stated above, the major problem of LLVM seems to be inferring | control flow when it comes to call proc-points. As it's hard to avoid | proc-point splitting, we leave the cmm-pipeline 'as-is'. We modify the | code, llvmGen produces when it comes to call proc-points instead. All | other types of proc-points remain unchanged. | | Initial situation in pseudo cmm: | | foo() { | ... | call bar() returns to cont // bar defined externally | | cont: | ... | } | | After proc-point splitting, this is converted into LLVM IR similar to | the following: | | @foo() { | ... | store @cont(), %Sp ; push continuation on the stack tail call @bar() ; | @bar defined externally } | | @cont() { | ... | } | | To fix the above issue, we introduce a method, which restores the | pointers Sp, Hp and the register R1 (among everything else, what has | to be restored) and 'fakes' the continuation. Note, that '@restore' | returns without calling into the continuation. This way, the call into | the continuation can be made direct. The above code would be | transformed into something similar to the following: | | @foo() { | ... | store @restore(), %Sp ; 'fake' continuation call @bar() ; br label | restore | | restore: | %Sp = load (getelementptr @environment 0, 0) ; restore what has to be | ... ; restored tail call @cont() } | | @cont() { | ... | } | | @restore(%Sp, %Hp, %R1, ...) { | store %Sp, (getelementptr @environment 0, 0) ; store everything into | ... ; global variable ret void } | | @environment ; structure of type similar to {i64*, i64*, i64*, i64, | i64, ..} | | As an alternative, the above transformation could be done by a LLVM | pass. llvmGen could then annotate the continuation of calls call cite. | This would greatly simplify the detection of the correct continuation. | | What do you think about this idea? Especially when thinking about the | interference with CPS? | | Regards, | | David | | | | | [1] | https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCod | e#Importantconceptsinthemachine | | [2] | https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CodeGen#Seco | ndstage:theCmmpipeline | | | | | On 11/21/2015 12:00 AM, Ben Gamari wrote: | > Simon Peyton Jones
writes: | > | >> David | >> | >> All this stuff is terribly paged-out for me. But I'd love someone | to | >> pay attention to some of this back-end stuff, so if you'd like to | >> work there, I'd like to help you. | >> | >> David Terei was also thinking of getting rid of proc point | splitting | >> for LLVM (see attached email and the notes attached to it) | >> | >> Simon Marlow also was thinking about this; see his attached email. | >> | >> The main *reason* for establishing a proc-point is so that, when | >> generating C code (which we don't do any more) you can take its | >> address. E.g. | >> | >> foo() { ... Push &bar onto the stack (effectively a return address) | >> Jump to thingumpy } | >> | >> bar() { ... } | >> | >> Really bar is part of foo, but we have make it a separate top level | >> thing so that we can take the address of bar and push it on the | stack. | >> | >> The trouble is that if there is a label inside bar that foo wants | to | >> jump to (i.e. without pushing &bar etc) then we have to make that | >> label too into a proc-point, so that both foo and bar can see it. | >> Urgh. | >> | >> In LLVM this probably is not true; presumably you can take the | >> address of any label? | >> | > This is true. While labels themselves have function-local scope in | > LLVM, there is an expression, `blockaddress`, which allows you to | take | > an address to a label in another function [1]. That being said, in | > reading through the documentation it's not at all clear to me that | it | > would be safe to jump to such an address. In fact, given that the | > instruction that this expression is supposed to be used with, | > `indirectbr`, can only be used for local blocks, I suspect it is | not. | > More information about this feature can be found here [2]. | > | > The jump issue aside, I don't know how you would deal with | > tables-next-to-code. The prefix data support that currently | available | > in LLVM is attached to functions and I unfortunately don't see that | > changing any time soon. | > | > Ultimately it seems that trying to refer to labels defined in other | > functions is using LLVM against the way it was intended. One | > alternative would be to teach llvmGen to merge mutually recusive | > top-level functions into a single LLVM function during code | > generation. Otherwise I'm afraid you are stuck with either the | status | > quo or attempting to improve on LLVM's own cross-procedure | optimization ability. | > | > That being said, it sounds as though eliminating proc-point | splitting | > would make for quite a nice project in the native code generator. | > | > Cheers, | > | > - Ben | > | > | > [1] | > | https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.o | > rg%2fdocs%2fLangRef.html%23addresses-of-basic- | blocks&data=01%7c01%7csi | > | monpj%40064d.mgd.microsoft.com%7c7fdbfda0b4514a098c4d08d2f74448f1%7c72 | > | f988bf86f141af91ab2d7cd011db47%7c1&sdata=nKVLgmZezUdEYL%2fma1exMMPLbcT | > RkaE1JrvVaR1EesY%3d [2] | > | https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fblog.l | > lvm.org%2f2010%2f01%2faddress-of-label-and-indirect- | branches.html&data | > | =01%7c01%7csimonpj%40064d.mgd.microsoft.com%7c7fdbfda0b4514a098c4d08d2 | > | f74448f1%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=BTMdZESwYS4tZa5W | > sOAqyFutoI5xNNFDe0b%2bdKC3ODs%3d [3] | > | https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.o | > rg%2fdocs%2fLangRef.html%23prefix- | data&data=01%7c01%7csimonpj%40064d.m | > | gd.microsoft.com%7c7fdbfda0b4514a098c4d08d2f74448f1%7c72f988bf86f141af | > | 91ab2d7cd011db47%7c1&sdata=rdo4ZLjjE%2bWxNykVL%2bSuToWioY6nzQpflSk296E | > 8W70%3d | > _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

David Spitzenberg
Hello everyone!
First of all, my apologies for letting you wait that long. Especially, I want to let you know that I really feel sorry for not following up your offer back in January, Simon.
Don't worry at all; life happens. Any effort your are able to contribute is appreciated.
I somewhat underestimated the amount of time I had to invest in university during the last semester. Unfortunately, I couldn't find the time to work out the demonstration I mentioned in my last mail back in December.
At the moment, I'm making up for this showcase. In doing so, I stumbled across another question concerning the cmm-pipeline:
Given a CmmGraph, is there a possibility to annotate information to a single node within this graph? I.e. I want to annotate to certain CmmCalls that they where introduced by 'proc point splitting'. I would like to slightly modifiy the generation of LLVM IR for such Calls later on.
My guess here would be to map the CmmGraph (which is simply a type synonym for `GenCmmGraph CmmNode` to something of type `GenCmmGraph AnnCmmNode` where AnnCmmNode carries a CmmNode along with whatever other information you'd like to preserve. This then poses the question of what you'd like to *do* with this graph, since you'll be unable to use much of the GHC's existing machinery. My (possibly mistaken) impression is that we don't have a terribly great story in GHC for working with arbitrarily annotated Cmm graphs. I'm afraid we have now scraped past the bounds of my knowledge, however; hopefully one of the Simons will be able to set you off in the right direction. Cheers, - Ben

On 03/03/2016 11:32 AM, Ben Gamari wrote:
Given a CmmGraph, is there a possibility to annotate information to a single node within this graph? I.e. I want to annotate to certain CmmCalls that they where introduced by 'proc point splitting'. I would like to slightly modifiy the generation of LLVM IR for such Calls later on.
My guess here would be to map the CmmGraph (which is simply a type synonym for `GenCmmGraph CmmNode` to something of type `GenCmmGraph AnnCmmNode` where AnnCmmNode carries a CmmNode along with whatever other information you'd like to preserve.
This then poses the question of what you'd like to *do* with this graph, since you'll be unable to use much of the GHC's existing machinery. My (possibly mistaken) impression is that we don't have a terribly great story in GHC for working with arbitrarily annotated Cmm graphs.
Thank you very much, Ben. Basically, I just want the annotation to be propagated all the way down to the backend-code generation (llvmGen in this case). At the moment, I can't find any need to modify the map you suggest in between. So using the existing machinery shouldn't be necessary to do any work on the map itself. Propagating this newly introduced map to where I want to use it seems harder, though. I'm going to have a look at it and put some more questions to you, if I struggle to find a way to propagate the map. Regards, David

Maybe you could just maintain a separate map "on the side" and push it through to the LLVM code gen?
I hate the whole proc-point machinery. It's all caused by our inability to take the address of a local label. So to push a return address on the stack we have to make it a top-level "procedure" and that gives rise to all the proc-point pain. For the native code generator none of this is necessary (I think).
It's also annoyingly invasive. Somehow proc-points ought to be localised to the LLVM back end, not pervasive in the Cmm pipeline.
In short, if you get deep into this, do feel free to propose something better!
Simon
| -----Original Message-----
| From: David Spitzenberg [mailto:spitzenb@fim.uni-passau.de]
| Sent: 04 March 2016 08:40
| To: Ben Gamari
participants (3)
-
Ben Gamari
-
David Spitzenberg
-
Simon Peyton Jones