Dataflow analysis for Cmm

Hi,

I was looking at cleaning up a bit the situation with dataflow analysis for Cmm. In particular, I was experimenting with rewriting the current `cmm/Hoopl/Dataflow` module:

- To only include the functionality to do analysis (since GHC doesn't seem to use the rewriting part). Benefits:
  - Code simplification (we could remove a lot of unused code).
  - Makes it clear what we're actually using from Hoopl.
- To have an interface that works with transfer functions operating on a whole basic block (`Block CmmNode C C`). This means that it would be up to the user of the algorithm to traverse the whole block (a sketch of such an interface follows this message). Benefits:
  - Further simplifications.
  - We could remove the `analyzeFwdBlocks` hack, which AFAICS is just a copy&paste of `analyzeFwd` that ignores the middle nodes (probably for efficiency of analyses that only look at the first and last nodes of a block).
  - More flexibility (e.g., the clients could know which block they're processing; we could consider memoizing some per-block information, etc.).

What do you think about this?

I have a branch that implements the above: https://github.com/michalt/ghc/tree/dataflow2/1 It introduces a second, parallel implementation (a `cmm/Hoopl/Dataflow2` module), so that it's possible to run ./validate while comparing the results of the old implementation with the new one.

Second question: how could we merge this (assuming that people are generally ok with the approach)? Some ideas:

- Change the cmm/Hoopl/Dataflow module itself, along with the three analyses that use it, in one step.
- Introduce the Dataflow2 module first, then switch the analyses over, then remove any unused code that still depends on the old Dataflow module, and finally remove the old Dataflow module itself.

(Personally I'd prefer the second option, but I'm also ok with the first one.)

I'm happy to export the code to Phab if you prefer - I wasn't sure what's the recommended workflow for code that's not ready for review…

Thanks,
Michal
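For concreteness, here is a minimal sketch of what such a block-oriented analysis interface could look like. Apart from Hoopl's `Block`, `C`, `Label`, `LabelMap` and `NonLocal` types, every name below (`TransferFun`, `DataflowAnalysis`, `fixpointFwd`) is illustrative rather than GHC's or Hoopl's actual API, and the fixed-point loop is deliberately naive:

```haskell
-- A sketch only: TransferFun, DataflowAnalysis and fixpointFwd are
-- illustrative names, not GHC's or Hoopl's actual API.
module DataflowSketch where

import Compiler.Hoopl (Block, C, Label, LabelMap, NonLocal (..),
                       mapInsert, mapLookup)

-- The client's transfer function sees a whole block at a time and is
-- responsible for traversing its nodes itself.
type TransferFun n f = Block n C C -> f -> f

data DataflowAnalysis n f = DataflowAnalysis
  { daJoin     :: f -> f -> (Bool, f)  -- join two facts; True if changed
  , daTransfer :: TransferFun n f
  }

-- A deliberately naive forward fixed point: keep sweeping over the
-- blocks until no fact changes.  A real implementation would use a
-- worklist over blocks in reverse postorder.
fixpointFwd :: NonLocal n
            => DataflowAnalysis n f
            -> [Block n C C]  -- blocks of the graph
            -> LabelMap f     -- initial facts (e.g. just the entry block)
            -> LabelMap f
fixpointFwd ana blocks = go
  where
    go facts
      | changed   = go facts'
      | otherwise = facts'
      where
        (changed, facts') = foldl step (False, facts) blocks

    -- Transfer the incoming fact through one block and propagate the
    -- outgoing fact to all of the block's successors.
    step (ch, fb) blk =
      case mapLookup (entryLabel blk) fb of
        Nothing  -> (ch, fb)  -- block not reachable so far; skip it
        Just fIn ->
          let fOut = daTransfer ana blk fIn
          in foldl (propagate fOut) (ch, fb) (successors blk)

    propagate fOut (ch, fb) lbl =
      case mapLookup lbl fb of
        Nothing  -> (True, mapInsert lbl fOut fb)
        Just old ->
          let (ch', new) = daJoin ana old fOut
          in (ch || ch', mapInsert lbl new fb)
```

Since `daTransfer` receives the whole `Block n C C`, walking all of the nodes, or only the first and last ones, is entirely the client's decision; this is what would make the `analyzeFwdBlocks` special case unnecessary.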

Michał,

Dataflow module could indeed use a cleanup. I have made two attempts at this in the past but I don't think either of them was merged - see [1] and [2]. [2] was mostly type-directed simplifications. It would be nice to have this included in one form or another.

It sounds like you also have a more in-depth refactoring in mind. Personally, as long as it is semantically correct, I think it will be a good thing. I would especially support removing dead code that we don't really use.

[1] https://github.com/jstolarek/ghc/commits/js-hoopl-cleanup-v2
[2] https://github.com/jstolarek/ghc/commits/js-hoopl-cleanup-v2
> Second question: how could we merge this? (...)

I'm not sure if I understand. The end result after merging will be exactly the same, right? Are you asking for advice on what is the best way of doing this from a technical point of view? I would simply edit the existing module. Introducing a temporary second module seems like unnecessary extra work and would perhaps complicate patch review.
> I'm happy to export the code to Phab if you prefer - I wasn't sure what's the recommended workflow for code that's not ready for review…

This is OK, but please remember to set the status of the revision to "Planned changes" after uploading it to Phab so it doesn't sit in the review queue.
Janek

On Mon, Oct 17, 2016 at 10:57 AM Jan Stolarek wrote:
> Michał,
>
> Dataflow module could indeed use a cleanup. I have made two attempts at this in the past but I don't think either of them was merged - see [1] and [2]. [2] was mostly type-directed simplifications. It would be nice to have this included in one form or another. It sounds like you also have a more in-depth refactoring in mind. Personally, as long as it is semantically correct, I think it will be a good thing. I would especially support removing dead code that we don't really use.
>
> [1] https://github.com/jstolarek/ghc/commits/js-hoopl-cleanup-v2
> [2] https://github.com/jstolarek/ghc/commits/js-hoopl-cleanup-v2
Ok, I'll have a look at this! (did you intend to send two identical links?)
> > Second question: how could we merge this? (...)
>
> I'm not sure if I understand. The end result after merging will be exactly the same, right? Are you asking for advice on what is the best way of doing this from a technical point of view? I would simply edit the existing module. Introducing a temporary second module seems like unnecessary extra work and would perhaps complicate patch review.
Yes, the end result would be the same - I'm merely asking what would be preferred by GHC devs (i.e., I don't know how fine-grained patches to GHC usually are).
> > I'm happy to export the code to Phab if you prefer - I wasn't sure what's the recommended workflow for code that's not ready for review…
>
> This is OK, but please remember to set the status of the revision to "Planned changes" after uploading it to Phab so it doesn't sit in the review queue.
Cool, I didn't know about the "Planned changes" status. Thanks for mentioning it!

Cheers,
Michal

Michal Terepeta wrote:

> On Mon, Oct 17, 2016 at 10:57 AM Jan Stolarek wrote:
> > Second question: how could we merge this? (...)
> >
> > I'm not sure if I understand. The end result after merging will be exactly the same, right? Are you asking for advice on what is the best way of doing this from a technical point of view? I would simply edit the existing module. Introducing a temporary second module seems like unnecessary extra work and would perhaps complicate patch review.
>
> Yes, the end result would be the same - I'm merely asking what would be preferred by GHC devs (i.e., I don't know how fine-grained patches to GHC usually are).
It varies quite wildly. In general I would prefer fine-grained patches (but of course atomic) over coarse patches, as they are easier to understand during review and after merge. Moreover, it's generally much easier to squash together patches that are too fine-grained than it is to split up a large patch, so I generally err on the side of finer rather than coarser during development.

Cheers,

- Ben

> (did you intend to send two identical links?)

No :-) The branches should be js-hoopl-cleanup-v1 and js-hoopl-cleanup-v2.
> Yes, the end result would be the same - I'm merely asking what would be preferred by GHC devs (i.e., I don't know how fine-grained patches to GHC usually are).

I don't think this should be visible in your final patch. In a perfect world each repository commit should provide exactly one functionality - not more and not less. So your final patch should not reflect the intermediate steps you took to implement some functionality, because they are not really relevant. For the purpose of development and review it is fine to have a branch with lots of small commits, but before merging you should just squash them into one.
Janek

Jan Stolarek wrote:
> > (did you intend to send two identical links?)
>
> No :-) The branches should be js-hoopl-cleanup-v1 and js-hoopl-cleanup-v2.
>
> > Yes, the end result would be the same - I'm merely asking what would be preferred by GHC devs (i.e., I don't know how fine-grained patches to GHC usually are).
>
> I don't think this should be visible in your final patch. In a perfect world each repository commit should provide exactly one functionality - not more and not less. So your final patch should not reflect the intermediate steps you took to implement some functionality, because they are not really relevant. For the purpose of development and review it is fine to have a branch with lots of small commits, but before merging you should just squash them into one.
I don't entirely agree. I personally find it very hard to review large patches, as the amount of mental context generally seems to grow super-linearly in the amount of code touched. Moreover, I think it's important to remember that the need to read patches does not vanish the moment the patch is committed. To the contrary, review is merely the first of many instances in which a patch will be read. Other instances include:

* when the patch is backported,
* when someone is trying to rebase one of their own changes on top of the patch,
* when another contributor is trying to follow the evolution of a piece of the compiler,
* when someone is later trying to understand a bug in the patch.

Consequently, I think it is fairly important not to throw out the structure that multiple, sensibly sized commits provide. Of course there is a compromise to be struck here: we don't want dozens of five-line patches; however, I think that one mega-patch swings too far to the other extreme in the case of most features.

Cheers,

- Ben

> I don't entirely agree. I personally find it very hard to review large patches, as the amount of mental context generally seems to grow super-linearly in the amount of code touched. Moreover, I think it's important to remember that the need to read patches does not vanish the moment the patch is committed. To the contrary, review is merely the first of many instances in which a patch will be read. Other instances include: [...]

I wholeheartedly agree with everything you say. I don't see it as contradicting in any way the principles that I outlined. It's just that sometimes doing a single logical change to the code requires a large patch, and breaking it artificially into smaller patches might actually make matters worse. I believe this would be the case for this scenario. And honestly speaking, I don't think that the patch here will be very big. But like you say, there's a compromise to be struck.
Janek

Jan Stolarek wrote:
> > I don't entirely agree. I personally find it very hard to review large patches, as the amount of mental context generally seems to grow super-linearly in the amount of code touched. [...]
>
> I wholeheartedly agree with everything you say. I don't see it as contradicting in any way the principles that I outlined. It's just that sometimes doing a single logical change to the code requires a large patch, and breaking it artificially into smaller patches might actually make matters worse. I believe this would be the case for this scenario. And honestly speaking, I don't think that the patch here will be very big. But like you say, there's a compromise to be struck.
Ahh, it looks like I was probably reading more into what you wrote than you intended; my apologies!

Cheers,

- Ben

On 16 October 2016 at 14:03, Michal Terepeta wrote:
> Hi,
>
> I was looking at cleaning up a bit the situation with dataflow analysis for Cmm. In particular, I was experimenting with rewriting the current `cmm/Hoopl/Dataflow` module:
> - To only include the functionality to do analysis (since GHC doesn't seem to use the rewriting part). [...]
> - To have an interface that works with transfer functions operating on a whole basic block (`Block CmmNode C C`). This means that it would be up to the user of the algorithm to traverse the whole block.
Ah! This is actually something I wanted to do but didn't get around to. When I was working on the code generator I found that using Hoopl for rewriting was prohibitively slow, which is why we're not using it for anything right now, but I think that pulling out the basic block transformation is possibly a way forward that would let us use Hoopl (see the sketch after this message).

A lot of the code you're removing is my attempt at "optimising" the Hoopl dataflow algorithm to make it usable in GHC. (I don't mind removing this, it was a failed experiment really.)
> Benefits:
> - Further simplifications.
> - We could remove the `analyzeFwdBlocks` hack, which AFAICS is just a copy&paste of `analyzeFwd` that ignores the middle nodes (probably for efficiency of analyses that only look at the first and last nodes of a block).
Aren't we using this in dataflowAnalFwdBlocks, which is used by procpointAnalysis?

Cheers,
Simon
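By analogy, a block-level hook for the rewriting that Simon mentions (the "pulling out the basic block transformation") might be as simple as the following shape, continuing the illustrative names from the sketch after the first message of the thread; `RewriteFun` is purely hypothetical, not anything GHC or Hoopl defines:

```haskell
-- Hypothetical sketch (not GHC's or Hoopl's API): a rewrite step that
-- consumes a whole block plus its incoming fact, returning Nothing to
-- keep the block unchanged or Just a replacement.  A driver would
-- re-analyze whatever replacement it gets back before continuing.
type RewriteFun n f = Block n C C -> f -> Maybe (Block n C C)
```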

On Fri, Oct 21, 2016 at 4:04 PM Simon Marlow wrote:
> On 16 October 2016 at 14:03, Michal Terepeta wrote:
> > Hi,
> >
> > I was looking at cleaning up a bit the situation with dataflow analysis for Cmm. In particular, I was experimenting with rewriting the current `cmm/Hoopl/Dataflow` module: [...]
>
> Ah! This is actually something I wanted to do but didn't get around to. When I was working on the code generator I found that using Hoopl for rewriting was prohibitively slow, which is why we're not using it for anything right now, but I think that pulling out the basic block transformation is possibly a way forward that would let us use Hoopl.
Right, I've also seen:
https://plus.google.com/107890464054636586545/posts/dBbewpRfw6R
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HooplPerformance
but it seems that there weren't any follow-ups/conclusions on that. Also, I haven't started writing anything for the rewriting yet (only analysis for now).

Btw. I'm currently experimenting with GHC's fork of the Dataflow module - and for now I'm not planning on pushing the changes to the upstream Hoopl. There are already projects that depend on the current interface of Hoopl (it's on Hackage after all) and it's going to be hard to make certain changes there. Hope that's ok with everyone! (Also, we can always revisit this question later.)

> A lot of the code you're removing is my attempt at "optimising" the Hoopl dataflow algorithm to make it usable in GHC. (I don't mind removing this, it was a failed experiment really.)

Thanks for saying that!
> > Benefits:
> > - Further simplifications.
> > - We could remove the `analyzeFwdBlocks` hack, which AFAICS is just a copy&paste of `analyzeFwd` that ignores the middle nodes (probably for efficiency of analyses that only look at the first and last nodes of a block).
>
> Aren't we using this in dataflowAnalFwdBlocks, which is used by procpointAnalysis?

Yes, sorry for the confusion! What I meant is that analyzeFwdBlocks/dataflowAnalFwdBlocks is currently a special case of analyzeFwd/dataflowAnalFwd that only looks at the first and last nodes. So if we move to a block-oriented interface, it simply stops being a special case and fits the new interface (since it's the analysis that decides whether to look at the whole block or only parts of it). So it's "removed" in the sense of removing a special case.

Cheers,
Michal
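To make the "removing a special case" point concrete: under the block-oriented interface sketched after the first message of this thread, an analysis that only inspects the entry and exit nodes is just an ordinary transfer function that skips the middle of the block. Here `blockSplit` is Hoopl's; `TransferFun` comes from that earlier sketch, and `shallowTransfer` is an illustrative name:

```haskell
import Compiler.Hoopl (Block, C, O, blockSplit)

-- An entry/exit-only analysis as an ordinary client of the block
-- interface: split the block into entry node, middle nodes and exit
-- node, then ignore the middle - which is what analyzeFwdBlocks
-- special-cases today.
shallowTransfer :: (n C O -> f -> f)  -- transfer through the entry node
                -> (n O C -> f -> f)  -- transfer through the exit node
                -> TransferFun n f
shallowTransfer transferFirst transferLast blk f =
  let (entry, _middle, exit) = blockSplit blk
  in transferLast exit (transferFirst entry f)
```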
participants (4)

- Ben Gamari
- Jan Stolarek
- Michal Terepeta
- Simon Marlow