Removing Hoopl dependency?

Michal Terepeta

27 May 2017

5:58 p.m.

Hi all,

I was looking at removing the `BlockId` type synonym in favor of Hoopl's `Label` (there was already a TODO and it is a bit confusing). But once I started making the changes, I realized that in a bunch of places this makes the code *less* readable, mostly because of `CLabel` (it sounds similar but is something quite different, and having to rename local variables from `label` to `clabel` is not great).

I started to look at alternatives and noticed that in general the interface between GHC and Hoopl is quite noisy and confusing:

- Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel`
- Hoopl has `Unique` which is different than GHC's `Unique`
- Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}`
- GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of Hoopl's and add the GHC ones).
- Working in `cmm/` requires constant switching between GHC code and Hoopl (`CmmNode`/`CmmGraph`/`CmmBlock` and the dataflow stuff are in GHC, while the actual implementations of `Block`/`Graph` are defined in Hoopl, etc.)

GHC is actually using only a small subset of Hoopl (e.g., the fixpoint computation is copied/specialized: `cmm/Hoopl/Dataflow`). So I was wondering - maybe it's worth simply dropping the dependency on Hoopl (and copying the code that is actually necessary into GHC)?

I've done an experiment in [1] (to see how much we'd need to actually copy) and I really like the result:

- We can remove one external dependency and git submodule at the cost of only 5 new modules in `cmm/Hoopl` (net gain of only 4 modules: we add 5 new but can remove `cmm/Hoopl`, which is no longer needed)
- We should be able to fix all of the above issues and make the code easier to understand (less code, everything in one repo, fewer concepts).
- It's going to be easier to change things since we don't need to worry about changing the public interface of Hoopl (it's a standalone package on Hackage and other people already depend on the current behavior).

What do you think? Does anyone think we shouldn't do this?

Thanks,
Michal

[1] Branch: https://github.com/michalt/ghc/tree/hoopl/no-hoopl
Diff: https://github.com/ghc/ghc/compare/master...michalt:hoopl/no-hoopl
For now I just copied the code/updated imports and didn't do any cleanups, but I'd be happy to do them in subsequent PRs.
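As a rough illustration of the naming overlap described in the first message, here is a minimal, self-contained sketch. The types below are simplified stand-ins defined only for this example; they are not the real definitions from Hoopl or GHC. Only the names `Label`, `BlockId` and `CLabel` come from the discussion, and `nextLabel`, `entryBlock`, `secondBlock` and `mainSymbol` are hypothetical.

```haskell
-- Simplified stand-ins; NOT the real Hoopl/GHC definitions.

-- Stand-in for Hoopl's label of a basic block.
newtype Label = Label Int
  deriving (Eq, Ord, Show)

-- `BlockId` is only a type synonym for `Label`, so the two names are
-- interchangeable as far as the type checker is concerned.
type BlockId = Label

-- Stand-in for GHC's `CLabel` (assumed here to be, roughly, a symbol
-- name in generated code); it is unrelated to control-flow-graph blocks.
newtype CLabel = CLabel String
  deriving (Eq, Show)

-- Hypothetical helper written against `Label`...
nextLabel :: Label -> Label
nextLabel (Label n) = Label (n + 1)

-- ...accepts a `BlockId` unchanged, because the synonym adds no distinction.
entryBlock :: BlockId
entryBlock = Label 0

secondBlock :: BlockId
secondBlock = nextLabel entryBlock

-- A `CLabel` is a different type: `nextLabel (CLabel "main")` would be
-- rejected by the type checker.
mainSymbol :: CLabel
mainSymbol = CLabel "main_entry"
```

The sketch only shows why a reader meets two names for one type (`Label`/`BlockId`) next to a similar-sounding name for an unrelated type (`CLabel`); it says nothing about how the real types are represented.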


Herbert Valerio Riedel

27 May

6:28 p.m.

On 2017-05-27 at 19:58:11 +0200, Michal Terepeta wrote: [...]

...

I've done an experiment in [1] (to see how much we'd need to actually copy) and I really like the result: - We can remove one external dependency and git submodule at the cost of only 5 new modules in `cmm/Hoopl` (net gain of only 4 modules: we add 5 new but can remove `cmm/Hoopl`, which is no longer needed) - We should be able to fix all of the above issues and make the code easier to understand (less code, everything in one repo, fewer concepts). - It's going to be easier to change things since we don't need to worry about changing the public interface of Hoopl (it's a standalone package on Hackage and other people already depend on the current behavior).

What do you think? Does anyone think we shouldn't do this?

It appears to me that in this case, the benefits in gained flexibility outweigh the cost of independent development and potential loss of synergies. So I'm +1 on this.

Ben Gamari

7:09 p.m.

Michal Terepeta writes:

...

Hi all,

...

What do you think? Does anyone think we shouldn't do this?

I think this seems quite reasonable. Given that hoopl will need changes to be truly useful to GHC, it seems quite reasonable to take the parts we need and iterate independently on the rest. Cheers, - Ben

Erik de Castro Lopo

10:51 p.m.

Michal Terepeta wrote:

...

What do you think? Does anyone think we shouldn't do this?

Makes sense. I'm +1 on this.

Erik

Michal Terepeta

28 May

5:39 p.m.

Cool, thanks for quick replies! I've sent out https://phabricator.haskell.org/D3616 Cheers, Michal

Simon Peyton Jones

9:30 p.m.

Is there really a compelling case for forking Hoopl? I was talking to Kavon last week about doing exactly the opposite: using Hoopl more wholeheartedly!

Before going ahead with this, let’s remember the downsides:

· If we fork Hoopl, improvements in one place will not be seen in the other. GHC originally used its own containers library but now uses ‘containers’, most of which is irrelevant to GHC, just to pick up the work that has been done to make ‘containers’ fast. Similarly, GHC has a clone of ‘pretty’, but someone is working (I think) to make GHC use ‘pretty’.

· It’s not clear to me why GHC has a clone of parts of Hoopl. Would it not be better just to make Hoopl faster?

If anything I’d like to use Hoopl more in Cmm optimisation passes in GHC, so we may want to use more of Hoopl’s facilities.

The main reason you suggest for forking is that there are some awkward name clashes. Surely we could resolve these? E.g. we could change CLabel in GHC; or agree with Hoopl maintainers that BlockId would be more helpful than Label.

You mention that Hoopl uses Unique set/map. Why not use ‘containers’ for that? (Like GHC!)

Let’s discuss this a bit more before executing.

I’m also interested to know:

· who is actively working on Hoopl (Michael, Sophie, …)?

· how are you using it (within GHC, or somewhere else)?

It’d be good to review and update https://ghc.haskell.org/trac/ghc/wiki/Hoopl/Cleanup. Are there any other improvements planned?

Simon

Michal Terepeta

29 May

11:52 a.m.

On Sun, May 28, 2017 at 11:30 PM Simon Peyton Jones wrote:

...

Is there really a compelling case for forking Hoopl? I was talking to Kavon last week about doing exactly the opposite: using Hoopl more wholeheartedly!

Before going ahead with this, let’s remember the downsides

· If we fork Hoopl, improvements in one place will not be seen in the other. GHC originally used its own containers library but now uses ‘containers’, most of which is irrelevant to GHC, just to pick up the work that has been done to make ‘containers’ fast. Similarly, GHC has a clone of ‘pretty’, but someone is working (I think) to make GHC use ‘pretty’.

· It’s not clear to me why GHC has a clone of parts of Hoopl. Would it not be better just to make Hoopl faster?

If anything I ‘d like to use Hoopl more in Cmm optimisation passes in GHC, so we may want to use more of Hoopl’s facilities.

The main reason you suggest for forking is that there are some awkward name clashes. Surely we could resolve these? e.g we could change CLabel in GHC; or agree with Hoopl maintainers that BlockId would be more helpful than Label.

You mention that Hoopl uses Unique set/map. Why not use ‘containers’ for that? (Like GHC!)

Let’s discuss this a bit more before executing

I’m also interested to know:

· who is actively working on Hoopl (Michael, Sophie, …)?

· how are you using it (within GHC, or somewhere else)?

It’d be good to review and update https://ghc.haskell.org/trac/ghc/wiki/Hoopl/Cleanup. Are there any other improvements planned?

Simon

Hi Simon,

Thanks for chiming in! Let me try to clarify the current situation and the motivation for my changes.

1) Initial fork of Hoopl

Note that what I’m actually advocating is to *finish* forking Hoopl. The fork really started in ~2012 when the “new Cmm backend” was being finished. IIRC the main reason was the unacceptable performance and it seems that even Simon Marlow had trouble making it run fast enough:

https://plus.google.com/107890464054636586545/posts/dBbewpRfw6R
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HooplPerformance

The end result is pretty sad: GHC has its own forked/specialized `Hoopl.Dataflow` module and is using Hoopl only for definitions of `Block`/`Graph` and maps/sets (if you look at my commit, it’s pretty clear what I’m copying). In particular it’s not using *any* of the dataflow analysis or rewriting capabilities of the Hoopl package.

2) Reasons to finish forking

The reasons I listed in my previous email already assumed that we have the forked `Hoopl.Dataflow` module in GHC. But if we want to discuss the reasons for forking in general, then apart from the performance (as noted above), there’s the issue of Hoopl’s interface. IMHO the node-oriented approach taken by Hoopl is not flexible enough and also makes it harder to optimize. That’s why I’ve already changed GHC’s `Hoopl.Dataflow` module to operate “block-at-a-time” (https://github.com/ghc/ghc/commit/679ccd1c8860f1ef4b589c9593b74d04c97ae836).

Some concrete examples:

- For proc-point analysis it was necessary to introduce a hack to GHC’s `Dataflow` module to expose a separate analysis function that *ignores* the middle nodes (since for proc-points they’re irrelevant). My change to go “block-at-a-time” allowed us to remove that hack.
- I’m trying to fix the non-linearity of `CmmLayoutStack` in https://phabricator.haskell.org/D3586 and again the block-oriented interface is useful: I want to do different rewrites based on which block is being considered (whether it’s a proc-point or not). This is not easily possible if I don’t know which block I’m in (which is the case for the node-oriented interface).

I also don’t think that name clashes and the tension between Hoopl’s interface and GHC are easy to solve. Hoopl is a public, stand-alone package, so we can’t just change things without considering compatibility. For instance, we can’t use GHC’s `Unique` in Hoopl. But should we switch all of GHC to use Hoopl’s? Also, having closely related concepts spread around GHC and Hoopl is not helping when trying to understand what’s happening. Finally, any change that touches both GHC & Hoopl has much higher overhead than just changing GHC.

In general, it really seems to me that Hoopl was simply released too early, with not enough real-world usage and testing. When you say that we should “just fix Hoopl”, it sounds to me that we’d really need to rewrite it from scratch. And it’s much easier to do that if we can just experiment within GHC without worrying about breaking other existing Hoopl users. Only once we’re happy with the result should we consider separating it into a stand-alone package.

3) Difference between pretty/containers and Hoopl

I also think that the situation with pretty/containers is quite different than Hoopl. They are much more general-purpose libraries, *far* more widely used and with more contributors. Take containers - the package is still very actively developed and constantly improved, whereas Hoopl hasn’t really seen much activity in the last 5 years. So the benefit-cost ratio is much better - yes, there is some cost in having containers as a dependency, but the benefits from the regular stream of improvements easily outweigh it. I don’t think that’s the case for Hoopl.

Does this help understand my motivation? Let me know if anything is still unclear!

Thanks,
Michal
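To make the “node-oriented” versus “block-at-a-time” distinction a little more concrete, here is a minimal sketch. It is not GHC's `Hoopl.Dataflow` module nor Hoopl's actual API: the toy IR (`Middle`, `Last`, `Block`), the fact type and every function name below are assumptions made up for illustration. It shows a per-node transfer function, how it can be lifted to a per-block one, and a block-level wrapper (`resetAt`) whose behaviour depends on which block it is in, which a purely per-node function cannot express because it never sees the block.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Label = Int
type Fact  = S.Set String            -- toy fact: variables assigned so far

-- Hypothetical toy IR: straight-line "middle" nodes plus a control-flow exit.
data Middle = Assign String Int
data Last   = Goto Label | Branch Label Label | Return

data Block = Block
  { blockLabel   :: Label
  , blockMiddles :: [Middle]
  , blockLast    :: Last
  }

successors :: Block -> [Label]
successors b = case blockLast b of
  Goto l     -> [l]
  Branch t f -> [t, f]
  Return     -> []

-- Node-oriented: the client describes one node at a time.
type NodeTransfer  = Middle -> Fact -> Fact
-- Block-at-a-time: the client sees the whole block, including its label.
type BlockTransfer = Block -> Fact -> Fact

-- Any per-node transfer can be lifted to a per-block one.
liftNodes :: NodeTransfer -> BlockTransfer
liftNodes perNode block fact = foldl (flip perNode) fact (blockMiddles block)

assigned :: NodeTransfer
assigned (Assign var _) = S.insert var

-- Block-aware behaviour: discard the incoming fact at selected blocks.
resetAt :: S.Set Label -> BlockTransfer -> BlockTransfer
resetAt special inner block fact
  | blockLabel block `S.member` special = inner block S.empty
  | otherwise                           = inner block fact

-- Simple forward worklist fixpoint; the result maps labels to entry facts.
analyze :: BlockTransfer -> Label -> [Block] -> M.Map Label Fact
analyze transfer entry blocks = go [entry] (M.singleton entry S.empty)
  where
    blockMap = M.fromList [(blockLabel b, b) | b <- blocks]
    go [] facts = facts
    go (l : rest) facts = case M.lookup l blockMap of
      Nothing -> go rest facts
      Just b  ->
        let out = transfer b (M.findWithDefault S.empty l facts)
            (facts', changed) = foldl (propagate out) (facts, []) (successors b)
        in go (rest ++ changed) facts'
    -- Join (set union) the outgoing fact into a successor's entry fact.
    propagate out (facts, changed) s =
      let old = M.lookup s facts
          new = maybe out (S.union out) old
      in if old == Just new
           then (facts, changed)
           else (M.insert s new facts, s : changed)

example :: [Block]
example =
  [ Block 0 [Assign "x" 1] (Branch 1 2)
  , Block 1 [Assign "y" 2] (Goto 3)
  , Block 2 [Assign "z" 3] (Goto 3)
  , Block 3 [] Return
  ]
```

For instance, `analyze (liftNodes assigned) 0 example` and `analyze (resetAt (S.fromList [1, 2]) (liftNodes assigned)) 0 example` give different entry facts for block 3, even though both reuse the same per-node function. A transfer that ignores `blockMiddles` altogether (as described for proc-point analysis) would fit the same `BlockTransfer` type.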

Simon Peyton Jones

7 Jun

5:05 p.m.

Michael,

Sorry to be slow.

Note that what I’m actually advocating is to *finish* forking Hoopl. The fork really started in ~2012 when the “new Cmm backend” was being finished.

Yes, I know. But what I’m suggesting is to revisit the reasons for that fork, and re-join if possible. E.g. if Hoopl is too slow, can’t we make it faster? Why is GHC’s version faster?

apart from the performance (as noted above), there’s the issue of Hoopl’s interface. IMHO the node-oriented approach taken by Hoopl is both not flexible enough and it makes it harder to optimize it. That’s why I’ve already changed GHC’s `Hoopl.Dataflow` module to operate “block-at-a-time”

Well, that sounds like an argument to re-engineer Hoopl’s API, rather than an argument to fork it. If it’s a better API, can’t we make it better for everyone? I don’t yet understand what the “block-oriented” API is, or how it differs, but let’s have the conversation.

When you say that we should “just fix Hoopl”, it sounds to me that we’d really need to rewrite it from scratch. And it’s much easier to do that if we can just experiment within GHC without worrying about breaking other existing Hoopl users

Fine. But then let’s call it hoopl2, make it a separate package (perhaps with GHC as its only client for now), and declare that it’s intended to supersede hoopl.

But do we even need to do that much? After all, a major version bump on a package is allowed to introduce breaking changes to the API. Anyone who wants the old API can use the old package.

I wonder if you could start a wiki page somewhere (e.g. on the GHC wiki) listing all the changes you’d like to make in a “rewrite from scratch” story? That would help to “ground” the conversation.

Thanks

Simon

Michal Terepeta

8 Jun

6:58 p.m.

...

On Wed, Jun 7, 2017 at 7:05 PM Simon Peyton Jones wrote: Michael

Sorry to be slow.

...
Note that what I’m actually advocating is to *finish* forking Hoopl. The fork really started in ~2012 when the “new Cmm backend” was being finished.

Yes, I know. But what I’m suggesting is to revisit the reasons for that fork, and re-join if possible. Eg if Hoopl is too slow, can’t we make it faster? Why is GHC’s version faster?

...
apart from the performance (as noted above), there’s the issue of Hoopl’s interface. IMHO the node-oriented approach taken by Hoopl is both not flexible enough and it makes it harder to optimize it. That’s why I’ve already changed GHC’s `Hoopl.Dataflow` module to operate “block-at-a-time”

Well that sounds like an argument to re-engineer Hoopl’s API, rather an argument to fork it. If it’s a better API, can’t we make it better for everyone? I don’t yet understand what the “block-oriented” API is, or how it differs, but let’s have the conversation.

Sure, but re-engineering the API of a publicly used package has significant cost for everyone involved:

- GHC: we might need to wait longer for any improvements and spend more time discussing various options (and compromises - what makes sense for GHC might not make sense for other people)
- Hoopl users: will need to migrate to the new APIs, potentially multiple times
- Hoopl maintainers: might need to maintain more than one branch of Hoopl for a while

And note that just bumping a version number might not be enough. IIRC Stackage only allows one version of each package, and since Hoopl is a boot package for GHC, the new version will move to Stackage along with GHC. So any users of Hoopl who want to use the old package will not be able to use that version of Stackage.

...

...
When you say that we should “just fix Hoopl”, it sounds to me that we’d really need to rewrite it from scratch. And it’s much easier to do that if we can just experiment within GHC without worrying about breaking other existing Hoopl users

Fine. But then let’s call it hoopl2, make it a separate package (perhaps with GHC as its only client for now), and declare that it’s intended to supersede hoopl.

Maybe this is the core of our disagreement - why is it a good idea to have Hoopl as a separate package in the first place?

I've pointed out multiple reasons why I think it has a significant cost. But I don't really see any major benefits. Looking at the commit history of Hoopl, there hasn't been much development on it since 2012, when Simon M was trying to get the new GHC backend working (since then, it's mostly maintenance patches to keep up with changes in `base`, etc). Extracting a core part of any project to a shared library has some real costs, so there should be equally real benefits that outweigh that cost. (If I proposed extracting parts of the Core optimizer to a separate package, wouldn't you expect some really good reasons for doing this?)

I also do think this is quite different than a dependency on, say, `binary`, `containers` or `pretty`, where the API of the library is smaller (at least conceptually), much better understood and established.

Cheers,
Michal

Ben Gamari

7:23 p.m.

Michal Terepeta writes:

...

Maybe this is the core of our disagreement - why is it a good idea to have Hoopl as a separate package in the first place?

I've pointed multiple reasons why I think it has a significant cost. But I don't really see any major benefits. Looking at the commit history of Hoopl there hasn't been much development on it since 2012 when Simon M was trying to get the new GHC backend working (since then, it's mostly maintenance patches to keep up with changes in `base`, etc). Extracting a core part of any project to a shared library has some real costs, so there should be equally real benefits that outweigh that cost. (If I proposed extracting parts of Core optimizer to a separate package, wouldn't you expect some really good reasons for doing this?)

One way forward here would be to ask those who would be affected by an API rework whether they would be open to change. I don't believe there are too many hoopl users at the moment, but I recall that previous efforts to change the library's interface were met with some resistance.

However, even if we found that hoopl's current user-base is agreeable to change, we would still need to account for the fact that advancing GHC in lockstep with an out-of-tree hoopl will take more effort than advancing it under Michal's merge proposal. Admittedly, with submodules this additional effort isn't too large, but it's still more than having hoopl and GHC under one tree.

Cheers,

- Ben

Simon Peyton Jones

9 Jun

7:50 a.m.

Maybe this is the core of our disagreement - why is it a good idea to have Hoopl as a separate package in the first place?

One reason only: because it makes Hoopl usable by compilers other than GHC. And, dually, efforts by others to improve Hoopl will benefit GHC.

If I proposed extracting parts of Core optimizer to a separate package, wouldn't you expect some really good reasons for doing this?

A re-usable library should be

a) a significant chunk of code,
b) that can plausibly be re-purposed by others,
c) and that has an explicable API.

I think the Core optimiser is so big, and so GHC specific, that (b) and (c) are unlikely to hold. But we carefully designed Hoopl from the ground up so that it was agnostic about the node types, and so can be re-used for control flow graphs of many kinds. It’s designed to be re-usable. Whether it is actually re-used is another matter, of course. But if it’s part of GHC, it can’t be.

Stackage only allows one version of each package

I didn’t know that, but I can see it makes sense. That makes a strong case for re-doing it as a new package hoopl2, if the API needs to change substantially (something we have yet to discuss).

I've pointed multiple reasons why I think it has a significant cost.

Can you just summarise them again briefly for me? If we are free to choose nomenclature and API for hoopl2, I’m not yet seeing why making it a separate package is harder than not doing so. E.g. template-haskell is a separate package.

Thanks!

Simon
Merijn Verstraaten

8:31 a.m.

Lemme toss in my 2 cents as an outsider who likes to dabble in programming languages and compilers: I would *love* to be able to just drop (parts of) GHC's optimisation into my toy compilers. Optimisation is complicated, lots of work, and not really the part I care about when toying with languages.

I wasn't really aware of Hoopl before this thread, so now that I am, I'm kinda sad about the idea of this reusable infrastructure being tossed out. I don't really have any vested interest/opinion on how to deal with the current Hoopl situation, so if it's decided to write a Hoopl2.0 instead, without backwards compatibility, I would still consider that a win.

Cheers,
Merijn

...

On 9 Jun 2017, at 9:50, Simon Peyton Jones via ghc-devs wrote:

Maybe this is the core of our disagreement - why is it a good idea to have Hoopl as a separate package in the first place?

One reason only: because it makes Hoopl usable by compilers other than GHC. And, dually, efforts by others to improve Hoopl will benefit GHC.

If I proposed extracting parts of Core optimizer to a separate package, wouldn't you expect some really good reasons for doing this?

A re-usable library should be a) a significant chunk of code, b) that can plausibly be re-purposed by others c) and that has an explicable API

I think the Core optimiser is so big, and so GHC specific, that (b) and (c) are unlikely to hold. But we carefully designed Hoopl from the ground up so that it was agnostic about the node types, and so can be re-used for control flow graphs of many kinds. It’s designed to be re-usable. Whether it is actually re-used is another matter, of course. But if it’s part of GHC, it can’t be.

Stackage only allows one version of each package

I didn’t know that, but I can see it makes sense. That makes a strong case for re-doing it as a new package hoopl2, if the API needs to change substantially (something we have yet to discuss).

I've pointed multiple reasons why I think it has a significant cost.

Can you just summarise them again briefly for me? If we are free to choose nomenclature and API for hoopl2, I’m not yet seeing why making it a separate package is harder than not doing so. E.g. template-haskell is a separate package.

Thanks!

Simon

From: Michal Terepeta [mailto:michal.terepeta@gmail.com] Sent: 08 June 2017 19:59 To: Simon Peyton Jones ; ghc-devs Cc: Kavon Farvardin Subject: Re: Removing Hoopl dependency?

On Wed, Jun 7, 2017 at 7:05 PM Simon Peyton Jones wrote:

Michael

Sorry to be slow.

Note that what I’m actually advocating is to *finish* forking Hoopl. The fork really started in ~2012 when the “new Cmm backend” was being finished.

Yes, I know. But what I’m suggesting is to revisit the reasons for that fork, and re-join if possible. Eg if Hoopl is too slow, can’t we make it faster? Why is GHC’s version faster?

apart from the performance (as noted above), there’s the issue of Hoopl’s interface. IMHO the node-oriented approach taken by Hoopl is both not flexible enough and it makes it harder to optimize it. That’s why I’ve already changed GHC’s `Hoopl.Dataflow` module to operate “block-at-a-time”

Well that sounds like an argument to re-engineer Hoopl’s API, rather than an argument to fork it. If it’s a better API, can’t we make it better for everyone? I don’t yet understand what the “block-oriented” API is, or how it differs, but let’s have the conversation.

Sure, but re-engineering the API of a publicly used package has significant cost for everyone involved:

- GHC: we might need to wait longer for any improvements and spend more time discussing various options (and compromises - what makes sense for GHC might not make sense for other people)
- Hoopl users: will need to migrate to the new APIs potentially multiple times
- Hoopl maintainers: might need to maintain more than one branch of Hoopl for a while

And note that just bumping a version number might not be enough. IIRC Stackage only allows one version of each package and since Hoopl is a boot package for GHC, the new version will move to Stackage along with GHC. So any users of Hoopl that want to use the old package will not be able to use that version of Stackage.
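The "node-oriented" versus "block-at-a-time" distinction raised above can be illustrated with a toy liveness analysis. This is only a sketch under stated assumptions: the `Node`, `Block` and `Live` types and all function names below are hypothetical and are neither Hoopl's API nor GHC's `Hoopl.Dataflow`. In the node-oriented style the client supplies a transfer function per node and the framework folds it over the block; in the block-oriented style the client supplies one function per block, which is free to precompute a summary (here a gen/kill pair).

```haskell
import qualified Data.Set as Set

-- Hypothetical toy IR (an assumption for illustration, not GHC's CmmNode).
type Var = String

data Node
  = Assign Var [Var]  -- defines one variable, uses the listed ones
  | Use [Var]         -- only uses variables
  deriving (Eq, Show)

type Block = [Node]
type Live  = Set.Set Var

-- Node-oriented style: one transfer function per node (backward liveness).
nodeTransfer :: Node -> Live -> Live
nodeTransfer (Assign x ys) live = Set.union (Set.fromList ys) (Set.delete x live)
nodeTransfer (Use ys)      live = Set.union (Set.fromList ys) live

-- The framework, not the client, walks the block from its last node to its first.
blockViaNodes :: Block -> Live -> Live
blockViaNodes block liveOut = foldr nodeTransfer liveOut block

-- Block-at-a-time style: the client handles a whole block and may summarise
-- it once as (gen, kill), so applying it to a new fact is two set operations.
blockTransfer :: Block -> Live -> Live
blockTransfer block = \liveOut -> Set.union gen (Set.difference liveOut kill)
  where
    (gen, kill) = foldr step (Set.empty, Set.empty) block
    step (Assign x ys) (g, k) =
      (Set.union (Set.fromList ys) (Set.delete x g), Set.insert x k)
    step (Use ys) (g, k) = (Set.union (Set.fromList ys) g, k)
```

Both functions compute the same live-in set for a block; the difference is only in who owns the traversal, which is where a block-level interface leaves room for client-side optimisation.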

When you say that we should “just fix Hoopl”, it sounds to me that we’d really need to rewrite it from scratch. And it’s much easier to do that if we can just experiment within GHC without worrying about breaking other existing Hoopl users

Fine. But then let’s call it hoopl2, make it a separate package (perhaps with GHC as its only client for now), and declare that it’s intended to supersede hoopl.

Maybe this is the core of our disagreement - why is it a good idea to have Hoopl as a separate package in the first place?

I've pointed multiple reasons why I think it has a significant cost. But I don't really see any major benefits. Looking at the commit history of Hoopl there hasn't been much development on it since 2012 when Simon M was trying to get the new GHC backend working (since then, it's mostly maintenance patches to keep up with changes in `base`, etc).

Extracting a core part of any project to a shared library has some real costs, so there should be equally real benefits that outweigh that cost. (If I proposed extracting parts of Core optimizer to a separate package, wouldn't you expect some really good reasons for doing this?)

I also do think this is quite different than a dependency on, say, `binary`, `containers` or `pretty`, where the API of the library is smaller (at least conceptually), much better understood and established.

Cheers,

Michal

_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Herbert Valerio Riedel

9:16 a.m.

Hi Simon,

On 2017-06-09 at 09:50:51 +0200, Simon Peyton Jones via ghc-devs wrote:

[...]

Stackage only allows one version of each package

I didn’t know that, but I can see it makes sense. That makes a strong case for re-doing it as a new package hoopl2

The limitations of Stackage's design shouldn't drive nor limit library design. Cabal has been moving to finally allow us to have multiple versions and even multiple configurations/instances of the same version of a package registered in the package db at the same time, and subjecting ourselves to Stackage's limitations after all the work done (and more in that direction is being considered to push the boundaries even further) to that effect *now* seems quite backward to me.

If we push the idea to its conclusion, that we shall rather publish a new package rather than release a new major version of a package to workaround Stackage, you'd see a proliferation of number-suffixed packages on Hackage. Moreover, packages which can easily support multiple major versions of a package would have to use conditional logic boilerplate in their .cabal files (which again would be incompatible with Stackage's inherent limitations, as it allows only *one configuration* of a given package version).

We should build upon the facilities we already have in place; and major versions are here to encode the epoch/generation of an API; moreover, as a big advantage over classic SemVer, we also have this 2-component major version which gives us more flexibility for versioning during developing two or more epochs of an API in parallel. So hoopl-1.* and hoopl-2.* could keep evolving independently, each branch being able to perform major version increments in their respective version namespace.

Cheers, HVR
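The "2-component major version" point can be made concrete with a small sketch using `Data.Version` from `base`. The helper names `pvpMajor`, `epoch`, `sameEpoch` and `isBreakingBump` are hypothetical, and the version numbers in the usage note are illustrative only (they follow the hoopl-1.* / hoopl-2.* naming in the message above, not actual hoopl releases). The assumption is the usual PVP reading: the first two components `A.B` form the major version, and the first component alone can serve as the API "epoch".

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- PVP major version: the first two components (A.B) of A.B.C.D.
pvpMajor :: Version -> [Int]
pvpMajor = take 2 . versionBranch

-- The API "epoch" in the sense used above: the first component only.
epoch :: Version -> Maybe Int
epoch v = case versionBranch v of
  (a : _) -> Just a
  []      -> Nothing

-- Two versions belong to the same epoch if their first components agree.
sameEpoch :: Version -> Version -> Bool
sameEpoch old new = epoch old == epoch new

-- Under the PVP, any change in A.B signals a potentially breaking release.
isBreakingBump :: Version -> Version -> Bool
isBreakingBump old new = pvpMajor old /= pvpMajor new
```

For example, going from `makeVersion [1,0,3]` to `makeVersion [1,1,0]` is a breaking bump that stays inside epoch 1, while `makeVersion [2,0,0]` starts a different epoch; this is the room for two lines of an API to make major increments independently.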

Alan & Kim Zimmerman

9:26 a.m.

But equally, stackage is a major part of the haskell ecosystem. As such, implications and paths forward need to be considered.

Alan

Michal Terepeta

12:26 p.m.

...

On Fri, Jun 9, 2017 at 9:50 AM Simon Peyton Jones wrote:

...
Maybe this is the core of our disagreement - why is it a good idea to have Hoopl as a separate package in the first place?

One reason only: because it makes Hoopl usable by compilers other than GHC. And, dually, efforts by others to improve Hoopl will benefit GHC.

...
If I proposed extracting parts of Core optimizer to a separate package, wouldn't you expect some really good reasons for doing this?

A re-usable library should be a) a significant chunk of code, b) that can plausibly be re-purposed by others c) and that has an explicable API

I think the Core optimiser is so big, and so GHC specific, that (b) and (c) are unlikely to hold. But we carefully designed Hoopl from the ground up so that it was agnostic about the node types, and so can be re-used for control flow graphs of many kinds. It’s designed to be re-usable. Whether it is actually re-used is another matter, of course. But if it’s part of GHC, it can’t be.

I agree with your characterization of a re-usable library and that Core optimizer would not be a good fit. But I do think that Hoopl also has some problems with b) and c) (although smaller):

- Using an optimizer-as-a-library is not really common (I'm not aware of any compilers doing this, LLVM is to some degree close but it exposes the whole language as the interface so it's closer to the idea of extracting the whole Cmm backend). So I don't think the API for such a project is well understood.
- The API is pretty wide and does put serious constraints on the IR (after all it defines blocks and graphs), making reusability potentially more tricky.

So I think I understand your argument and we just disagree on whether this is worth the effort of having a separate package.

[...]

I've pointed multiple reasons why I think it has a significant cost.

Can you just summarise them again briefly for me? If we are free to choose nomenclature and API for hoopl2, I’m not yet seeing why making it a separate package is harder than not doing so. E.g. template-haskell is a separate package.

Having even Hoopl2 as a separate package would still entail additional work:

- Hoopl2 would still need to duplicate some concepts (eg, `Unique`, etc. since it needs to be standalone)
- Understanding code (esp. by newcomers) would be harder: the Cmm backend would be split between GHC and Hoopl2, with the latter necessarily being far more general/polymorphic than needed by GHC.
- Getting the right performance in the presence of all this additional generality/polymorphism will likely require a fair amount of additional work.
- If Hoopl2 is used by other compilers, then we need to be more careful changing anything in incompatible ways, this will require more discussions & release coordination.

Considering that Hoopl was never actually picked up by other compilers, I'm not convinced that this cost is justified. But I understand that other people might have a different opinion.

So how about a compromise:

- decouple GHC from the current Hoopl (ie, go ahead with my diff),
- keep everything Hoopl-related only in `compiler/cmm/Hoopl` with the long-term intention of creating a separate package,
- experiment with and improve the code,
- once (if?) we're happy with the results, discuss what/how to extract to a separate package.

That gives us the freedom to try things out and see what works well (I simply don't have ready solutions for anything, being able to experiment is IMHO quite important). And once we reach the right performance/representation/abstraction/API we can work on extracting that.

What do you think?

Cheers, Michal

Sophie Taylor

11 Jun

1:09 p.m.

Hello, fellow workers!

So, I'll pop in here with my thoughts.

I'm writing an independent intermediate language library for functional languages, and I looked at using Hoopl. I would use it, but there are several reasons why I'm not currently doing so:

1) Combining facts from different domains through fancy lattice algorithms. This is fairly straightforward to add to Hoopl with minimal extra API change.

2) I wanted to write my data facts as a type-level list, `freer-effects` style, in order to be more explicit in my types about dependencies between analyses. This would require significantly altering the API.

3) Its own custom graph code. This is the biggest reason why I decided not to. Some problems:

* It seems impossible to change the topology of the graph in a rewriting step.

* I wanted to use term hypergraphs/hyperjungles due to some pretty nifty properties

* The intermediate language I'm implementing, a derivative of Graph Reduction Intermediate Notation, aka GRIN from UHC, is, as its name implies, intrinsically graph-based. Thus, graph manipulation has to be pretty easy to do.

So instead, I've decided to optimise another hypergraph library (`graph-rewriting` - I'm going to be rewriting it to use an inductive representation a la FGL) and implement a generic, Hoopl-esque analysis library on top of that. (Or more accurately, that is my plan for the next six months - I've been sidetracked getting parsing to work nice with an effect-based stack!)

So, if Hoopl2 does become a thing, I'd be very keen on working on it, but if I were to actually use it myself, it'd probably require a complete rewrite. Fortunately, it's a pretty small library; and for GHC, its current usage is a pretty straightforward usecase which shouldn't be affected too much. That being said, if GHC were to better use Hoopl (e.g. moving some of the optimisations on Core to be Hoopl-based passes) then it would be a different story.

So I guess I'm volunteering to do the rewrite for a potential Hoopl2 if it's wanted, as I'm about to do pretty much that anyway.

Cheers,

Sophie
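Point 1 in Sophie's message, combining facts from different domains, can be sketched in its simplest form as a componentwise product of two lattices. This is an illustration under assumptions, not Hoopl's API and not necessarily the "fancy lattice algorithms" meant in the message: the `Lattice` record, its fields and the two example lattices are all hypothetical names introduced here. Each join reports whether the old fact changed, which is what a fixpoint loop would need in order to know when to stop.

```haskell
import qualified Data.Set as Set

-- Hypothetical join-semilattice record: a bottom element plus a join that
-- also reports whether the old fact (first argument) changed.
data Lattice a = Lattice
  { latBottom :: a
  , latJoin   :: a -> a -> (Bool, a)
  }

-- Facts that are sets ordered by inclusion (e.g. live variables).
setLattice :: Ord x => Lattice (Set.Set x)
setLattice = Lattice Set.empty $ \old new ->
  let joined = Set.union old new
  in (Set.size joined /= Set.size old, joined)

-- A flat constant-propagation domain: Unknown below every Known n, Conflict on top.
data Const = Unknown | Known Int | Conflict
  deriving (Eq, Show)

constLattice :: Lattice Const
constLattice = Lattice Unknown joinConst
  where
    joinConst old new
      | old == new         = (False, old)
    joinConst Unknown new  = (True, new)
    joinConst old Unknown  = (False, old)
    joinConst Conflict _   = (False, Conflict)
    joinConst _ _          = (True, Conflict)

-- Componentwise product: the pair changes if either component changes.
pairLattice :: Lattice a -> Lattice b -> Lattice (a, b)
pairLattice la lb = Lattice (latBottom la, latBottom lb) $ \(a1, b1) (a2, b2) ->
  let (changedA, a) = latJoin la a1 a2
      (changedB, b) = latJoin lb b1 b2
  in (changedA || changedB, (a, b))
```

A plain product like this keeps the two analyses independent; letting one domain refine the other (a reduced product) would need extra machinery beyond this sketch.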

Simon Peyton Jones

12 Jun

7:06 a.m.

Interesting!

Maybe there are a couple of different alternatives:

A. A rewrite of Hoopl, with all the same basic ideas and data structures, but with a better API (I’m not sure exactly in what way, but Michael has some idea, as does Sophie), and a more efficient implementation.

B. A more radical change to use hypergraphs, type-level lists etc. This sounds interesting, but it’s a more substantial change and before using it for GHC we’d need to discuss the new proposed API in some detail.

There’s no reason we couldn’t do (A) and (B) in parallel.

Michael is suggesting doing (A) in GHC’s tree, but with a clearly-declared intent to bring it out as a separate library. (I’d advocate *making* it a separate library in GHC’s tree; we already have a number of those.)

That would leave Sophie free to do (B) free of the constraints of GHC depending on it; but we could always use it later.

Does that sound plausible? Do we know of any other Hoopl users?

Simon

Sophie Taylor

8:50 a.m.

I don't see why not, other than possible duplication of effort when it comes to some of the basic algorithms.

Speaking of which, what policies are there on bringing in new dependencies to GHC, both compile-time and run-time (e.g. possible SMT solver support)?

Ben Gamari

6:19 p.m.

Sophie Taylor writes:

...

I don't see why not, other than possible duplication of effort when it comes to some of the basic algorithms.

Speaking of which, what policies are there on bringing in new dependencies to GHC, both compile-time and run-time (e.g. possible SMT solver support)?

We are generally fairly conservative with adding new dependencies of either type. There are a variety of reasons for this.

In the case of runtime dependencies the associated costs are fairly clear: it would either a) make it harder for users to use GHC (in the case of mandatory dependencies) or b) make it harder to follow the behavior of the compiler (in the case of optional dependencies discovered at runtime).

There are also costs in the case of compile-time dependencies, although they may not be as easy to see. First, in order to maintain a reproducible revision history GHC includes all dependent libraries as submodules and ships them with source distributions. These submodules carry a small but non-negligible cost to developers due to idiosyncrasies in how they are handled by both git and Phabricator. Moreover, we need to periodically bump these submodules, which inevitably brings integration issues which require coordination with upstream to fix.

Also, there is a significant synchronization overhead associated with getting upstream maintainers to release new library versions prior to a GHC release. While this generally only affects the release manager, for that person it is indeed a significant cost and does tend to slow down the release cycle.

Finally, dependencies of the `ghc` library affect users of tooling which links to it (e.g. ghc-mod). Specifically, since we can only link against a single version of a given package at a time, such tooling packages are forced to link against whatever version `ghc` depends upon. This means that users may not get bugfixes, and it can constrain install plans, sometimes to the point where no plan is possible.

Cheers,

- Ben

Simon Peyton Jones

9:48 p.m.

Speaking of which, what policies are there on bringing in new dependencies to GHC, both compile-time and run-time (e.g. possible SMT solver support)?

We don’t have a formal policy, but we are generally reluctant to take on new dependencies. For SMT solvers, Iavor is using one via a typechecker plugin.

Simon

From: Sophie Taylor [mailto:sophie@traumapony.org]
Sent: 12 June 2017 09:50
To: Simon Peyton Jones; Michal Terepeta; ghc-devs
Cc: Kavon Farvardin
Subject: Re: Removing Hoopl dependency?

I don't see why not, other than possible duplication of effort when it comes to some of the basic algorithms.

Speaking of which, what policies are there on bringing in new dependencies to GHC, both compile-time and run-time (e.g. possible SMT solver support)?

On Mon, 12 Jun 2017 at 17:07 Simon Peyton Jones <simonpj@microsoft.com> wrote:

Interesting!

Maybe there are a couple of different alternatives:

A. A rewrite of Hoopl, with all the same basic ideas and data structures, but with a better API (I’m not sure exactly in what way, but Michal has some idea, as does Sophie), and a more efficient implementation.

B. A more radical change to use hypergraphs, type-level lists etc. This sounds interesting, but it’s a more substantial change and before using it for GHC we’d need to discuss the new proposed API in some detail.

There’s no reason we couldn’t do (A) and (B) in parallel.

Michal is suggesting doing (A) in GHC’s tree, but with a clearly-declared intent to bring it out as a separate library. (I’d advocate making it a separate library in GHC’s tree; we already have a number of those.)

That would leave Sophie free to do (B) free of the constraints of GHC depending on it; but we could always use it later.

Does that sound plausible? Do we know of any other Hoopl users?

Sophie Taylor

11:58 p.m.

Ben, Simon,

Thanks, that's good to know!

On Tue, 13 Jun 2017 at 07:48 Simon Peyton Jones wrote:

...

Speaking of which, what policies are there on bringing in new dependencies to GHC, both compile-time and run-time (e.g. possible SMT solver support)?

We don’t have a formal policy, but we are generally reluctant to take on new dependencies. For SMT solvers, Iavor is using one via a typechecker plugin.

Simon

*From:* Sophie Taylor [mailto:sophie@traumapony.org]
*Sent:* 12 June 2017 09:50
*To:* Simon Peyton Jones; Michal Terepeta <michal.terepeta@gmail.com>; ghc-devs
*Cc:* Kavon Farvardin
*Subject:* Re: Removing Hoopl dependency?

I don't see why not, other than possible duplication of effort when it comes to some of the basic algorithms.

Speaking of which, what policies are there on bringing in new dependencies to GHC, both compile-time and run-time (e.g. possible SMT solver support)?

On Mon, 12 Jun 2017 at 17:07 Simon Peyton Jones wrote:

Interesting!

Maybe there are a couple of different alternatives:

A. A rewrite of Hoopl, with all the same basic ideas and data structures, but with a better API (I’m not sure exactly in what way, but Michal has some idea, as does Sophie), and a more efficient implementation.

B. A more radical change to use hypergraphs, type-level lists etc. This sounds interesting, but it’s a more substantial change and before using it for GHC we’d need to discuss the new proposed API in some detail

There’s no reason we couldn’t do (A) and (B) in parallel.

Michal is suggesting doing (A) in GHC’s tree, but with a clearly-declared intent to bring it out as a separate library. (I’d advocate *making* it a separate library in GHC’s tree; we already have a number of those.)

That would leave Sophie free to do (B) free of the constraints of GHC depending on it; but we could always use it later.

Does that sound plausible? Do we know of any other Hoopl users?

Simon

*From:* Sophie Taylor [mailto:sophie@traumapony.org]
*Sent:* 11 June 2017 14:09
*To:* Michal Terepeta; Simon Peyton Jones <simonpj@microsoft.com>; ghc-devs
*Cc:* Kavon Farvardin
*Subject:* Re: Removing Hoopl dependency?

Hello, fellow workers!

So, I'll pop in here with my thoughts.

I'm writing an independent intermediate language library for functional languages, and I looked at using Hoopl. I would use it, but there are several reasons why I'm not currently doing so:

1) Combining facts from different domains through fancy lattice algorithms. This is fairly straightforward to add to Hoopl with minimal extra API change.

2) I wanted to write my data facts as a type-level list, `freer-effects` style, in order to be more explicit in my types about dependencies between analyses. This would require significantly altering the API.

3) Its own custom graph code. This is the biggest reason why I decided not to. Some problems:

* It seems impossible to change the topology of the graph in a rewriting step.

* I wanted to use term hypergraphs/hyperjungles due to some pretty nifty properties

* The intermediate language I'm implementing, a derivative of Graph Reduction Intermediate Notation, aka GRIN from UHC, is, as its name implies, intrinsically graph-based. Thus, graph manipulation has to be pretty easy to do.
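
As a rough illustration of point 1), the simplest way to combine facts from two domains is a component-wise product lattice. The sketch below assumes a hypothetical `Lattice` class and two toy fact types (`Const`, `Live`); none of these names come from Hoopl, and the "fancy" combinations mentioned above (where one domain refines the other) would need more than this.

```haskell
-- Hypothetical join-semilattice with a least element (not Hoopl's API).
class Eq a => Lattice a where
  bottom :: a
  join   :: a -> a -> a

-- Toy domain 1: constant propagation for a single variable.
data Const = Unknown | Known Int | Conflict
  deriving (Eq, Show)

instance Lattice Const where
  bottom = Unknown
  join Unknown x = x
  join x Unknown = x
  join (Known a) (Known b) | a == b = Known a
  join _ _ = Conflict

-- Toy domain 2: "may be live" flag.
newtype Live = Live Bool
  deriving (Eq, Show)

instance Lattice Live where
  bottom = Live False
  join (Live a) (Live b) = Live (a || b)

-- Product lattice: facts from both domains travel together and are
-- joined component-wise.
instance (Lattice a, Lattice b) => Lattice (a, b) where
  bottom = (bottom, bottom)
  join (a1, b1) (a2, b2) = (join a1 a2, join b1 b2)

-- Join that also reports whether the old fact changed, which is what
-- a fixpoint iteration needs in order to know when to stop.
joinChanged :: Lattice a => a -> a -> (Bool, a)
joinChanged old new =
  let j = join old new
  in (j /= old, j)
```

With this instance, an analysis written against `Lattice` can run over `(Const, Live)` without knowing it is a pair.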

So instead, I've decided to optimise another hypergraph library (`graph-rewriting` - I'm going to be rewriting it to use an inductive representation a la FGL) and implement a generic, Hoopl-esque analysis library on top of that. (Or more accurately, that is my plan for the next six months - I've been sidetracked getting parsing to work nice with an effect-based stack!)

So, if Hoopl2 does become a thing, I'd be very keen on working on it, but if I were to actually use it myself, it'd probably require a complete rewrite. Fortunately, it's a pretty small library; and for GHC, its current usage is a pretty straightforward usecase which shouldn't be affected too much. That being said, if GHC were to better use Hoopl (e.g. moving some of the optimisations on Core to be Hoopl-based passes) then it would be a different story.

So I guess I'm volunteering to do the rewrite for a potential Hoopl2 if it's wanted, as I'm about to do pretty much that anyway.

Cheers,

Sophie

On Fri, 9 Jun 2017 at 22:31 Michal Terepeta wrote:

...
On Fri, Jun 9, 2017 at 9:50 AM Simon Peyton Jones wrote:

...
...
Maybe this is the core of our disagreement - why is it a good idea to have Hoopl as a separate package in the first place?

...
...
...
One reason only: because it makes Hoopl usable by compilers other than GHC. And, dually, efforts by others to improve Hoopl will benefit GHC.

...
...
...
If I proposed extracting parts of Core optimizer to a separate package, wouldn't you expect some really good reasons for doing this?

...
...
...
A re-usable library should be

...
a) a significant chunk of code,

...
b) that can plausibly be re-purposed by others

...
c) and that has an explicable API

...
...
I think the Core optimiser is so big, and so GHC specific, that (b) and (c) are unlikely to hold. But we carefully designed Hoopl from the ground up so that it was agnostic about the node types, and so can be re-used for control flow graphs of many kinds. It’s designed to be re-usable. Whether it is actually re-used is another matter, of course. But if it’s part of GHC, it can’t be.
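
For readers unfamiliar with what "agnostic about the node types" means in practice, here is a small self-contained sketch. It borrows the idea of indexing nodes and blocks by entry/exit shape, but the definitions are simplified assumptions written for this example (including the toy `Node` IR and `foldNodes`), not Hoopl's actual API.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}

-- Shape indices: O (open) means control can fall through,
-- C (closed) means it cannot.
data O
data C

-- A block is parameterised over the node type n; the library side
-- never looks inside a node.
data Block n e x where
  BFirst  :: n C O -> Block n C O
  BMiddle :: n O O -> Block n O O
  BLast   :: n O C -> Block n O C
  BCat    :: Block n e O -> Block n O x -> Block n e x

-- Generic right fold over the nodes of a block, for any node type.
foldNodes :: (forall e' x'. n e' x' -> a -> a) -> Block n e x -> a -> a
foldNodes f (BFirst n)  z = f n z
foldNodes f (BMiddle n) z = f n z
foldNodes f (BLast n)   z = f n z
foldNodes f (BCat a b)  z = foldNodes f a (foldNodes f b z)

-- Hypothetical client IR: a toy node type supplied by the compiler.
data Node e x where
  Label  :: String -> Node C O
  Assign :: String -> Int -> Node O O
  Jump   :: String -> Node O C

describe :: Node e x -> String
describe (Label l)    = l ++ ":"
describe (Assign v k) = v ++ " = " ++ show k
describe (Jump l)     = "goto " ++ l

-- A complete basic block: closed on entry and on exit.
example :: Block Node C C
example =
  BFirst (Label "L1") `BCat` BMiddle (Assign "x" 1) `BCat` BLast (Jump "L2")

render :: Block Node e x -> [String]
render b = foldNodes (\n acc -> describe n : acc) b []
```

The types rule out ill-formed blocks (for example, a `Jump` followed by an `Assign` does not type-check), which is also why the API constrains the shape of a client's IR.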

I agree with your characterization of a re-usable library and that the Core optimizer would not be a good fit. But I do think that Hoopl also has some problems with b) and c) (although smaller):

- Using an optimizer-as-a-library is not really common (I'm not aware of any compilers doing this; LLVM is to some degree close, but it exposes the whole language as the interface, so it's closer to the idea of extracting the whole Cmm backend). So I don't think the API for such a project is well understood.
- The API is pretty wide and does put serious constraints on the IR (after all, it defines blocks and graphs), making reusability potentially more tricky.

So I think I understand your argument and we just disagree on whether this is worth the effort of having a separate package.

...
...
[...]

...
...
...
I've pointed out multiple reasons why I think it has a significant cost.

...
...
Can you just summarise them again briefly for me? If we are free to choose nomenclature and API for hoopl2, I’m not yet seeing why making it a separate package is harder than not doing so. E.g. template-haskell is a separate package.



Ben Gamari

6:05 p.m.

Simon Peyton Jones via ghc-devs writes:

Snip

...

That would leave Sophie free to do (B) free of the constraints of GHC depending on it; but we could always use it later.

Does that sound plausible? Do we know of any other Hoopl users?

CCing Ning, who is currently maintaining hoopl and I believe has some projects using it.

Ning, you may want to have a look through this thread if you haven't already seen it. You can find the previous messages in the list archive [1].

Cheers,

- Ben

[1] May messages: https://mail.haskell.org/pipermail/ghc-devs/2017-May/014255.html
June messages: https://mail.haskell.org/pipermail/ghc-devs/2017-June/014293.html

Michal Terepeta

6:12 p.m.

...

On Mon, Jun 12, 2017 at 8:05 PM Ben Gamari wrote:

Simon Peyton Jones via ghc-devs writes:

Snip

...
That would leave Sophie free to do (B) free of the constraints of GHC depending on it; but we could always use it later.

Does that sound plausible? Do we know of any other Hoopl users?

CCing Ning, who is currently maintaining hoopl and I believe has some projects using it.

Ning, you may want to have a look through this thread if you haven't already seen it. You can find the previous messages in the list archive [1].

Cheers,

- Ben

Based on [1] there are four public packages:

- ethereum-analyzer,
- linearscan-hoopl,
- llvm-analysis,
- text-show-instances

But there might be more that are not open-source or not uploaded to Hackage/Stackage.

Cheers,

Michal

[1] https://www.stackage.org/lts-8.18/package/hoopl-3.10.2.1


participants (8)

Alan & Kim Zimmerman
Ben Gamari
Erik de Castro Lopo
Herbert Valerio Riedel
Merijn Verstraaten
Michal Terepeta
Simon Peyton Jones
Sophie Taylor