
On Sun, May 28, 2017 at 11:30 PM Simon Peyton Jones wrote:
Is there really a compelling case for forking Hoopl? I was talking to Kavon last week about doing exactly the opposite: using Hoopl more wholeheartedly!
Before going ahead with this, let’s remember the downsides:
· If we fork Hoopl, improvements in one place will not be seen in the other. GHC originally used its own containers library but now uses ‘containers’, most of which is irrelevant to GHC, just to pick up the work that has been done to make ‘containers’ fast. Similarly, GHC has a clone of ‘pretty’, but someone is working (I think) to make GHC use ‘pretty’.
· It’s not clear to me why GHC has a clone of parts of Hoopl. Would it not be better just to make Hoopl faster?
If anything I’d like to use Hoopl more in Cmm optimisation passes in GHC, so we may want to use more of Hoopl’s facilities.
The main reason you suggest for forking is that there are some awkward name clashes. Surely we could resolve these? E.g. we could change CLabel in GHC; or agree with Hoopl maintainers that BlockId would be more helpful than Label.
You mention that Hoopl uses Unique set/map. Why not use ‘containers’ for that? (Like GHC!)
Let’s discuss this a bit more before executing.
I’m also interested to know:
· who is actively working on Hoopl (Michael, Sophie, …)?
· how are you using it (within GHC, or somewhere else)?
It’d be good to review and update https://ghc.haskell.org/trac/ghc/wiki/Hoopl/Cleanup. Are there any other improvements planned?
Simon
Hi Simon,

Thanks for chiming in! Let me try to clarify the current situation and the motivation for my changes.

1) Initial fork of Hoopl

Note that what I’m actually advocating is to *finish* forking Hoopl. The fork really started in ~2012 when the “new Cmm backend” was being finished. IIRC the main reason was the unacceptable performance and it seems that even Simon Marlow had trouble making it run fast enough:
https://plus.google.com/107890464054636586545/posts/dBbewpRfw6R
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HooplPerformance

The end result is pretty sad: GHC has its own forked/specialized `Hoopl.Dataflow` module and is using Hoopl only for definitions of `Block`/`Graph` and maps/sets (if you look at my commit, it’s pretty clear what I’m copying). In particular it’s not using *any* of the dataflow analysis or rewriting capabilities of the Hoopl package.

2) Reasons to finish forking

The reasons I listed in my previous email already assumed that we have the forked `Hoopl.Dataflow` module in GHC. But if we want to discuss the reasons for forking in general, then apart from the performance (as noted above), there’s the issue of Hoopl’s interface. IMHO the node-oriented approach taken by Hoopl is both not flexible enough and harder to optimize. That’s why I’ve already changed GHC’s `Hoopl.Dataflow` module to operate “block-at-a-time” (https://github.com/ghc/ghc/commit/679ccd1c8860f1ef4b589c9593b74d04c97ae836).

Some concrete examples:
- For proc-point analysis it was necessary to introduce a hack to GHC’s `Dataflow` module to expose a separate analysis function that *ignores* the middle nodes (since for proc-points they’re irrelevant). My change to go “block-at-a-time” allowed us to remove that hack.
- I’m trying to fix non-linearity of `CmmLayoutStack` in (https://phabricator.haskell.org/D3586) and again the block-oriented interface is useful - I want to do different rewrites based on which block is being considered (whether it’s a proc-point or not). This is not easily possible if I don’t know which block I’m in (which is the case for the node-oriented interface).

I also don’t think that name clashes and the tension between Hoopl’s interface and GHC are easy to solve. Hoopl is a public, stand-alone package, so we can’t just change things without considering compatibility. For instance, we can’t use GHC’s `Unique` in Hoopl. But should we switch all of GHC to use Hoopl’s? Also having closely related concepts spread around GHC and Hoopl is not helping when trying to understand what’s happening. Finally, any changes to both GHC & Hoopl have much higher overhead than just changing GHC.

In general, it really seems to me that Hoopl has been released simply too early, with not enough real-world usage and testing. When you say that we should “just fix Hoopl”, it sounds to me that we’d really need to rewrite it from scratch. And it’s much easier to do that if we can just experiment within GHC without worrying about breaking other existing Hoopl users. Only once we’re happy with the result, we should be considering separating it into a stand-alone package.

3) Difference between pretty/containers and Hoopl

I also think that the situation with pretty/containers is quite different than Hoopl. They are much more general-purpose libraries, *far* more widely used and with more contributors. Take containers - the package is still very actively developed and constantly improved. Whereas Hoopl hasn’t really seen much activity in the last 5 years. So the benefit-cost ratio is much better - yes there is some cost in having containers as a dependency, but the benefits from the regular stream of improvements easily outweigh it. I don’t think that’s the case for Hoopl.
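
To make the node-oriented versus block-at-a-time distinction from point 2 a bit more concrete, here is a minimal sketch. Everything in it is a hypothetical, simplified stand-in: `Node`, `Block`, `NodeTransfer`, `BlockTransfer`, `liftNode`, `recordSuccessors` and `countAssignsOutside` are not GHC’s or Hoopl’s actual types or functions, and the “facts” are toy values chosen only to show the shape of the two interfaces.

```haskell
import Data.List (foldl')

-- Hypothetical, simplified stand-ins; NOT the real GHC/Hoopl definitions.
type Label = Int

data Node
  = Assign String Int   -- a "middle" node
  | Jump Label          -- "last" nodes
  | Branch Label Label
  | Return
  deriving (Eq, Show)

data Block = Block
  { blockLabel :: Label
  , blockNodes :: [Node]
  } deriving (Eq, Show)

-- Node-oriented interface: the client sees one node at a time and
-- has no idea which block that node belongs to.
type NodeTransfer f = Node -> f -> f

-- Block-oriented interface: the client gets the whole block and
-- decides for itself how (or whether) to walk the nodes.
type BlockTransfer f = Block -> f -> f

-- Any node-oriented transfer can be recovered from the block-oriented
-- interface by folding forwards over the nodes of the block.
liftNode :: NodeTransfer f -> BlockTransfer f
liftNode t blk fact = foldl' (\f n -> t n f) fact (blockNodes blk)

-- A toy node-level transfer: count assignments.
countAssigns :: NodeTransfer Int
countAssigns (Assign _ _) n = n + 1
countAssigns _            n = n

-- A block-level transfer that skips the middle nodes entirely and only
-- inspects the last node, recording the edges under the block's label.
recordSuccessors :: BlockTransfer [(Label, [Label])]
recordSuccessors blk edges = (blockLabel blk, targets) : edges
  where
    targets = case reverse (blockNodes blk) of
      (Jump l : _)     -> [l]
      (Branch a b : _) -> [a, b]
      _                -> []

-- A block-level transfer whose behaviour depends on which block it is
-- in: blocks listed as proc-points are left alone.
countAssignsOutside :: [Label] -> BlockTransfer Int
countAssignsOutside procPoints blk n
  | blockLabel blk `elem` procPoints = n
  | otherwise                        = liftNode countAssigns blk n
```

The point of the sketch is only that `liftNode` goes one way: a block-level client can always fall back to per-node processing, but a per-node client cannot cheaply skip the middle nodes or ask which block it is in.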

Does this help understand my motivation? Let me know if anything is still unclear!

Thanks,
Michal