Re: Removing Hoopl dependency?

29 May 2017

On Sun, May 28, 2017 at 11:30 PM Simon Peyton Jones wrote:
...
Is there really a compelling case for forking Hoopl?  I was talking to
Kavon last week about doing exactly the opposite: using Hoopl more
wholeheartedly!
Before going ahead with this, let’s remember the downsides:
- If we fork Hoopl, improvements in one place will not be seen in the
  other.  GHC originally used its own containers library but now uses
  ‘containers’, most of which is irrelevant to GHC, just to pick up the work
  that has been done to make ‘containers’ fast.  Similarly, GHC has a clone
  of ‘pretty’, but someone is working (I think) to make GHC use ‘pretty’.
- It’s not clear to me why GHC has a clone of parts of Hoopl.
  Would it not be better just to make Hoopl faster?
If anything I’d like to use Hoopl more in Cmm optimisation passes in GHC,
so we may want to use more of Hoopl’s facilities.
The main reason you suggest for forking is that there are some awkward
name clashes.  Surely we could resolve these? e.g. we could change CLabel in
GHC; or agree with Hoopl maintainers that BlockId would be more helpful
than Label.
You mention that Hoopl uses Unique set/map.  Why not use ‘containers’ for
that?  (Like GHC!)
Let’s discuss this a bit more before executing.
I’m also interested to know:
- who is actively working on Hoopl (Michael, Sophie, …)?
- how are you using it (within GHC, or somewhere else)?
It’d be good to review and update
https://ghc.haskell.org/trac/ghc/wiki/Hoopl/Cleanup.  Are there any other
improvements planned?
Simon

Hi Simon,

Thanks for chiming in! Let me try to clarify the current situation and
the motivation for my changes.

1) Initial fork of Hoopl

Note that what I’m actually advocating is to *finish* forking Hoopl. The
fork really started in ~2012 when the “new Cmm backend” was being
finished.
IIRC the main reason was Hoopl’s unacceptable performance, and it seems
that even Simon Marlow had trouble making it run fast enough:
https://plus.google.com/107890464054636586545/posts/dBbewpRfw6R
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HooplPerformance
The end result is pretty sad: GHC has its own forked/specialized
`Hoopl.Dataflow` module and is using Hoopl only for definitions of
`Block`/`Graph` and maps/sets (if you look at my commit, it’s pretty
clear what I’m copying). In particular it’s not using *any* of the dataflow
analysis or rewriting capabilities of the Hoopl package.

2) Reasons to finish forking

The reasons I listed in my previous email already assumed that we have
the forked `Hoopl.Dataflow` module in GHC. But if we want to discuss
the reasons for forking in general, then apart from the performance
(as noted above), there’s the issue of Hoopl’s interface. IMHO the
node-oriented approach taken by Hoopl is not flexible enough and also
makes it harder to optimize. That’s why I’ve already changed GHC’s
`Hoopl.Dataflow` module to operate “block-at-a-time”
(https://github.com/ghc/ghc/commit/679ccd1c8860f1ef4b589c9593b74d04c97ae836).
Some concrete examples:
- For proc-point analysis it was necessary to introduce a hack to GHC’s
  `Dataflow` module to expose a separate analysis function that
  *ignores* the middle nodes (since for proc-points they’re irrelevant).
  My change to go “block-at-a-time” allowed us to remove that hack.
- I’m trying to fix the non-linearity of `CmmLayoutStack`
  (https://phabricator.haskell.org/D3586) and again the block-oriented
  interface is useful - I want to do different rewrites based on
  which block is being considered (whether it’s a proc-point or not).
  This is not easily possible if I don’t know which block I’m in (which
  is the case for the node-oriented interface).
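
To make the node-oriented vs. block-oriented distinction above concrete,
here is a minimal Haskell sketch. Everything in it (`Node`, `Block`,
`NodeTransfer`, `BlockTransfer`, `successorsOnly`, `callsOutside`) is a
hypothetical toy stand-in chosen for illustration; it is not GHC’s or
Hoopl’s actual API, and it models only a forward transfer function, not
a full fixpoint computation.

```haskell
-- Toy stand-ins for Cmm; none of these are GHC's or Hoopl's real types.
type Label = Int

data Node = Assign String | Call String
  deriving (Eq, Show)

data Block = Block
  { blockLabel :: Label    -- which block this is
  , blockBody  :: [Node]   -- the "middle" nodes
  , blockSuccs :: [Label]  -- successor blocks
  } deriving (Eq, Show)

-- Node-oriented style: the client supplies a per-node transfer function
-- and the framework folds it over the block. The client never sees the
-- enclosing block, so it cannot ask which block a node belongs to.
type NodeTransfer f = Node -> f -> f

runNodeOriented :: NodeTransfer f -> Block -> f -> f
runNodeOriented t b fact = foldl (flip t) fact (blockBody b)

-- Block-oriented style: the client receives the whole block at once.
type BlockTransfer f = Block -> f -> f

-- Any node-oriented transfer can be lifted to a block-oriented one,
-- so nothing is lost by exposing the block-level interface.
liftNode :: NodeTransfer f -> BlockTransfer f
liftNode = runNodeOriented

-- A per-node analysis: count the calls.
countCalls :: NodeTransfer Int
countCalls (Call _)   n = n + 1
countCalls (Assign _) n = n

-- An analysis that ignores the middle nodes entirely; with a block-level
-- transfer it simply never walks the body.
successorsOnly :: BlockTransfer [Label]
successorsOnly b seen = blockSuccs b ++ seen

-- An analysis whose behaviour depends on which block it is in: blocks
-- listed as proc-points (an assumed input) are skipped.
callsOutside :: [Label] -> BlockTransfer Int
callsOutside procPoints b n
  | blockLabel b `elem` procPoints = n
  | otherwise                      = liftNode countCalls b n
```

With a purely node-oriented interface, `successorsOnly` would still pay
for a traversal of every middle node, and `callsOutside` would need some
extra mechanism to learn the label of the current block.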

I also don’t think that name clashes and the tension between Hoopl’s
interface and GHC are easy to solve. Hoopl is a public, stand-alone
package, so we can’t just change things without considering
compatibility. For instance, we can’t use GHC’s `Unique` in Hoopl. But
should we switch all of GHC to use Hoopl’s? Also, having closely related
concepts spread across GHC and Hoopl does not help when trying to
understand what’s happening. Finally, any change that touches both GHC
and Hoopl has much higher overhead than one that changes only GHC.

In general, it really seems to me that Hoopl was simply released
too early, with not enough real-world usage and testing. When you say
that we should “just fix Hoopl”, it sounds to me that we’d really need
to rewrite it from scratch. And it’s much easier to do that if we can
just experiment within GHC without worrying about breaking other
existing Hoopl users. Only once we’re happy with the result should we
consider separating it into a stand-alone package.

3) Difference between pretty/containers and Hoopl

I also think that the situation with pretty/containers is quite
different from that with Hoopl. They are much more general-purpose
libraries, *far* more widely used and with more contributors. Take
containers - the package is still very actively developed and constantly
improved, whereas Hoopl hasn’t really seen much activity in the last 5
years. So the benefit-cost ratio for containers is much better - yes
there is some cost in having containers as a dependency, but the
benefits from the regular stream of improvements easily outweigh it. I
don’t think that’s the case for Hoopl.

Does this help explain my motivation? Let me know if anything is
still unclear!

Thanks,
Michal

Michal Terepeta