Michal

Sorry to be slow.

Note that what I’m actually advocating is to *finish* forking Hoopl. The fork really started in ~2012 when the “new Cmm backend” was being finished.

Yes, I know. But what I’m suggesting is to revisit the reasons for that fork, and re-join if possible. E.g. if Hoopl is too slow, can’t we make it faster? Why is GHC’s version faster?

apart from the performance (as noted above), there’s the issue of Hoopl’s interface. IMHO the node-oriented approach taken by Hoopl is both not flexible enough and it makes it harder to optimize it. That’s why I’ve already changed GHC’s `Hoopl.Dataflow` module to operate “block-at-a-time”

Well, that sounds like an argument to re-engineer Hoopl’s API, rather than an argument to fork it. If it’s a better API, can’t we make it better for everyone? I don’t yet understand what the “block-oriented” API is, or how it differs, but let’s have the conversation.

When you say that we should “just fix Hoopl”, it sounds to me that we’d really need to rewrite it from scratch. And it’s much easier to do that if we can just experiment within GHC without worrying about breaking other existing Hoopl users

Fine. But then let’s call it hoopl2, make it a separate package (perhaps with GHC as its only client for now), and declare that it’s intended to supersede hoopl.

But do we even need to do that much? After all, a major version bump on a package is allowed to introduce breaking changes to the API. Anyone who wants the old API can use the old package.

I wonder if you could start a wiki page somewhere (e.g. on the GHC wiki) listing all the changes you’d like to make in a “rewrite from scratch” story? That would help to “ground” the conversation.

Thanks

Simon

From: Michal Terepeta [mailto:michal.terepeta@gmail.com]
Sent: 29 May 2017 12:53
To: Simon Peyton Jones <simonpj@microsoft.com>; ghc-devs <ghc-devs@haskell.org>
Subject: Re: Removing Hoopl dependency?

On Sun, May 28, 2017 at 11:30 PM Simon Peyton Jones <simonpj@microsoft.com> wrote:

Is there really a compelling case for forking Hoopl? I was talking to Kavon last week about doing exactly the opposite: using Hoopl more wholeheartedly!

Before going ahead with this, let’s remember the downsides:

- If we fork Hoopl, improvements in one place will not be seen in the other. GHC originally used its own containers library but now uses ‘containers’, most of which is irrelevant to GHC, just to pick up the work that has been done to make ‘containers’ fast. Similarly, GHC has a clone of ‘pretty’, but someone is working (I think) to make GHC use ‘pretty’.

- It’s not clear to me why GHC has a clone of parts of Hoopl. Would it not be better just to make Hoopl faster?

If anything, I’d like to use Hoopl more in Cmm optimisation passes in GHC, so we may want to use more of Hoopl’s facilities.

The main reason you suggest for forking is that there are some awkward name clashes. Surely we could resolve these? E.g. we could change CLabel in GHC, or agree with Hoopl maintainers that BlockId would be more helpful than Label.

You mention that Hoopl uses Unique set/map. Why not use ‘containers’ for that? (Like GHC!)

Let’s discuss this a bit more before executing.

I’m also interested to know:

- who is actively working on Hoopl (Michal, Sophie, …)?

- how are you using it (within GHC, or somewhere else)?

It’d be good to review and update https://ghc.haskell.org/trac/ghc/wiki/Hoopl/Cleanup. Are there any other improvements planned?

Simon

Hi Simon,

Thanks for chiming in! Let me try to clarify the current situation and the motivation for my changes.

1) Initial fork of Hoopl

Note that what I’m actually advocating is to *finish* forking Hoopl. The fork really started in ~2012 when the “new Cmm backend” was being finished.

IIRC the main reason was the unacceptable performance and it seems that even Simon Marlow had trouble making it run fast enough:

https://plus.google.com/107890464054636586545/posts/dBbewpRfw6R

https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HooplPerformance

The end result is pretty sad: GHC has its own forked/specialized `Hoopl.Dataflow` module and is using Hoopl only for definitions of `Block`/`Graph` and maps/sets (if you look at my commit, it’s pretty clear what I’m copying). In particular it’s not using *any* of dataflow analysis or rewriting capabilities of the Hoopl package.

2) Reasons to finish forking

The reasons I listed in my previous email already assumed that we have the forked `Hoopl.Dataflow` module in GHC. But if we want to discuss what the reasons for forking are in general, then apart from the performance (as noted above), there’s the issue of Hoopl’s interface. IMHO the node-oriented approach taken by Hoopl is both not flexible enough and it makes it harder to optimize it. That’s why I’ve already changed GHC’s `Hoopl.Dataflow` module to operate “block-at-a-time” (https://github.com/ghc/ghc/commit/679ccd1c8860f1ef4b589c9593b74d04c97ae836).

Some concrete examples:

- For proc-point analysis it was necessary to introduce a hack to GHC’s `Dataflow` module to expose a separate analysis function that *ignores* the middle nodes (since for proc-points they’re irrelevant). My change to go “block-at-a-time” allowed us to remove that hack.

- I’m trying to fix non-linearity of `CmmLayoutStack` in (https://phabricator.haskell.org/D3586) and again the block-oriented interface is useful - I want to do different rewrites based on which block is being considered (whether it’s a proc-point or not). This is not easily possible if I don’t know which block I’m in (which is the case for the node-oriented interface).
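
To make the distinction concrete, here is a minimal sketch of the two interface styles. All types and names below (`Node`, `Block`, `NodeTransfer`, `BlockTransfer`, `procPointAware`, and so on) are hypothetical simplifications invented for illustration; they are not the actual definitions in Hoopl or in GHC’s `Hoopl.Dataflow`. The sketch only shows why a per-node function cannot know which block it is in, whereas a per-block function can, and why any node-oriented transfer can still be lifted into the block-oriented style.

```haskell
-- Hypothetical, simplified types; NOT Hoopl's or GHC's real definitions.
type Label = Int

data Node
  = Assign String Int   -- a "middle" node: x := n
  | Jump Label          -- a "last" node: unconditional jump
  deriving (Eq, Show)

data Block = Block
  { blockLabel :: Label
  , blockNodes :: [Node]
  } deriving (Eq, Show)

-- Node-oriented style: the client supplies one function per node and the
-- framework folds it over each block. The function never sees the
-- enclosing block or its label.
type NodeTransfer f = Node -> f -> f

-- Block-oriented style: the client receives the whole block, so it can
-- inspect the label, skip the middle nodes, and so on.
type BlockTransfer f = Block -> f -> f

-- Any node-oriented transfer can be lifted to a block-oriented one
-- (forward direction: nodes are processed first to last).
liftNodeTransfer :: NodeTransfer f -> BlockTransfer f
liftNodeTransfer t block fact = foldl (flip t) fact (blockNodes block)

-- Example fact: the variables assigned so far, most recent first.
assignedVars :: NodeTransfer [String]
assignedVars (Assign x _) vars
  | x `elem` vars = vars
  | otherwise     = x : vars
assignedVars (Jump _) vars = vars

-- A transfer that depends on WHICH block is being processed: at the given
-- (assumed) proc-points the incoming fact is discarded. This cannot be
-- written as a NodeTransfer, because a node does not know its block.
procPointAware :: [Label] -> BlockTransfer [String]
procPointAware procPoints block vars
  | blockLabel block `elem` procPoints = liftNodeTransfer assignedVars block []
  | otherwise                          = liftNodeTransfer assignedVars block vars

-- An analysis that ignores the middle nodes entirely and looks only at
-- the last node of the block.
successors :: Block -> [Label]
successors block = [l | Jump l <- take 1 (reverse (blockNodes block))]

exampleBlock :: Block
exampleBlock = Block 1 [Assign "x" 1, Assign "y" 2, Jump 2]
```

In this sketch, `procPointAware [1] exampleBlock ["z"]` drops the incoming `"z"` because block 1 is treated as a proc-point, while `procPointAware [3] exampleBlock ["z"]` keeps it. `successors` never touches the `Assign` nodes at all, which is the kind of shortcut the node-oriented style needed a special-case hack for.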

I also don’t think that name clashes and the tension between Hoopl’s interface and GHC are easy to solve. Hoopl is a public, stand-alone package, so we can’t just change things without considering compatibility. For instance, we can’t use GHC’s `Unique` in Hoopl. But should we switch all of GHC to use Hoopl’s? Also having closely related concepts spread around GHC and Hoopl is not helping when trying to understand what’s happening. Finally, any changes to both GHC & Hoopl have much higher overhead than just changing GHC.

In general, it really seems to me that Hoopl was simply released too early, with not enough real-world usage and testing. When you say that we should “just fix Hoopl”, it sounds to me that we’d really need to rewrite it from scratch. And it’s much easier to do that if we can just experiment within GHC without worrying about breaking other existing Hoopl users. Only once we’re happy with the result should we consider separating it into a stand-alone package.

3) Difference between pretty/containers and Hoopl

I also think that the situation with pretty/containers is quite different from Hoopl’s. They are much more general-purpose libraries, *far* more widely used and with more contributors. Take containers: the package is still very actively developed and constantly improved, whereas Hoopl hasn’t really seen much activity in the last 5 years. So for containers the benefit-cost ratio is much better: yes, there is some cost in having containers as a dependency, but the benefits from the regular stream of improvements easily outweigh it. I don’t think that’s the case for Hoopl.

Does this help understand my motivation? Let me know if anything is still unclear!

Thanks,

Michal