code generation and backends Re: Design discussion for atomic primops to land in 7.8

Austin also raises an important point:
It's high time we explore changes to the ABI/per architecture calling
convention and how they might impact performance.
I think this is something worth exploring after the 7.8 release
(but not in the next month of prerelease engineering, though I need to suss
out some of that a bit more with the parties who care)
I had started working on a backwards compatible ABI change for LLVM a few
months back that I could easily get LLVM folks to merge in, but I'd rather
only give the LLVM folks 1 set of ABI change patches within a 6-month
period, where we've had the time to experiment with breaking changes as an
option too, rather than giving them a partial patch now, and then a breaking
calling convention patch a few months from now.
(because if we don't bundle LLVM with GHC, we need to actually tie ABI
versions to LLVM major version releases, which isn't a problem, but just
requires thoughtful cooperation and such)
cheers
-Carter
On Tue, Aug 27, 2013 at 12:51 PM, Austin Seipp wrote:
To do this, IMO we'd also really have to start shipping our own copy of LLVM. The current situation (use what we have configured or in $PATH) won't really become feasible later on.
On platforms like ARM where there is no NCG, the mismatches can become super painful, and it makes depending on certain features of the IR or compiler toolchain (like an advanced, ISA-aware vectorizer in LLVM 3.3+) way more difficult, aside from being a management nightmare.
Fixing it does require taking a hit on things like build times, though. Or we could use binary releases, but we occasionally may want to tweak and/or fix things. If we ship our own LLVM for example, it's reasonable to assume sometime in the future we'll want to change the ABI during a release.
This does bring other benefits. Max Bolingbroke had an old alias analysis plugin for LLVM that made a noticeable improvement on certain kinds of programs, but shipping it against an arbitrary LLVM is infeasible. Stuff like this could now be possible too.
In a way, I think there's some merit to having a simple, integrated code generator that does the correct thing, with a high performance option as we have now. LLVM is a huge project, and there's definitely some part of me that thinks this may not lower our complexity budget as much as we think, only shift parts of it around ('second rate' platforms like PPC/ARM expose way more bugs in my experience, and tracking them across such a massive surface area can be quite difficult.) It's very stable and well tested, but an unequivocal dependency on hundreds of thousands of lines of deeply complex code is a big question no matter what.
But the current NCG isn't that 'simple correct thing' either. I think it's easily one of the least understood parts of the compiler, with a long history; it's rarely refactored or modified (very unlike other parts), and it's maintained only as necessary. That doesn't bode well for its future in any case.
On 26/08/13 08:17, Ben Lippmeier wrote:
> Well, what's the long term plan? Is the LLVM backend going to become the only backend at some point?
I wouldn't argue against ditching the NCG entirely. It's hard to justify fixing NCG performance problems when fixing them won't make the NCG faster than LLVM, and everyone uses LLVM anyway.
We're going to need more and more SIMD support when processors supporting the Larrabee New Instructions (LRBni) appear on people's desks. At that time there still won't be a good enough reason to implement those instructions in the NCG.
I hope to implement SIMD support for the native code gen soon. It's not a huge task and having feature parity between LLVM and NCG would be good.
Will you also update the SIMD support, register allocators, and calling conventions in 2015 when AVX-512 lands on the desktop? On all supported platforms? What about support for the x86 vcompress and vexpand instructions with mask registers? What about when someone finally asks for packed conversions between 16xWord8s and 16xFloat32s where you need to split the result into four separate registers? LLVM does that automatically.
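To make the packed-conversion case above concrete, here is a minimal sketch in plain C (not from the original message). It assumes 128-bit registers, and the names `vec16u8`, `vec4f` and `convert_u8x16_to_f32x16` are hypothetical, chosen only for illustration. Each 8-bit element widens to 32 bits, so sixteen inputs that fit in one register produce a result that needs four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit register holding sixteen 8-bit words. */
typedef struct { uint8_t lane[16]; } vec16u8;

/* Hypothetical model of one 128-bit register holding four 32-bit floats. */
typedef struct { float lane[4]; } vec4f;

/* Convert 16 x Word8 to 16 x Float32.
   The result is four times wider than the input, so it is split across
   four output registers: out[0] receives input lanes 0..3, out[1] lanes
   4..7, and so on. A backend has to allocate and track all four. */
void convert_u8x16_to_f32x16(const vec16u8 *in, vec4f out[4]) {
    assert(in != NULL && out != NULL);
    for (int reg = 0; reg < 4; reg++) {
        for (int i = 0; i < 4; i++) {
            out[reg].lane[i] = (float)in->lane[reg * 4 + i];
        }
    }
}
```

A real backend would emit vector unpack and convert instructions rather than a scalar loop; the sketch only shows the register-splitting bookkeeping that the code generator has to handle.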
I've been down this path before. In 2007 I implemented a separate graph colouring register allocator in the NCG to supposedly improve GHC's numeric performance, but the LLVM backend subsumed that work and now having two separate register allocators is more of a maintenance burden than a help to anyone. At the time, LLVM was just becoming well known, so it wasn't obvious that implementing a new register allocator was largely a redundant piece of work -- but I think it's clear now. I was happy to work on the project at the time, and I learned a lot from it, but when starting new projects now I also try to imagine the system that will replace the one I'm dreaming of.
Of course, you should do what interests you -- I'm just pointing out a strategic consideration.
On Mon, Aug 26, 2013 at 3:19 PM, Simon Marlow wrote:
The existence of LLVM is definitely an argument not to put any more effort into backend optimisation in GHC, at least for those optimisations that LLVM can already do.
But as for whether the NCG is needed at all - there are a few ways that the LLVM backend needs to be improved before it can be considered to be a complete replacement for the NCG:
1. Compilation speed. LLVM approximately doubles compilation time. Avoiding going via the textual intermediate syntax would probably help here.
2. Shared library support (#4210, #5786). It works (or worked?) on a couple of platforms. But even on those platforms it generated worse code than the NCG due to using dynamic references for *all* symbols, whereas the NCG knows which symbols live in a separate package and need to use dynamic references.
3. Some low-level optimisation problems (#4308, #5567). The LLVM backend generates bad code for certain critical bits of the runtime, perhaps due to lack of good aliasing information. This hasn't been revisited in the light of the new codegen, so perhaps it's better now.
Someone should benchmark the LLVM backend against the NCG with new codegen in GHC 7.8. It's possible that the new codegen is getting a slight boost because it doesn't have to split up proc points, so it can do better code generation for let-no-escapes. (It's also possible that LLVM is being penalised a bit for the same reason - I spent more time peering at NCG-generated code than LLVM-generated code).
These are some good places to start if you want to see GHC drop the NCG.
Cheers, Simon
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Regards, Austin - PGP: 4096R/0x91384671