
On Nov 27, 2016, at 10:17 PM, Michal Terepeta wrote:

Hi,
I’m trying to implement a bitcode-producing LLVM backend [1], which would potentially allow using a range of LLVM versions with GHC. However, this is only tangentially relevant to the improved LLVM backend: as Austin correctly pointed out [2], there are other complications besides the fragility of the textual representation.
So this is mostly relevant only to the improved IR you mentioned. The bitcode codegen plugin currently follows the textual IR generation closely, but tries to avoid the ubiquitous symbol-to-i8* casting. The LLVM codegen turns Cmm into IR; at that point, however, the word size has already been embedded, which means that both the current textual LLVM codegen and the bitcode LLVM codegen have to figure out whether a relative access is a multiple of the word size, so that LLVM's getElementPtr can be used.
That sounds interesting. Do you know where I could find out more about this? (Both for the current LLVM codegen and for yours.)
In GHC's LLVM codegen it's usually the functions with the `_fast` suffix. See [1] and `genStore_fast` about 30 lines further down. My bitcode LLVM codegen follows that file [1] almost identically, as can be seen in [2]; however, the `_fast` path is currently disabled. Examples of the IR generated by the current LLVM backend and by the bitcode backend (textual IR, via llvm-dis) can be found in [3] and [4] respectively.
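To illustrate the idea behind the `_fast` path described above, here is a minimal, hypothetical sketch (not the actual GHC code; `classifyOffset` and `Access` are made-up names): the fast path only applies when the byte offset from a base symbol is an exact multiple of the word size, so it can be expressed as a getelementptr element index; otherwise the codegen has to fall back to casting to i8*, adding the raw byte offset, and casting back.

```haskell
-- Hypothetical sketch of the "_fast" decision, assuming a 64-bit
-- target (8-byte words). Not the actual GHC implementation.

-- | How to compile a memory access relative to a base symbol.
data Access
  = GepIndex Int   -- ^ word-aligned: use getelementptr with this index
  | SlowPath       -- ^ not word-aligned: cast to i8*, add bytes, cast back
  deriving (Eq, Show)

wordSizeBytes :: Int
wordSizeBytes = 8  -- assumption: 64-bit target

-- | Decide whether a byte offset can use the GEP-based fast path.
classifyOffset :: Int -> Access
classifyOffset byteOff
  | r == 0    = GepIndex q
  | otherwise = SlowPath
  where
    (q, r) = byteOff `quotRem` wordSizeBytes

main :: IO ()
main = do
  print (classifyOffset 16)  -- two whole words from the base
  print (classifyOffset 12)  -- not a multiple of the word size
```

Running this prints `GepIndex 2` and then `SlowPath`, mirroring the two code paths: a typed getelementptr when the offset divides evenly, and the pointer-arithmetic fallback otherwise.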
I don’t know whether generating LLVM IR from STG instead of Cmm would be a better approach; that is what GHCJS and Eta do, as far as I know.
Wouldn't a step from STG to LLVM be much harder (LLVM IR is a pretty low-level representation compared to STG)? There are also a few passes on the Cmm level that seem necessary, e.g., `cmmLayoutStack`.
There is certainly a tradeoff between retaining more high-level information and having to lower it oneself. If I remember Luite correctly, he said he had an intermediate format similar to Cmm, just not Cmm itself but something richer, which allows targeting JavaScript better. The question basically boils down to whether Cmm is already too low-level for LLVM; the embedding of word sizes is an example where I think Cmm might be too low-level for LLVM.

[1]: https://github.com/ghc/ghc/blob/master/compiler/llvmGen/LlvmCodeGen/CodeGen....
[2]: https://github.com/angerman/data-bitcode-plugin/blob/master/src/Data/BitCode...
[3]: https://gist.github.com/angerman/32ce9395e73cfea3348fcc7da108cd0a
[4]: https://gist.github.com/angerman/d87db1657aac4e06a0886801aaf44329