
Hi,

as presented during the HIW at ICFP 2016 and ICFP 2017, I've been spending some time on an alternative LLVM backend.

GHC uses LLVM as an external tool and communicates with it via LLVM's Intermediate Representation (IR). For this, the LLVM backend in GHC (enabled via `-fllvm`) writes out text files containing the textual LLVM IR and feeds those into LLVM's `opt` and `llc` tools to produce the final machine code.

The textual IR has changed quite a bit with each release in the past, but seems to have been quite stable from LLVM 3.9 through LLVM 4 and now LLVM 5. However, LLVM also has a binary IR called LLVM Bitcode, which is a very stable format and the one they advise to work against. Sadly, the documentation on it is mostly contained in the BitcodeReader and BitcodeWriter C++ files from the LLVM project.

As Cmm hands out only labels for functions, without function signatures, and LLVM expects functions to be properly typed, the LLVM backend (wanting to be a good consumer) assumes functions to always be pointers to int8 (i8*). Of course, when defining functions we do have the full function signature, and we also need it to pass arguments. Yet to make the i8* assumption work, we create an alias of type i8* for each function we define.

On macOS (and iOS), where the Mach-O format is used, the linker uses a feature called `subsections_via_symbols` to strip dead code. This works by assuming that all code between two symbols belongs to the first symbol. GHC's use of Tables Next To Code (TNTC), and the `prefixdata` feature that was added to LLVM just for this purpose, however, put data in front of symbols, and the linker will strip that data if it determines that the previous symbol is not used. Thus we cannot dead-strip any code produced by GHC on macOS or iOS. The NCG inserts `$dsp` suffix symbols in front of the TNTC data and marks those as used to work around the dead-strip issue.
The current LLVM backend uses the LLVM Mangler to strip out the `.subsections_via_symbols` directive from the generated assembly prior to handing it off to the assembler.

I therefore set out to see if I could fix some of the issues we ran into. This has led to building three libraries:

- data-bitcode (github.com/angerman/data-bitcode)
  A bitcode reader/writer. In itself, Bitcode is a rather simple encoding format. It is based on sequences of bits as opposed to bytes. It encodes so-called blocks (with IDs), which give meaning to the records (think: structs) that are in those blocks. It also comes with a compression mechanism, where one can define so-called abbreviated records (think: packed structs).

- data-bitcode-llvm (github.com/angerman/data-bitcode-llvm)
  A package to model LLVM modules and lower them into the bitcode AST.

- data-bitcode-edsl (github.com/angerman/data-bitcode-edsl)
  A Haskell EDSL that allows one to construct LLVM modules. E.g.

    testModule :: Module
    testModule = mod "undef"
      [ def "main" ([i32, ptr =<< i8ptr] --> i32) $ \[ argc, argv ] -> mdo
          block "entry" $ do
            mem <- undef =<< (arr 10 =<< i8)
            memG <- global (mutable . private) "mem" mem
            ptr <- gep memG =<< sequence [int32 0, int32 0]
            memset <- fun "llvm.memset.p0i8.i32" =<< [i8ptr, i8, i32, i32, i1] --> void
            ccall memset =<< (ptr:) <$> sequence [ int8 0, int32 10, int32 4, int 1 0 ]
            ret =<< int32 0
          pure ()
      ]

  More examples can be found in github.com/angerman/data-bitcode-edsl/blob/master/test/EDSLSpec.hs

With this in hand, I went ahead and ported the existing LLVM code generator to use the EDSL instead of concatenating strings.

After this introduction, I'm now pleased to inform you that the `llvmng` backend passes fast and slow validation, with the exception of the peculiar case of T6084 (ghc.haskell.org/t/6084), where the callee and caller signatures do not match up, which causes the `llvmng` backend to topple over.
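
As an aside on the "sequences of bits as opposed to bytes" remark about Bitcode above: one of the primitives in the format is the variable bit rate (VBR) integer, where a value is cut into chunks of a fixed width and the top bit of each chunk says whether another chunk follows. The following is only a minimal sketch of that idea and is not code taken from data-bitcode; the names `vbrChunks` and `vbrDecode` are made up for illustration, and the sketch returns the chunk values as a list instead of packing them into an actual bit stream.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word64)

-- | Split a value into VBR chunks of the given width (hypothetical helper).
-- Each chunk carries (width - 1) payload bits; the highest bit of a chunk
-- is set when another chunk follows. Chunks are least significant first.
vbrChunks :: Int -> Word64 -> [Word64]
vbrChunks width value
  | width < 2 = error "vbrChunks: width must be at least 2"
  | rest == 0 = [payload]
  | otherwise = (payload .|. contBit) : vbrChunks width rest
  where
    payloadBits = width - 1
    contBit     = 1 `shiftL` payloadBits          -- continuation marker
    payload     = value .&. (contBit - 1)         -- low (width - 1) bits
    rest        = value `shiftR` payloadBits      -- bits still to emit

-- | Reassemble a value from its VBR chunks (hypothetical helper).
-- Stops at the first chunk whose continuation bit is clear.
vbrDecode :: Int -> [Word64] -> Word64
vbrDecode width = go 0
  where
    payloadBits = width - 1
    contBit     = 1 `shiftL` payloadBits
    go _ [] = 0
    go shift (c:cs) =
      let here = (c .&. (contBit - 1)) `shiftL` shift
          more = if testBit c payloadBits
                   then go (shift + payloadBits) cs
                   else 0
      in here .|. more
```

For example, `vbrChunks 4 27` yields `[11, 3]`, i.e. the 4-bit chunks 1011 and 0011, and `vbrDecode 4 [11, 3]` gives back 27.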

Another note is in order: the llvmng backend is currently quite a bit more memory hungry and time consuming than the current LLVM backend for non-trivial modules.

We do get dead-strippable binaries though (for `main = putStrLn "hello world"`):

    1.2M Main8.2-llvm
    972k MainHEAD-llvmng

The relevant code can be found in github.com/zw3rk/ghc/tree/llvm-ng

Cheers,
 Moritz

Validation Results Below:

fast validation:

Unexpected results from:
TEST="MultiLayerModules T12707 T13379 T13701 T13719 T6084 T9630"

SUMMARY for test run started at Tue Sep 19 12:53:34 2017 +08
 0:08:06 spent to go through
    6077 total tests, which gave rise to
   23781 test cases, of which
   16423 were skipped

      17 had missing libraries
    2392 expected passes
      32 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       1 unexpected failures
       6 unexpected stat failures

Unexpected failures:
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)

Unexpected stat failures:
   perf/compiler/T13379.run  T13379 [stat not good enough] (llvmng)
   perf/compiler/T9630.run  T9630 [stat not good enough] (llvmng)
   perf/compiler/T12707.run  T12707 [stat not good enough] (llvmng)
   perf/compiler/MultiLayerModules.run  MultiLayerModules [stat not good enough] (llvmng)
   perf/compiler/T13719.run  T13719 [stat not good enough] (llvmng)
   perf/compiler/T13701.run  T13701 [stat not good enough] (llvmng)

slow validation:

Unexpected results from:
TEST="MultiLayerModules T12707 T13379 T13701 T13719 T6084 T9630"

SUMMARY for test run started at Tue Sep 19 13:14:18 2017 +08
 0:28:57 spent to go through
    6077 total tests, which gave rise to
   23908 test cases, of which
   16302 were skipped

      69 had missing libraries
    6223 expected passes
      72 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       3 unexpected failures
      12 unexpected stat failures

Unexpected failures:
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)

Unexpected stat failures:
   perf/compiler/T13379.run  T13379 [stat not good enough] (llvmng)
   perf/compiler/T13379.run  T13379 [stat not good enough] (llvmng)
   perf/compiler/T9630.run  T9630 [stat not good enough] (llvmng)
   perf/compiler/T12707.run  T12707 [stat not good enough] (llvmng)
   perf/compiler/T9630.run  T9630 [stat not good enough] (llvmng)
   perf/compiler/T12707.run  T12707 [stat not good enough] (llvmng)
   perf/compiler/T13719.run  T13719 [stat not good enough] (llvmng)
   perf/compiler/MultiLayerModules.run  MultiLayerModules [stat not good enough] (llvmng)
   perf/compiler/T13701.run  T13701 [stat not good enough] (llvmng)
   perf/compiler/T13719.run  T13719 [stat not good enough] (llvmng)
   perf/compiler/MultiLayerModules.run  MultiLayerModules [stat not good enough] (llvmng)
   perf/compiler/T13701.run  T13701 [stat not good enough] (llvmng)