
Thanks so much for the pointers, Ben.
I opened a ticket here: https://ghc.haskell.org/trac/ghc/ticket/15449
On Fri, Jul 27, 2018 at 6:51 AM, Ben Gamari wrote:
Travis Whitaker writes:
Hello GHC Devs,
It seems to me that GHC is rather broken on aarch64, at least since 8.2.1 (and at least on the machines I have access to). I first noticed this issue with Nixpkgs (https://github.com/NixOS/nixpkgs/issues/40301), so to check that this isn't some Nixpkgs idiosyncrasy I went ahead and built my own GHC 8.4.3 for aarch64 (there's no binary release at https://www.haskell.org/ghc/download_ghc_8_4_3.html to try, but perhaps I've missed something).
It seems the only Nix idiosyncrasy was passing "--ghc-option=-j${cores}" to "./Setup.hs configure". The issue is triggered by using '-jn' for any n greater than one when building any non-trivial package, but I've found hscolour 1.24.4 reproduces it very reliably (perhaps because there are opportunities for parallelism early in its module dependency graph?). GHC very often (although not always) will fail with one of:
- Segmentation fault
- Bus fault
- <no location info>: error: ghc: panic! (the 'impossible' happened) (GHC version 8.4.3 for aarch64-unknown-linux): Binary.UserData: no put_binding_name
- ghc: internal error: MUT_VAR_CLEAN object entered! (GHC version 8.4.3 for aarch64_unknown_linux) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug Aborted (core dumped)
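Since the failure is intermittent and shows up in several forms, it may help to count how often each form occurs over repeated builds. What follows is only a rough sketch, not something taken from the report: the labels, the function names `classify` and `run_trials`, and the idea of a rebuild command are all assumptions. It matches the messages quoted in the list above and also checks for death by signal, which Python's subprocess module reports as a negative return code.

```python
import signal
import subprocess
from collections import Counter


def classify(returncode, output):
    """Map one build attempt to a coarse label (labels are made up here)."""
    # Message fragments below are the ones quoted in the report.
    if "Binary.UserData: no put_binding_name" in output:
        return "panic"
    if "MUT_VAR_CLEAN object entered" in output:
        return "rts-internal-error"
    # subprocess reports "killed by signal N" as returncode == -N.
    if returncode == -signal.SIGSEGV or "Segmentation fault" in output:
        return "segfault"
    if returncode == -signal.SIGBUS or "Bus fault" in output or "Bus error" in output:
        return "bus-error"
    return "ok" if returncode == 0 else "other-failure"


def run_trials(cmd, trials):
    """Run cmd `trials` times and tally the outcome of each run.

    cmd is assumed to perform a clean rebuild each time; otherwise the
    second and later runs may have nothing left to compile.
    """
    counts = Counter()
    for _ in range(trials):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            errors="replace",
        )
        counts[classify(proc.returncode, proc.stdout)] += 1
    return counts
```

For instance, `run_trials(["./rebuild.sh"], 20)` would tally twenty runs, where `rebuild.sh` is a hypothetical script that cleans and rebuilds the package with '-jn' for some n greater than one.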
Ugh, that is awful.
The fix, excruciating as it may be on already slow arm machines, is to use '-j1'. This issue seems present on each GHC release since 8.2.1 (although I haven't tried HEAD yet). I haven't noticed any issues with any other concurrent Haskell programs on aarch64.
There are some umbrella bugs for aarch64 in Trac, so I wanted to ask here before filing a ticket. Has anyone else noticed this behavior on aarch64? What's more, are there any tips for using GDB to hunt down synchronization issues in GHC?
Definitely open a new ticket.
The methodology for tracking down issues like this is quite case-specific but I do have some general recommendations: On x86-64 I use rr [1], which is an invaluable tool. Sadly this isn't an option on AArch64 AFAIK. I also have some gdb extensions to take much of the monotony away from inspecting GHC's heap and internal data structures [2]. I've not used them on AArch64 so there may be a few compatibility issues but I suspect they wouldn't be hard to fix.
I know it may be hard in this case but I would at least try to reduce the size of the failing program to something that fits in less than a few hundred lines. Low-level debugging is hard enough when you can keep the program in your head; debugging all of GHC this way is possible but much harder. Given that this appears to be threading-specific, I would also pay particular attention to GHC's and base's use of barriers and atomics. It's possible that we are just missing a barrier somewhere.
Finally, you might quickly try building 8.0 to see whether bisection is a possibility. It would be a slow process, given the speed of the hardware involved, but ultimately it can be much more time-efficient once you have it set up since you can replace human debugging time (a very finite commodity) with computation.
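If bisection does turn out to be possible, one way to automate it is `git bisect run`, which reads the exit code of a script: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad. A crash seen through a shell usually exits with 128 or more, which aborts the bisection, so some wrapper is needed. The sketch below is only an illustration under assumptions: `build_cmd` and `test_cmd` stand for hypothetical commands that build the compiler at the current commit and run a clean '-jn' build of the reproducer with it, and the function names are made up.

```python
import subprocess

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_verdict(build_ok, trial_returncodes):
    """Turn a build result and a list of trial return codes into an exit code."""
    if not build_ok:
        return SKIP  # the compiler did not build here, so the commit is untestable
    if any(rc != 0 for rc in trial_returncodes):
        return BAD   # any crash or error counts as a reproduction
    return GOOD


def judge(build_cmd, test_cmd, trials):
    """Build once, then repeat the reproducer because the bug is intermittent."""
    if subprocess.run(build_cmd).returncode != 0:
        return bisect_verdict(False, [])
    codes = []
    for _ in range(trials):
        codes.append(subprocess.run(test_cmd).returncode)
        if codes[-1] != 0:
            break  # one failure is enough to call the commit bad
    return bisect_verdict(True, codes)
```

A "good" verdict here is only probabilistic, since the failure does not happen on every run, so `trials` should be large enough that a clean sweep is convincing. The reproducer failing for unrelated reasons would also be counted as bad.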
Good luck and let us know if you get stuck,
- Ben
[1] http://rr-project.org/
[2] https://github.com/bgamari/ghc-utils/tree/master/gdb