[GHC] #16058: GHC built on macOS Mojave nondeterministically segfaults

#16058: GHC built on macOS Mojave nondeterministically segfaults
-------------------------------------+-------------------------------------
           Reporter:  bgamari        |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  highest        |         Milestone:  8.8.1
          Component:  Compiler       |           Version:  8.6.3
           Keywords:                 |  Operating System:  MacOS X
       Architecture:  x86_64 (amd64) |   Type of failure:  Building GHC failed
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 I am seeing perhaps a quarter of GHC builds on the GitLab Darwin builder
 (running Mojave) fail with a segfault in `ghc-stage1`. Unfortunately, I've
 been entirely unable to reproduce this in a debugger (which isn't
 surprising given the low probability of the crash manifesting).

 Given that we haven't seen this in any Darwin builds until now, I'm
 suspicious of either the Xcode toolchain or Mojave itself.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16058
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Comment (by bgamari):

 One such crash: https://gitlab.staging.haskell.org/ghc/ghc/-/jobs/1857.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16058#comment:1

Comment (by bgamari):

 For the record I have seen this while using both `gcc` and Apple's
 `clang`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16058#comment:2

Comment (by gwynne):

 This crashing is caused by the incorrect practice of loading static
 archives directly into memory. The `align` field of the various sections
 in the `__TEXT` segment is not respected; instead the entire file is
 mapped directly in using `mmap()`, which results in sections often loading
 at addresses which are only 8-byte aligned. The files on disk are not
 properly aligned, since `ld` was never given the opportunity to correctly
 arrange them. In turn, this causes any SSE instructions which load from
 memory (as found in, especially, crypto algorithms implemented in C) to
 raise `#GP` faults - SSE memory loads require 16-byte alignment. It's
 essentially blind luck that this has been working for any length of time
 in the past.

 This technique of loading static archives directly into memory at runtime
 is problematic at best; this crash is far from the only issue. With no
 involvement by `ld` and `dyld`, no symbols or debug information were
 available. There was absolutely no evidence anywhere in the crash logs to
 even make a start at tracking down the issue; only the presence of a
 consistent repro case and a lot of examining memory regions by hand made
 finding the root cause possible. `MH_OBJECT` files are not intended to be
 treated as final executable code. And to add insult to injury, the
 technique is fundamentally incompatible with code signing.

 Re-implementing parts of `dyld` by hand like this is not a replacement for
 macOS not supporting static linking. In the absence of switching to a
 supported behavior (e.g. dynamic loading), an appropriate fix consists of:

 1. Disable the `mmap()` codepath in the Mach-O loader. It prevents correct
    alignment handling and does not provide a significant performance
    benefit.
 2. Teach the loader to handle alignment on a section-by-section (''not''
    segment!) basis and to correctly apply the appropriate relocations.
    The existing "align the entire file" codepath is not sufficient.
 3. Remove the conceit that there will be only one segment to load.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16058#comment:3
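
The alignment arithmetic behind the comment above can be sketched in a few
lines. This is only an illustrative model, not code from GHC's RTS linker:
the `Section` class, the function names, and the example offsets and base
address are all hypothetical. The one Mach-O detail it relies on is that a
section header stores `align` as a power-of-two exponent. The first function
shows why mapping the whole file can leave a section misaligned even when
the mapping base is page-aligned; the second shows the per-section placement
suggested in fix 2.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Section:
    """Hypothetical stand-in for the fields of a Mach-O section header."""
    name: str
    file_offset: int  # offset of the section's bytes within the object file
    size: int         # size of the section in bytes
    align_log2: int   # Mach-O `align`: alignment is 2 ** align_log2


def misaligned_when_mapped(base: int, sections: List[Section]) -> List[str]:
    """Sections whose address violates `align` if the file is mapped as-is.

    With a whole-file mapping, a section lands at base + file_offset, so
    its alignment is dictated by the on-disk offset, not by `align`.
    """
    return [s.name for s in sections
            if (base + s.file_offset) % (1 << s.align_log2) != 0]


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)


def layout_per_section(base: int, sections: List[Section]) -> Dict[str, int]:
    """Give each section its own address that honors its `align` field."""
    placed = {}
    cursor = base
    for s in sections:
        cursor = align_up(cursor, 1 << s.align_log2)
        placed[s.name] = cursor
        cursor += s.size
    return placed


# Hypothetical object file: `__text` sits at an offset that is only
# 8-byte aligned, although its header asks for 16 bytes (2 ** 4).
sections = [
    Section("__text", file_offset=0x1A8, size=0x175, align_log2=4),
    Section("__const", file_offset=0x320, size=0x40, align_log2=4),
]
base = 0x10000000  # assumed page-aligned address returned by the mapping

print(misaligned_when_mapped(base, sections))  # ['__text']
placed = layout_per_section(base, sections)
print({name: hex(addr) for name, addr in placed.items()})
# {'__text': '0x10000000', '__const': '0x10000180'}
```

Because per-section placement moves sections relative to one another,
every address computed from the on-disk layout becomes stale, which is
presumably why the comment pairs this step with applying the appropriate
relocations.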