[GHC] #14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime crash Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- As observed [https://github.com/ekmett/lens/issues/781#issuecomment-357841481 here], any attempt to run [https://hackage.haskell.org/package/doctest-0.13.0 doctest] on a module that contains an `ANN` will result in a segfault—but only under certain settings! To explain better what I mean, let's look at a stripped-down version of `doctest`: {{{#!hs -- Bug.hs module Main (main) where import Control.Applicative ((<|>)) import Control.Monad.IO.Class (liftIO) import Data.Char (isSpace) import Data.List (dropWhileEnd) import Digraph (flattenSCCs) import GHC (depanal, getSessionDynFlags, guessTarget, loadModule, noLoc, parseDynamicFlags, parseModule, runGhc, setSessionDynFlags, setTargets, topSortModuleGraph, typecheckModule) import System.Directory (findExecutable) import System.Process (readProcess) getLibDir :: IO FilePath getLibDir = do Just ghcPath <- findExecutable "ghc" <|> findExecutable "ghc-stage2" dropWhileEnd isSpace <$> readProcess ghcPath ["--print-libdir"] "" main :: IO () main = do libdir <- getLibDir putStrLn libdir runGhc (Just libdir) $ do (dynflags, _, _) <- getSessionDynFlags >>= flip parseDynamicFlags (map noLoc ["-package base"]) _ <- setSessionDynFlags dynflags mapM (`guessTarget` Nothing) ["Foo.hs"] >>= setTargets mods <- depanal [] False let sortedMods = flattenSCCs (topSortModuleGraph False mods Nothing) let f theMod = do liftIO $ putStrLn "Before parseModule" m1 <- parseModule theMod liftIO $ putStrLn "Before typecheckModule" m2 <- typecheckModule m1 liftIO $ putStrLn "Before loadModule" m3 <- loadModule m2 liftIO $ putStrLn "After loadModule" return m3 mods' <- mapM f sortedMods mods' `seq` return () }}} As well as a module with an `ANN`: {{{#!hs module Foo where {-# ANN module "I'm an annotation" #-} }}} If you attempt to compile and run `Bug.hs` with GHC 8.2.2, everything is fine and dandy: {{{ $ PATH=/opt/ghc/8.2.2/bin:$PATH ghc -fforce-recomp Bug.hs -package ghc [1 of 1] Compiling Main ( Bug.hs, Bug.o ) Linking Bug ... $ PATH=/opt/ghc/8.2.2/bin:$PATH ./Bug /opt/ghc/8.2.2/lib/ghc-8.2.2 Before parseModule Before typecheckModule Before loadModule After loadModule }}} But if these two criteria are met: * You're using GHC 8.4.1-alpha * You're using Ubuntu 16.04 or later Then this will result in a segfault! {{{ $ PATH=/opt/ghc/8.4.1/bin:$PATH ghc -fforce-recomp Bug.hs -package ghc [1 of 1] Compiling Main ( Bug.hs, Bug.o ) Linking Bug ... $ PATH=/opt/ghc/8.4.1/bin:$PATH ./Bug /opt/ghc/8.4.1/lib/ghc-8.4.0.20171222 Before parseModule Before typecheckModule Segmentation fault (core dumped) $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 17.04 Release: 17.04 Codename: zesty }}} The second criteria about Ubuntu version is the most baffling part, but the segfault does not appear to occur when I try it on, for instance, an Ubuntu 14.04 machine: {{{ $ PATH=/opt/ghc/8.4.1/bin:$PATH ghc -fforce-recomp Bug.hs -package ghc [1 of 1] Compiling Main ( Bug.hs, Bug.o ) Linking Bug ... $ PATH=/opt/ghc/8.4.1/bin:$PATH ./Bug /opt/ghc/8.4.1/lib/ghc-8.4.0.20171222 Before parseModule Before typecheckModule Before loadModule After loadModule $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04.5 LTS Release: 14.04 Codename: trusty }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * cc: bgamari (added) Comment: This regression was introduced in commit 625143f473b58d770d2515b91c2566b52d35a4c3 (`configure: Coerce gcc to use $LD instead of system default`). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Thanks, as always, for the bisection Ryan; it's incredibly helpful. I wonder if this is related to the `gold` bug that I found while writing that patch (see #13883). It seems plausible that constructor mis-ordering would result in a crash. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * related: => #14603 Comment: I believe I saw this previously with #14603. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): I started taking a look at this. First, I haven't been able to reproduce the issue on NixOS. So I started toying around in an Ubuntu 16.04 virtual machine, on which I installed ghc 8.2.2 and 8.4.1, both from hvr's PPA. And I can indeed reproduce the segmentation fault, only with 8.4.1 just like you. However, I started thinking that it might be nice to be able to use gdb to see exactly what code is being executed when the segfault happens. Then I built both the same commit from which hvr's ghc-8.4.1 was built as well as the tip of the 8.4.1 alpha1 branch, both with the quick flavour, the --enable-dwarf-unwind configure option and -g3 for the libs and the RTS. Now, for the fun part: I have _not_ been able to reproduce the bug with those builds of GHC. I'll start looking into the Debian build recipes used by hvr's PPA, with the hope of being able to build a gdb-friendly clone of hvr's 8.4.1 build that does have the bug but also allows me to look around right before it happens. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by alpmestan): * owner: (none) => alpmestan -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): I wonder if the version of `ld.gold` has something to do with this? On Ubuntu 14.04, I have: {{{ $ ld.gold --version GNU gold (GNU Binutils for Ubuntu 2.24) 1.11 Copyright 2013 Free Software Foundation, Inc. This program is free software; you may redistribute it under the terms of the GNU General Public License version 3 or (at your option) a later version. This program has absolutely no warranty. }}} But on Ubuntu 16.04 and later, I have: {{{ $ ld.gold --version GNU gold (GNU Binutils for Ubuntu 2.28) 1.14 Copyright (C) 2017 Free Software Foundation, Inc. This program is free software; you may redistribute it under the terms of the GNU General Public License version 3 or (at your option) a later version. This program has absolutely no warranty. }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): Ryan: I don't think so. In my VM I have ld.gold version 1.11, which I assume is used by both the 8.2.2 and 8.4.1 builds from hvr's PPA, as well as my custom 8.4.1 build. 8.2.2 and my custom 8.4.1 build don't have the issue while hvr's 8.4.1 does have the problem. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Well, you can check what `"ld command"` is on the respective `settings` files (e.g., `/opt/ghc/8.2.2/lib/ghc-8.2.2/settings` and `/opt/ghc/8.4.1/lib/ghc-8.4.0.20171222/settings`) to be sure (in my case, it's `ld.gold` for both). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): Here's what I have: {{{#!haskell alp@vm:~/ghc$ cat /opt/ghc/8.2.2/lib/ghc-8.2.2/settings | grep ld ("C compiler link flags", " -fuse-ld=gold"), ("ld command", "ld.gold"), ("ld flags", ""), ("ld supports compact unwind", "YES"), ("ld supports build-id", "YES"), ("ld supports filelist", "NO"), ("ld is GNU ld", "YES"), alp@vm:~/ghc$ cat /opt/ghc/8.4.1/lib/ghc-8.4.0.20171222/settings | grep ld ("C compiler link flags", " -fuse-ld=gold"), ("ld command", "ld.gold"), ("ld flags", ""), ("ld supports compact unwind", "YES"), ("ld supports build-id", "YES"), ("ld supports filelist", "NO"), ("ld is GNU ld", "YES"), }}} And my custom build reports the same thing as well. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Ah, darn, never mind then :) I'm not sure if there's an easier way to pinpoint differences between individual components of Ubuntu 14.04 vs. 16.04 that might be contributing to this... perhaps Nix makes this easier? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): Alright, I did another build, `perf` flavour instead of `quick` previously, using `./configure --with-ghc=/opt/ghc/8.2.2/bin/ghc --prefix=/opt/ghc/my-8.4.1/bin/ghc --enable-dwarf-unwind` along with `GhcLibHcOpts += -g3` and `GhcRtsHcOpts += -g3`, so that I can use gdb to figure out where exactly things are going wrong, if I can reproduce the crash. Also from the same commit as hvr's build: {{{#!haskell alp@vm:~/14675$ /home/alp/ghc/inplace/bin/ghc-stage2 --info | grep commit ,("Project Git commit id","c6cf13ca63f3a11a8da7c7e3bd69e673a8df5440") alp@vm:~/14675$ /opt/ghc/8.4.1/bin/ghc --info | grep commit ,("Project Git commit id","c6cf13ca63f3a11a8da7c7e3bd69e673a8df5440") }}} And with this precise setup, even though for some reason I can't `make install` this build, I can reproduce the problem. {{{#!bash alp@vm:~/14675$ /home/alp/ghc/inplace/bin/ghc-stage2 -fforce-recomp -package ghc Bug.hs [1 of 1] Compiling Main ( Bug.hs, Bug.o ) Linking Bug ... alp@vm:~/14675$ PATH=/home/alp/ghc/inplace/bin:$PATH ./Bug /home/alp/ghc/inplace/lib Using binary package database: /home/alp/ghc/inplace/lib/package.conf.d/package.cache package flags [-package base{package base True ([])}] loading package database /home/alp/ghc/inplace/lib/package.conf.d wired-in package ghc-prim mapped to ghc-prim-0.5.2.0 wired-in package integer-gmp mapped to integer-gmp-1.0.1.0 wired-in package base mapped to base-4.11.0.0 wired-in package rts mapped to rts wired-in package template-haskell mapped to template-haskell-2.13.0.0 wired-in package ghc mapped to ghc-8.4.0.20171222 wired-in package dph-seq not found. wired-in package dph-par not found. *** Chasing dependencies: Chasing modules from: *Foo.hs !!! Chasing dependencies: finished in 0.96 milliseconds, allocated 0.400 megabytes Before parseModule *** Parser [Foo]: !!! Parser [Foo]: finished in 0.08 milliseconds, allocated 0.070 megabytes Before typecheckModule *** Renamer/typechecker [Foo]: *** Simplify [expr]: !!! Simplify [expr]: finished in 0.07 milliseconds, allocated 0.000 megabytes *** CorePrep [expr]: !!! CorePrep [expr]: finished in 1.37 milliseconds, allocated 1.666 megabytes *** ByteCodeGen [Ghci1]: !!! ByteCodeGen [Ghci1]: finished in 0.10 milliseconds, allocated 0.047 megabytes Loading package ghc-prim-0.5.2.0 ... linking ... done. *** gcc: gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE '-fuse-ld=gold' -B/home/alp/ghc/libraries/integer-gmp/dist-install/build --print-file-name libgmp.so Loading package integer-gmp-1.0.1.0 ... linking ... done. Loading package base-4.11.0.0 ... linking ... done. Erreur de segmentation (core dumped) }}} Tomorrow I'll fire up gdb and see if this builds lets me figure out precisely where things are going wrong. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): I ended up chasing down the problem with `gdb` a little bit but didn't really know where to break and stepping through everything was just not conceivable so I ended up going about it the good old way, by printing a bunch of things along the code path that leads to the error. Long story short, it looks like the `unsafeCoerce#` from `compiler/typecheck/TcSplice.hs:convertAnnotationWrapper` is the cause of the segfault. And the content of the annotation doesn't seem to matter, even `{-# ANN module () #-}` makes the program crash. You can see [https://gist.github.com/alpmestan/145fc5783f00bab9214b7302418aef49 here] my variant of `convertAnnotationWrapper` that prints a bunch of things all along, as well as the relevant section of the output (the "trace"). The program never makes it to "unsafeCoerce# went fine". I will next try to look into what the value we're coercing actually represents and why it's not what we expect. If anyone has any tips for figuring that out quickly, I'm all ears :) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari):
I ended up chasing down the problem with gdb a little bit but didn't really know where to break and stepping through everything was just not conceivable so I ended up going about it the good old way, by printing a bunch of things along the code path that leads to the error.
Great work so far! Indeed this is often the case; `gdb` can only get you so far. I should mention that [[http://rr-project.org/|rr]] is an extremely helpful tool in cases like this since it gives you multiple changes to trace through a given run of the program. I also have a `breakpoint` [[https://github.com/bgamari/breakpoint|package]] which makes it easy to insert debugger breakpoints into a Haskell program. While using the package in GHC may be more work than it's worth, stealing the idea should be straightforward. You also may want to try using my [[https://github.com/bgamari/ghc-utils/tree/master/gdb|gdb plugin]]. It makes inspecting the STG stack and heap significantly easier. Out of curiosity, is the issue still reproducible when GHC is built with profiling enabled? If so, it might be nice to continue debugging under this way as the profiled way includes a significant amount of information about the structure of heap objects that can be useful while debugging (e.g. type names). At this point I would do the following: 1. Insert a breakpoint right before the `unsafeCoerce#` 2. Look at the assembler that results and try to work out which register `annotation_wrapper` ends up in 3. Reproduce the issue in the debugging, look at this register, and look at its info table 4. Verify that the info table corresponds to either a thunk or a constructor of the expected type After that the decision tree will branch. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): I can reproduce with a `prof`-flavoured GHC, yes. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): I made some more progress on this. It turns out that when not using an external interpreter -- like in the program from this ticket -- we run/evaluate annotations with the same runtime system that runs the program. I set a breakpoint on the evaluation of the `unsafeCoerce# ...` expression and tried to follow the execution along in gdb and in the source code of the RTS. I first saw some `Data.Data` related symbol/closure and was the led into the scheduler to later land in `rts/Interpreter.c:interpretBCO`. That function successfully interprets a few opcodes but at some point the `switch` takes the `default` branch: {{{#!c default: barf("interpretBCO: unknown or unimplemented opcode %d", (int)(bci & 0xFF)); }}} with `bci & 0xFF` equal to 32. In other words, the bytecode object interpreter gives up and we would expect the program to crash at that point, with the given error message. Except that it looks like even if we're not using an external interpreter, this does _not_ shut down the runtime system. Instead, the RTS happily goes back to compiling our module, as if the BCO interpreter had completed successfully. This unsurprisingly causes a problem just further down the road, when we're actually trying to read the result of the BCO interpreter as a Haskell value of type `AnnotationWrapper`. I have documented this unexpected behaviour in #14736, but this is not the end of the story as far as this ticket is concerned. Also, Ben pointed out to me that: {{{#!c #define bci_PUSH_APPLY_PP 32 }}} So it looks to me like we _should_ be able handle that opcode... Anyway, now that I know a lot more about what's going on, I'll set some more interesting breakpoints tomorrow, e.g on `interpretBCO`, and try to trace exactly what the interpreter reads. Similarly, I would like to see if I can pinpoint where exactly we "produce" the input to the interpreter (that we `unsafeCoerce#` to an `AnnotationWrapper`). I should eventually converge on the problem. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs
on Ubuntu 16.04 or later
-------------------------------------+-------------------------------------
Reporter: RyanGlScott | Owner: alpmestan
Type: bug | Status: new
Priority: highest | Milestone: 8.4.1
Component: GHC API | Version: 8.4.1-alpha1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: #14603 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by alpmestan):
Below is a summary of how things go in `interpretBCO`.
Here's how things look when we enter that function:
{{{#!c
(gdb) x/8a $rbp
0x7fffffffe0f0: 0x7fffffffe140 0x2cc3186

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: alpmestan Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alpmestan): When work resumes after going back to the scheduler, we load the stack and jump to its top (`stg_enter_info`). This then leads us to execute a whoooooooole bunch of code. I saw many kinds of closure types along the way, among which a `Data.Data` related one (expected: we embed a `Data` dictionary in `AnnotationWrapper`), a ghci `BreakArray` related one, partial applications, `FUN_STATIC`, etc. The somewhat uninformative context in which the crash finally happens can be seen [https://gist.github.com/alpmestan/a029e46c9cb7beb6839c6ac6058a95f9#file- foo-asm-L1017 here], interleaved with my attempts to find out a bit of information about the addresses/closures/etc randomly in `rbx`, `rax`, printing the top of the stack, etc. I was honestly hoping that the context would be a little more informative. However, all hope is not lost. I'm going to use `rr` as suggested by Ben to record the run, and then I'll be able to go to the crash, look at the address we're trying to jump to, and use `reverse-cont` to see who wrote that address there. This however requires me to set up something else than VirtualBox as `rr` needs hardware performance counters that VirtualBox does not support. I got started with this today. In parallel, I might keep doing a few experiments in `gdb` in my existing (VirtualBox) VM. For instance, when looking at the `TcSplice` code yesterday, I saw a few `do` blocks with some important code (as far as annotation handling is concerned) that I haven't looked closely at yet. In particular, I'm not sure whether the code that produces `zonked_wrapped_expr'` could be the problem. I have a reason or two to believe that it's an interesting place to look at, like the fact that I did see some `Data.Data` related symbol _after_ `interpretBCO`, and these few lines of code are where we produce the `Data.Data` dictionary apparently. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by alpmestan): * owner: alpmestan => bgamari Comment: Handing this off to Ben as I have not (yet) been able to create a virtual machine that provides the hardware performance counters needed by `rr`, which really is the only sane way to figure out how we end up jumping to an invalid address. Off to you, Ben. :) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by lelf): * cc: lelf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I have done a bit of characterisation of the issue since I fear we simply don't have time at this point to come up with a proper solution for 17.10. ||= Distribution =||= binutils version =||= glibc version =||= Affected =|| || Ubuntu Wily (15.10) || || || || || Ubuntu Xenial (16.04) || 2.26 || 2.23 || Yes || || Ubuntu Zesty (17.04) || 2.28 || 2.24 || No || || Debian Jessie (9.0) || 2.28 || 2.24 || No || || Ubuntu Artful (17.10) || 2.29 || 2.26 || No || -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I should mention that there are a few crashes which I've seen in the past that may be related. Specifically, #14291 and #14705. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): Ok, I think I have a bit more info on this, and a better workaround. For me the crash happened here: {{{ (gdb) disassemble 0x0000000040792900,0x0000000040792910 Dump of assembler code from 0x40792900 to 0x40792910: 0x0000000040792900: mov 0x10(%rbp),%rax 0x0000000040792904: mov %rbx,%rcx 0x0000000040792907: and $0x7,%ecx => 0x000000004079290a: jmpq *0x403ff568(,%rcx,8) }}} and the memory at 0x403ff568 looks bogus. By using `+RTS -Dl` and digging through the logs, correlating this with `objdump --reloc HSbase-4.11.0.0.o`, I got that this relocation is: {{{ 00000000004228cd R_X86_64_32S .rodata-0x0000000000451cd8 }}} which is utterly bogus: we should never have a negative offset into a section. The original object file, before it was squashed into `HSbase-4.11.0.0.o` has this: {{{ 0000000000000b3d R_X86_64_32S .rodata..LneLq }}} which is a sensible relocation to the unique section name `.rodata..LneLq`. Looks like something has gone wrong when we squashed the object files together to make `HSBase-4.11.0.0.o`. Indeed, if I take the command line to squash the object files and replace `ld.gold` with `ld`, then I get this relocation: {{{ 00000000004228cd R_X86_64_32S .rodata+0x000000000000e158 }}} which is much more sensible. And after doing that, the crashing program now works. So this suggests that the workaround should be to avoid using `ld.gold` for squashing objects together, I'll make a diff. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): After a bit of Googling around and chasing references I end up at this bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=22266 submitted by none other than Ben Gamari :) Which leads to #14291, which is almost certainly the same bug as this one. So there are a little collection of tickets here: * #14675 (This bug) narrowed down to ld.gold generating a bogus HSfoo.o * #14291, also narrowed down to ld.gold producing a bogus HSfoo.o * #14328, tracking the upstream bug in ld.gold -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Phab:D4431 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * differential: => Phab:D4431 * blockedby: 14328 => -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: new Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Phab:D4431 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Hmmm, indeed comment:22 does suggest that this is another manifestation of #14291. Thanks for that, simonmar! I do wish we had a proper autoconf check for the `ld.gold` brokenness, but oh well. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs
on Ubuntu 16.04 or later
-------------------------------------+-------------------------------------
Reporter: RyanGlScott | Owner: bgamari
Type: bug | Status: new
Priority: highest | Milestone: 8.4.1
Component: GHC API | Version: 8.4.1-alpha1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: #14603 | Differential Rev(s): Phab:D4431
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Marlow

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: merge Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Phab:D4431 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * status: new => merge -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs on Ubuntu 16.04 or later -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: bgamari Type: bug | Status: closed Priority: highest | Milestone: 8.4.1 Component: GHC API | Version: 8.4.1-alpha1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #14603 | Differential Rev(s): Phab:D4431 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: merge => closed * resolution: => fixed Comment: Merged with b4e32780a976193208eebbddf789eeb80351ac95. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14675#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14675: GHC 8.4.1 regression: segfault when loading doctest on a module with ANNs
on Ubuntu 16.04 or later
-------------------------------------+-------------------------------------
Reporter: RyanGlScott | Owner: bgamari
Type: bug | Status: closed
Priority: highest | Milestone: 8.4.1
Component: GHC API | Version: 8.4.1-alpha1
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: #14603 | Differential Rev(s): Phab:D4431
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari
participants (1)
-
GHC