[GHC] #14346: 8.2.1 regression: heap corruption after safe foreign calls

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: Component: Runtime | Version: 8.2.1 System | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime crash Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Test case: (compile with ghc 8.2.1 and -threaded option) {{{#!haskell module Main where import Control.Concurrent import Control.Monad import Data.Word import Foreign.Marshal.Alloc import Foreign.Ptr import Foreign.Storable foreign import ccall safe "test" c_test :: Ptr Word32 -> IO () main :: IO () main = do replicateM_ 1000 $ threadDelay 1000 _ <- forkIO $ forever $ threadDelay 100 allocaBytes 4 $ \p -> forever $ do c_test p x <- peek p unless (x == 0xDEADBEEF) $ putStrLn "value mismatch" }}} {{{#!c void test(unsigned int *buf) { *buf = 0xDEADBEEF; } }}} On my machine, it detects a few value mismatches before crashing with sigsegv. {{{ $ time ./.stack-work/install/x86_64-linux- nopie/nightly-2017-10-10/8.2.1/bin/bug value mismatch value mismatch value mismatch value mismatch zsh: segmentation fault (core dumped) ./.stack-work/install/x86_64-linux- nopie/nightly-2017-10-10/8.2.1/bin/bug ./.stack-work/install/x86_64-linux-nopie/nightly-2017-10-10/8.2.1/bin/bug 2.11s user 0.25s system 66% cpu 3.543 total }}} I believe this is what is causing crashes in xmobar. See discussion: https://github.com/jaor/xmobar/issues/310. Note that the crash in xmobar still happens without -threaded option, while this example only breaks when compiled with -threaded. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by andrewchen): * Attachment "ghc_output" added. compiler output with -v -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by RyanGlScott): * status: new => infoneeded Comment: For whatever reason, I'm not able to reproduce this on either my Ubuntu 14.04 or 17.04 machines with GHC 8.2.1. I'm doing this: {{{ $ ghc -fforce-recomp -threaded bug.c Bug.hs [1 of 1] Compiling Main ( Bug.hs, Bug.o ) Linking Bug ... $ ./Bug }}} It then proceeds to run forever (AFAICT) without hitting any `value mistmatch`es or segfaults. Some questions: 1. What operating system are you using? 2. How can I reproduce this issue //with just GHC//? Please, no instructions involving fancy build tools like `stack`, since if this really is a GHC bug, one should be able to trigger the issue with just GHC. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Thanks for your report andrewchan; Unfortunately, as with RyanGlScott, I am unable to reproduce this with `+RTS -N4`, `+RTS -N1`, or under any of GHC's optimization levels on Debian 9 running on amd64. Having a standalone testcase, free of build tools, would be quite helpful. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by andrewchen): I am running arch linux x64 on a i5-4200U laptop. I'm able to reproduce with just the system ghc: {{{ ghc Main.hs test.c -threaded -O1 -fforce-recomp [1 of 1] Compiling Main ( Main.hs, Main.o ) Linking Main ... }}} For me the program fails within seconds: {{{ $ time ./Main value mismatch value mismatch value mismatch value mismatch zsh: segmentation fault (core dumped) ./Main ./Main 2.19s user 0.20s system 67% cpu 3.553 total }}} I'm also able to reproduce the issue in a fedora virtual machine on the same physical machine using ghc 8.2.1 binaries downloaded from haskell.org. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rezb1t): I can reproduce this same issue on my machine, I am using: NixOS x86_64 Unstable Branch (as of 10-12) GHC 8.2.1 Binutils 2.28.1 GCC 6.4.0 I noticed the bug does not occur and the program runs infinitely if I simply compile with 'ghc Main.hs test.c -threaded -o Bug', however, if Optimization level 1 or 2 are enabled, the bug happens very quickly after running the binary. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by andrewchen): Also, I forgot to add, the bug does not occur when compiled with debug symbols (-g). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Interesting; I can also reproduce this in my Nix unstable VM. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Somewhat to my surprise, this regression was introduced in 8d5cf8bf584fd4849917c29d82dcf46ee75dd035 (`Join points`). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Well, comment:7 certainly explains why `-g` avoids the crash: in 8.2 source note ticks essentially prevented GHC from marking anything as a join point. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls
-------------------------------------+-------------------------------------
Reporter: andrewchen | Owner: (none)
Type: bug | Status: infoneeded
Priority: highest | Milestone:
Component: Runtime System | Version: 8.2.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by bgamari):
I could have sworn I left a comment last night but it seems I am mistaken.
Here is what I discovered while looking into this so far:
The test is indeed rather environment sensitive. Moreover, as it doesn't
occur under `rr` I strongly suspect it's a race of some sort. When
compiled with `-debug` the eventual segmentation fault always seems to
occur in `stg_putMVarzh`. Specifically here,
{{{
Dump of assembler code for function stg_putMVarzh:
0x00000000004ab1b0 <+0>: cmpl $0x1,0x4f4800
0x00000000004ab1b8 <+8>: je 0x4ab35e
print/a $rbx $1 = 0x42000b8400 print/a $rdx $2 = 0x42deadbeef }}} Yikes!
This sounds to me like we reentered STG while forgetting to do some bit of cleanup from the foreign call. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls
-------------------------------------+-------------------------------------
Reporter: andrewchen | Owner: (none)
Type: bug | Status: infoneeded
Priority: highest | Milestone:
Component: Runtime System | Version: 8.2.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by andrewchen):
I managed to do a `rr` capture with `--chaos` mode.
Here's the part in main where it does the comparison:
{{{
0x404581

#14346: 8.2.1 regression: heap corruption after safe foreign calls
-------------------------------------+-------------------------------------
Reporter: andrewchen | Owner: (none)
Type: bug | Status: infoneeded
Priority: highest | Milestone:
Component: Runtime System | Version: 8.2.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by bgamari):
Indeed the `--chaos` tip is quite helpful. Thanks!
So it appears that the crazy TSO is loaded in `stg_putMVar#` on line 1737:
{{{#!c
...
// There are readMVar/takeMVar(s) waiting: wake up the first one
tso = StgMVarTSOQueue_tso(q); // <---
here
StgMVar_head(mvar) = StgMVarTSOQueue_link(q);
if (StgMVar_head(mvar) == stg_END_TSO_QUEUE_closure) {
StgMVar_tail(mvar) = stg_END_TSO_QUEUE_closure;
}
...
}}}
Here `q` is `0x42000b7530` which is a fairly reasonable-looking
`MVAR_TSO_QUEUE`, except with a completely wild `tso` field,
{{{
0x42000b7530: 0x4acd28

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): On looking at this with fresh eyes, it seems that unfortunately my analysis from comment:11 is flawed; the `movq %rbx,$rdi` is completely correct. We spill to the callee-saved `%rbx register before `suspendThread` and then more the value from `%rbx` to `%rdi`, which is where we expect the first argument to reside. The second spill is simply preserving `_u4RH`, which is still alive after the call to `test`. Back to the drawing board. I think now I'll focus on catching the issue earlier in execution; namely, when we first get the `value mismatch` message. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): You don't need the FFI to trigger the bug. You can replace: {{{#!haskell foreign import ccall safe "test" c_test :: Ptr Word32 -> IO () }}} with: {{{#!haskell {-# NOINLINE c_test #-} c_test :: Ptr Word32 -> IO () c_test ptr = poke ptr 0xDEADBEEF }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): Smaller reproducer: {{{#!haskell -- ghc -Wall -fforce-recomp -O1 Test.hs -threaded module Main where import Control.Concurrent import Control.Monad import Data.Word import Foreign.Marshal.Alloc import Foreign.Storable import Numeric main :: IO () main = do replicateM_ 100 $ threadDelay 100 allocaBytes 4 $ \p -> do forever $ do poke p (0xDEADBEEF :: Word32) threadDelay 10 x <- peek p unless (x == 0xDEADBEEF) $ putStrLn (showHex x "") }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by andrewchen): Really interestingly replacing `forever` with `replicateM_ 1000000000` doesn't trigger the bug anymore. A bit of speculation: compiler sees that the `touch#` at the end of `allocaBytes` is unreachable due to `forever`, and so ignores it and allows the allocated are to be GC'ed. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alexbiehl): I had a look at the cmm code for comment [comment:14] and I have some questions: {{{ c4CO: // global _s4Bx::I64 = R1 + 16; <---- assign _s4Bx the pointer to the first byte of the bytearray (16 byte = infotable ptr + length field ) goto c4CW; c4CW: // global I32[_s4Bx::I64] = 3735928559 :: W32; (_s4BE::I64) = call "ccall" arg hints: [] result hints: [`signed'] rtsSupportsBoundThreads(); if (_s4BE::I64 != 0) goto c4Dz; else goto c4DE; c4Dz: // global I64[Sp - 8] = block_c4Dx_info; R2 = Main.main2_closure+1; I64[Sp] = _s4Bx::I64; <------ is it ok to store an address which clearly points into heap allocated memory but doesn't point to an info table? Sp = Sp - 8; call GHC.Conc.Windows.threadDelay1_info(R2) returns to c4Dx, args: 8, res: 8, upd: 8; c4Dx: // global _s4Bx::I64 = I64[Sp + 8]; goto c4D2; c4DE: // global I64[Sp - 8] = block_c4DD_info; R1 = 10; <------- overwrite R1, R1 was our *only* reference to the bytearray closure. I64[Sp] = _s4Bx::I64; Sp = Sp - 8; call stg_delay#(R1) returns to c4DD, args: 8, res: 8, upd: 8; c4DD: // global _s4Bx::I64 = I64[Sp + 8]; goto c4D2; c4D2: // global }}} - The only reference to the ByteArray closure is in R1 - _s4Bx points to the first byte in the byte array - In block c4DE R1 is overwritten. - The rtsSupportsBoundThreads is a ccall, don't we have to save R1 over these calls? Maybe the garbage collector assumes the ByteArray is dead and collects it too early? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Very good insights, alexbiehl and andrewchen. Indeed it looks indeed the GC is (correctly, given the code) concluding that the array is unreachable. Looking at the `-dverbose-core2core` output one sees that the `touch#` call is dropped during one of the simplifier passes (`SimplMode {Phase = 0 [post-call-arity], inline, rules, eta-expand, case-of-case}`). That is certainly the bug. To answer, a few of your questions:
is it ok to store an address which clearly points into heap allocated memory but doesn't point to an info table?
In the above case, the answer is probably yes. This pointer is saved as a field of a stack frame (namely a return frame for `block_c4Dx_info`). The info table for this frame likely declares this field as a non-pointer. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): So However, in general I wonder whether `touch#` is more unsafe than strictly necessary. It seems to me that for a tad of stack allocation you can get a much safer way to keep values alive. The trick is to introduce a primop, {{{ with# :: a -> r -> r }}} When `with# a cont` is entered, the entry code will, 1. Push an `StgWithFrame`, a new sort of return frame which carries a reference to `a`, onto the stack 2. Enters `cont` When `cont` returns, it will enter the entry code for `StgWithFrame`, which will simply pop itself and return. This way we don't need to worry about Core simplifications dropping important `touch#`s. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * cc: simonmar (added) Comment:
It looks like to avoid this we will either need to teach the simplifier not to throw away otherwise dead continuations which contain some "important" primops
Can you give an example to show what it is throwing away, and why that's bad? I don't get it yet. I have even forgotten why `touch#` exists. Copying Simon Marlow. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): Interesting, can someone boil down the transformation that dropped the `touch#`? Simon: `touch#` is keeping the `ByteArray#` alive until after the `action`, in `allocaBytes` (see comment:18). The action itself doesn't keep the array alive, because it is working with the raw pointer, not the `ByteArray#`. This is how we allocate temporary memory for marshalling data between Haskell and C, because it's a lot faster to allocate memory on the Haskell heap than to use `malloc()` and `free()`. I imagine the simplifier has proven that `action` never returns and then dropped the `case` with the continuation containing the `touch#`. That seems like a reasonable thing to do. I like @bgamari's alternative suggestion of `with#`, although we probably want it to be {{{ with# :: a -> (State# s -> (# State# s, b #)) -> State# s -> (# State# s, b #) }}} otherwise the second argument must be a thunk (yuck). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

I imagine the simplifier has proven that action never returns and then dropped the case with the continuation containing the `touch#`. That seems
#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): like a reasonable thing to do. Correct. I believe it is `Simplify.rebuildCall` that is responsible for this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: patch Priority: highest | Milestone: 8.2.2 Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4110 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: infoneeded => patch * differential: => Phab:D4110 * milestone: => 8.2.2 Comment: See Phab:D4110 for an implementation. Still need to add a test. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: patch Priority: highest | Milestone: 8.2.2 Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4110 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): ...for an implementation of `with#`, that is. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: patch Priority: highest | Milestone: 8.2.2 Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4110 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Indeed Phab:D4110 appears to fix the testcase from comment:14. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: patch Priority: highest | Milestone: 8.2.2 Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * differential: Phab:D4110 => Comment: Let's move discussion of `with#` to new ticket just for that purpose: #14375. This ticket remains to track the bug reported in the Description. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: patch Priority: highest | Milestone: 8.4.1 Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.2.2 => 8.4.1 Comment: Alright, I have marked `allocaBytes` and `allocaBytesAligned` as `NOINLINE` for 8.2.2. A more principled solution, in the form of #14375, coming in 8.4.1. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: bgamari Type: bug | Status: patch Priority: highest | Milestone: 8.4.1 Component: Runtime System | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: (none) => bgamari -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: bgamari Type: bug | Status: closed Priority: highest | Milestone: 8.4.1 Component: Runtime System | Version: 8.2.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: patch => closed * resolution: => fixed * blockedby: 14375 => Comment: I'm going to close this since the serious correctness issue is resolved for now. A more efficient solution will come in 8.4 (#14375). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.4.4 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Changes (by hsyl20): * status: closed => new * differential: => Phab:D5020 * component: Runtime System => Compiler * version: 8.2.1 => 8.4.3 * milestone: 8.4.1 => 8.4.4 * owner: bgamari => (none) * resolution: fixed => Comment: Sadly 404bf05ed3193e918875cd2f6c95ae0da5989be2 never made into HEAD nor GHC 8.4 branch and #14375 hasn't been completed before GHC 8.4 release. Hence HEAD and 8.4.* suffer from this bug (see #15260). I've made a new diff reintroducing the workaround and adding a regression test: Phab:D5020. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * version: 8.4.3 => 8.5 * related: => #15260 * milestone: 8.4.4 => 8.6.1 Comment: This issue seems quite serious to me and it's actually encountered in the wild (see the related ticket). bgamari, can we milestone this for the next release? comment:29 says "correctness issue is resolved for now" but I don't see how. As far as I can see no commits were done for this ticket. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

This issue seems quite serious to me and it's actually encountered in
#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): Replying to [comment:31 osa1]: the wild (see the related ticket). bgamari, can we milestone this for the next release? Could we also have a 8.4.4 release?
comment:29 says "correctness issue is resolved for now" but I don't see how. As far as I can see no commits were done for this ticket.
It was fixed in 8.2.2: https://git.haskell.org/ghc.git/commitdiff/404bf05ed3193e918875cd2f6c95ae0da... -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1):
It was fixed in 8.2.2: https://git.haskell.org/ghc.git/commitdiff/404bf05ed3193e918875cd2f6c95ae0da...
Ah, I guess the patch was only merged to 8.2.2 branch (not to master). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Ah, I guess the patch was only merged to 8.2.2 branch (not to master).
That's terrible. And very unusual; usually we put things on master first and then merge to a branch. However, if I understand aright: * The patch in 8.2.2 (comment:33 above) is very much a hack. * The real fix is in #14375 * There is an active patch for #14375 So we should make sure that the patch for #14375 does fix this ticket, and #15260 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Very sorry about that hsyl20. This was a case where we had temporary fix which we desperately wanted to improve upon and therefore only merged it to `ghc-8.2`. However, we then weren't able to finish the real fix (`with#`, #14375) before 8.4 and the hack was never ported to `master`. I've cherry-picked the fix onto `ghc-8.4`, `ghc-8.6`, and `master`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Comment (by svenpanne): So this looks like a hint that there will be a GHC 8.4.4. :-) Can we bump the haddock submodule in that branch, too, to get a fix for https://github.com/haskell/haddock/issues/837? That issue is quite annoying, breaking lots of packages when built with stack. Perhaps there is already a ticket for this, but I couldn't find it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:36 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls
-------------------------------------+-------------------------------------
Reporter: andrewchen | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.5
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: #15260 | Differential Rev(s): Phab:D5020
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): Thanks Ben. Could you also merge the test case from Phab:D5020 so that it doesn't get lost? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:38 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls
-------------------------------------+-------------------------------------
Reporter: andrewchen | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.5
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: #15260 | Differential Rev(s): Phab:D5020
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari):
Thanks Ben. Could you also merge the test case from Phab:D5020 so that it doesn't get lost?
Sure. Done in f8e5da92c0160a675e1666a5d6ed6a8ffcae193c. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:40 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14346: 8.2.1 regression: heap corruption after safe foreign calls -------------------------------------+------------------------------------- Reporter: andrewchen | Owner: (none) Type: bug | Status: closed Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: #15260 | Differential Rev(s): Phab:D5020 Wiki Page: | -------------------------------------+------------------------------------- Changes (by hsyl20): * status: new => closed * resolution: => fixed Comment: Thanks Ben! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14346#comment:41 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC