[GHC] #12537: ghc-openGL build Segmentation Fault for openSUSE ppc64le

#12537: ghc-openGL build Segmentation Fault for openSUSE ppc64le
---------------------------------+-------------------------------------
Reporter: michelmno | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Keywords: | Operating System: Unknown/Multiple
Architecture: powerpc64 | Type of failure: None/Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
---------------------------------+-------------------------------------
As detailed in the attached log below, the initial failure is a reported Segmentation Fault. If the debuginfo rpms are added, the reported failure is instead a `ghc: panic` in `mkFastStringWith`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#12537: ghc-openGL build Segmentation Fault for openSUSE ppc64le
-------------------------------------+---------------------------------
Reporter: michelmno | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+---------------------------------
Changes (by michelmno):
 * Attachment "ghc_opengl_twppc64le3_build_failure.log.gz" added.

ghc_opengl_twppc64le3_build_failure.log.gz

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537

#12537: ghc-openGL build Segmentation Fault for openSUSE ppc64le
---------------------------------------+---------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Compile-time crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
---------------------------------------+---------------------------------
Changes (by trommler):
 * owner: => trommler
 * failure: None/Unknown => Compile-time crash

Comment:
This could be #12469. I am going to check.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:1

#12537: ghc-openGL build Segmentation Fault for openSUSE ppc64le
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: | Blocking:
Related Tickets: #12469 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * failure: Compile-time crash => Incorrect result at runtime
 * version: 7.10.3 => 8.0.1
 * related: => #12469

Comment:
Replying to [comment:1 trommler]:
> This could be #12469. I am going to check.

I applied the patch for #12469 to ghc 8.0.1 and rebuilt all of Haskell LTS nightly, and still about 40 to 50 packages fail with the segmentation fault reported here. Rebuilding succeeds most of the time. I am not ruling out #12469 yet, as @rrnewton remarked in comment 12 that a write (store-store) barrier might be missing in array writes too.

Note: openSUSE's ghc 7.10.3 is patched with the native code generator and the patch in SMP.h for atomic operations. I am setting the version field to 8.0.1 to avoid confusion.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:2
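
The store-store barrier mentioned in the comment above can be illustrated with a small C11 sketch. This is not GHC's code: `cell`, `slots`, `publish` and `consume` are hypothetical names, and the array of atomic pointers only stands in for a mutable array. The writer fills in a cell and issues a release fence before storing the pointer, so that a reader which sees the pointer also sees the initialised payload. On a weakly ordered CPU such as PowerPC, leaving the fence out would allow the two stores to become visible in the opposite order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload type; stands in for a heap object. */
struct cell { int payload; };

#define NSLOTS 4
/* Hypothetical "mutable array" of pointers shared between threads. */
static _Atomic(struct cell *) slots[NSLOTS];

/* Writer: initialise the cell, then publish it into the array. */
static void publish(size_t i, struct cell *c, int value)
{
    assert(i < NSLOTS);
    c->payload = value;                         /* plain store to the object */
    atomic_thread_fence(memory_order_release);  /* write barrier: the store above
                                                   is ordered before the one below */
    atomic_store_explicit(&slots[i], c, memory_order_relaxed);
}

/* Reader: returns the payload, or -1 if nothing has been published yet. */
static int consume(size_t i)
{
    assert(i < NSLOTS);
    struct cell *c = atomic_load_explicit(&slots[i], memory_order_acquire);
    return c ? c->payload : -1;
}
```

A single-threaded call sequence such as `publish(0, &c, 42)` followed by `consume(0)` only checks the plumbing; the ordering guarantee matters when writer and reader run on different cores.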

#12537: ghc-openGL build Segmentation Fault for openSUSE ppc64le
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: | Blocking:
Related Tickets: #12469 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by trommler):

The panic in `mkFastStringWith` does not depend on debuginfo rpms being installed. I have also seen it happen on the build service, but only rarely.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:3

#12537: ghc-openGL build Segmentation Fault for openSUSE ppc64le
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * blockedby: => 12469
 * related: #12469 =>

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:4

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by trommler):

I could reproduce the segmentation fault on a powerpc64 big-endian Linux machine, and I can also reproduce it with other packages.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:5

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: infoneeded
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * status: new => infoneeded

Comment:
In #12469 I wondered if qemu is doing something odd. @michelmno did you run your tests on bare hardware?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:6

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: infoneeded
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by michelmno):

Replying to [comment:6 trommler]:
> In #12469 I wondered if qemu is doing something odd. @michelmno did you run your tests on bare hardware?

I tested on a ppc64le bare-metal machine (without qemu) and I still get core dumps of ghc.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:7

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * status: infoneeded => new

Comment:
Replying to [comment:7 michelmno]:
> Replying to [comment:6 trommler]:
> > In #12469 I wondered if qemu is doing something odd. @michelmno did you run your tests on bare hardware?
>
> I tested on a ppc64le bare-metal machine (without qemu) and I still get core dumps of ghc.

Thank you for the info!

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:8

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * owner: trommler =>

Comment:
I have no idea what might be causing this issue and the other random panics I see on the openSUSE Build Service. For now I am disowning the ticket.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:9

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by michelmno):

The trials I made in [comment:7 comment 7] were with ghc 8.0.1 with the following patches, as per the spec file in OBS https://build.opensuse.org/package/view_file/devel:languages:haskell/ghc/ghc...
{{{
# PATCH-FIX-UPSTREAM D2495.patch peter.trommler@ohm-hochschule.de -- Add missing memory barrier on mutable variables. See https://ghc.haskell.org/trac/ghc/ticket/12469 for details. Backport of upstream fix for ghc 8.0.2.
Patch27: D2495.patch
# PATCH-FIX_UPSTREAM 0001-StgCmmPrim-Add-missing-write-barrier.patch peter.trommler@ohm-hochschule.de -- Add missing write barrier on mutable arrays.
Patch28: 0001-StgCmmPrim-Add-missing-write-barrier.patch
# PATCH-FIX_UPSTREAM ghc-no-madv-free.patch psimons@suse.com -- Fix "unable to decommit memory: Invalid argument" errors. See https://ghc.haskell.org/trac/ghc/ticket/12495 for details.
Patch29: ghc-no-madv-free.patch
# PATCH-FIX-UPSTREAM 0001-PPC-CodeGen-fix-lwa-instruction-generation.patch peter.trommler@ohm-hochschule.de -- Fix PPC codegen: Fixes ghc-zeromq4-haskell build on 64-bit PowerPCs
Patch30: 0001-PPC-CodeGen-fix-lwa-instruction-generation.patch
}}}

The core dumps reported by systemd-coredump are not always at the same address (as per the journalctl output):
{{{
Jan 17 15:27:42 abanc kernel: ghc_worker[12624]: unhandled signal 11 at 38425a003c4c02f4 nip 38425a003c4c02f4 lr 00003fff8da98774 code 30001
Jan 17 15:28:01 abanc systemd-coredump[12677]: Process 12485 (Setup) of user 1000 dumped core.
Jan 17 15:31:25 abanc kernel: ghc_worker[12822]: unhandled signal 11 at 0000000000000000 nip 00003fff8430b4f4 lr 00003fff84308774 code 30001
Jan 17 15:31:37 abanc systemd-coredump[12828]: Process 12793 (Setup) of user 1000 dumped core.
Jan 17 15:32:34 abanc systemd-coredump[12827]: Process 12795 (ghc) of user 1000 dumped core.
Jan 17 15:34:18 abanc systemd-coredump[12676]: Process 12487 (ghc) of user 1000 dumped core.
Jan 17 15:42:23 abanc kernel: ghc_worker[14939]: unhandled signal 11 at 0000000000000000 nip 00003fff8122b4f4 lr 00003fff81228774 code 30001
Jan 17 15:42:41 abanc systemd-coredump[15017]: Process 14820 (Setup) of user 1000 dumped core.
}}}

Analysing one of the core files with gdb points to a SIGSEGV in stg_ap_0_fast, as per the extract below. I do not know how to continue the investigation from there.
{{{
+ echo 'r -B/usr/lib64/ghc-8.0.1 '
+ exec gdb -c /var/lib/systemd/coredump/core.ghc.1000.b44d99385b4c45eb840722189ffdf026.14822.1484667743000000 -x /tmp/gdbparms /usr/lib64/ghc-8.0.1/bin/ghc
GNU gdb (GDB; openSUSE Tumbleweed) 7.11.1
...
Reading symbols from /usr/lib64/ghc-8.0.1/bin/ghc...Reading symbols from /usr/lib/debug/usr/lib64/ghc-8.0.1/bin/ghc.debug...done.
done.
[New LWP 14939]
[New LWP 14861]
[New LWP 14951]
...
[New LWP 14932]
[New LWP 14927]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/ghc-8.0.1/bin/ghc -B/usr/lib64/ghc-8.0.1 --make -fbuilding-cabal-pac'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00003fff8122b4f4 in stg_ap_0_fast () from /usr/lib64/ghc-8.0.1/bin/../rts/libHSrts_thr-ghc8.0.1.so
[Current thread is 1 (Thread 0x3efdf27ff1a0 (LWP 14939))]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x3fffad6df1a0 (LWP 19906)]
[New Thread 0x3fffacedf1a0 (LWP 19907)]
[New Thread 0x3fffa7fff1a0 (LWP 19908)]
ghc: no input files
Usage: For basic information, try the `--help' option.
[Thread 0x3fffa7fff1a0 (LWP 19908) exited]
[Thread 0x3fffacedf1a0 (LWP 19907) exited]
[Thread 0x3fffad6df1a0 (LWP 19906) exited]
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:10

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * owner: (none) => trommler

Comment:
I found an issue in the implementation of atomic read and atomic write operations in ghc-prim. I am working on a fix for powerpc64 and powerpc64le.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:11

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s): Phab:D3984
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * status: new => patch
 * differential: => Phab:D3984

Comment:
Replying to [comment:11 trommler]:
> I found an issue in the implementation of atomic read and atomic write operations in ghc-prim. I am working on a fix for powerpc64 and powerpc64le.

Phab:3984 improves the situation a lot on an old PowerMac. The segfault occurs only on every other run, where before this patch I would see a segfault on almost all build attempts.

The correctness issue I found in `libraries/ghc-prim/cbits/atomic.c` affects all platforms that are using the fallback functions `hs_atomicread*` and `hs_atomicwrite*`. I will create a separate ticket for that.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:12
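
To make the point about fallback atomic reads and writes concrete, here is a minimal sketch, assuming a GCC- or Clang-compatible compiler that provides the `__atomic_*` builtins. It does not reproduce the code in `libraries/ghc-prim/cbits/atomic.c` or the patch in the differential, and the names `example_atomicread64` and `example_atomicwrite64` are hypothetical. It shows a read and a write that request sequentially consistent ordering from the compiler, which then emits whatever barriers the target needs, instead of relying on a plain load or store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an atomic 64-bit read primitive. */
static uint64_t example_atomicread64(volatile uint64_t *p)
{
    assert(p != NULL);
    /* A plain `return *p;` would carry no ordering guarantee on a
       weakly ordered CPU; the builtin requests full ordering. */
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

/* Hypothetical stand-in for an atomic 64-bit write primitive. */
static void example_atomicwrite64(volatile uint64_t *p, uint64_t v)
{
    assert(p != NULL);
    __atomic_store_n(p, v, __ATOMIC_SEQ_CST);
}
```

Sequential consistency is the most conservative choice; whether a weaker ordering would be sufficient depends on what the callers expect and is not settled by this sketch.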

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: | Differential Rev(s): Phab:D3984
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by michelmno):

Replying to [comment:12 trommler]:
> Replying to [comment:11 trommler]:
> > I found an issue in the implementation of atomic read and atomic write operations in ghc-prim. I am working on a fix for powerpc64 and powerpc64le.
>
> Phab:3984 improves the situation a lot on an old PowerMac. The segfault occurs only on every other run where before this patch I would see a segfault on almost all build attempts.
>
> The correctness issue I found in `libraries/ghc-prim/cbits/atomic.c` affects all platforms that are using the fallback functions `hs_atomicread*` and `hs_atomicwrite*`. I will create a separate ticket for that.

What is the referenced Phab:3984? It points to https://phabricator.haskell.org/3984, which is a "404 not found" page.

What is the reference to the separate ticket that was supposed to be created?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:13

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: #14244 | Differential Rev(s): Phab:D3984
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by trommler):
 * related: => #14244

Comment:
Replying to [comment:13 michelmno]:
> ref Phab:3984 is invalid, it is probably Phab:D3984 I assume

I fixed the link, sorry.

> What is the reference to the separate ticket supposed to be created ?

That is #14244.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:14

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: #14244 | Differential Rev(s): Phab:D3984
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#12537: Parallel cabal builds Segmentation Fault on PowerPC 64-bit
-------------------------------------+-------------------------------------
Reporter: michelmno | Owner: trommler
Type: bug | Status: closed
Priority: normal | Milestone: 8.4.1
Component: Compiler | Version: 8.0.1
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: Incorrect result | Test Case:
at runtime |
Blocked By: 12469 | Blocking:
Related Tickets: #14244 | Differential Rev(s): Phab:D3984
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by bgamari):
 * status: patch => closed
 * resolution: => fixed
 * milestone: => 8.4.1

Comment:
Hopefully that will fix it. hvr will test when testing 8.4.1.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12537#comment:16