Anyone else failing to validate on 'linker_unload'?

That test builds an executable named 'linker_unload' which segfaults for me. Valgrind says this:

==42800== Invalid read of size 8
==42800==    at 0x66945F: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==  Address 0x5bfdd20 is 80 bytes inside a block of size 120 free'd
==42800==    at 0x4C273F0: free (vg_replace_malloc.c:446)
==42800==    by 0x66945E: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)

This happened the same way across a couple of independent checkouts.

-Ryan

Yes, this one is failing for me too. Probably related to the recent object unload patch for http://ghc.haskell.org/trac/ghc/ticket/8039

Excerpts from Ryan Newton's message of Fri Aug 30 21:51:24 -0700 2013:
> That test builds an executable named 'linker_unload' which segfaults for me.

However, as far as I can tell, it is not 100% reproducible. In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63, this test does not fail.

Edward

Excerpts from Edward Z. Yang's message of Fri Aug 30 21:55:29 -0700 2013:
> Yes, this one is failing for me too. Probably related to the recent object unload patch for http://ghc.haskell.org/trac/ghc/ticket/8039

I have also not seen this test fail on amd64/Linux since Simon
committed it. From the valgrind output, it looks like your machine is
32-bit, correct, Ryan? Edward told me yesterday on IRC he saw this fail
on 64-bit Linux, so I'm a little confused.
Can you please try this?
$ cd testsuite/tests/rts
$ make TEST="linker_unload" EXTRA_HC_OPTS="-debug"
$ valgrind ./linker_unload
This will link the test against a debug copy of the RTS, so Valgrind/GDB
can relate errors back to the relevant source code. Perhaps this will
help shed light on your problem.
On Sun, Sep 1, 2013 at 9:39 PM, Edward Z. Yang wrote:
> However, as far as I can tell, it is not 100% reproducible. In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63, this test does not fail.
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Regards, Austin - PGP: 4096R/0x91384671

Hi Austin,

Should have said -- this is 64-bit RHEL 6 (my academic department's standardized configuration).

$ uname -a
Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Weirdly, it seems to behave differently when run by "make" and by hand. When I run the make command you provided it segfaults with error code 2:

cd . && $MAKE -s --no-print-directory linker_unload
linker_unload.run.stdout 2>linker_unload.run.stderr
Wrong exit code (expected 0 , actual 2 )
Stdout:
Stderr:
make[1]: *** [linker_unload] Segmentation fault (core dumped)
*** unexpected failure for linker_unload(normal)
Unexpected results from:
TEST="linker_unload"

But then when I run it by hand with "./linker_unload" or "valgrind ./linker_unload" I get an unknown symbol error with exit code 1:

==70613==
linker_unload: Test.o: unknown symbol `base_GHCziNum_zdfNumInt_closure'
linker_unload: resolveObjs failed
==70613==
==70613== HEAP SUMMARY:
-Ryan
On Sun, Sep 1, 2013 at 10:46 PM, Austin Seipp wrote:
> Can you please try this?
> $ cd testsuite/tests/rts
> $ make TEST="linker_unload" EXTRA_HC_OPTS="-debug"
> $ valgrind ./linker_unload

Excerpts from Ryan Newton's message of Sun Sep 01 19:54:34 -0700 2013:
> But then when I run it by hand with "./linker_unload" or "valgrind ./linker_unload" I get an unknown symbol error with exit code 1:

Well, that's because that's not what make is running. It runs:

./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)

Try removing the -s flag.

Edward

Oops, should have said this: if you check out the Makefile for
testsuite/tests/rts - at the very bottom - you'll see the
linker_unload target. When run, the executable needs some arguments so
it knows what to try to load:
---
./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)
---
So you also need to provide the right arguments. Sorry about that!
On Sun, Sep 1, 2013 at 9:54 PM, Ryan Newton wrote:
> But then when I run it by hand with "./linker_unload" or "valgrind ./linker_unload" I get an unknown symbol error with exit code 1:
> linker_unload: Test.o: unknown symbol `base_GHCziNum_zdfNumInt_closure'
> linker_unload: resolveObjs failed
-- Regards, Austin - PGP: 4096R/0x91384671

Ah, yes I see. Well, giving it the proper arguments when running via
valgrind puts me back to an "Invalid read" segfault. I confirmed that the
linker_unload executable itself is 64 bit:
$ file linker_unload
linker_unload: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

==72103== Command: ./linker_unload /home/beehive/ryan_scratch/ghc-working/libraries/base/dist-install/build/libHSbase-4.7.0.0.a /home/beehive/ryan_scratch/ghc-working/libraries/ghc-prim/dist-install/build/libHSghc-prim-0.3.1.0.a /home/beehive/ryan_scratch/ghc-working/libraries/integer-gmp/dist-install/build/libHSinteger-gmp-0.5.1.0.a
==72103==
==72103== Invalid read of size 8
==72103==    at 0x479F9F: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==  Address 0xf45ed70 is 80 bytes inside a block of size 120 free'd
==72103==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==72103==    by 0x479F9E: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==
On Sun, Sep 1, 2013 at 11:01 PM, Austin Seipp wrote:
> When run, the executable needs some arguments so it knows what to try and load:
> ./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)

I think I see the problem, but maybe I'm just tired and shooting in the dark.
The only place checkUnload really iteratively calls free is in
CheckUnload.c (I say 'iteratively' because the fact that you're
touching/freeing blocks inside already-freed blocks makes me
suspicious.) The relevant code is:
---------------------------------------------------------------------------
// Look through the unloadable objects, and any object that is still
// marked as unreferenced can be physically unloaded, because we
// have no references to it.
prev = NULL;
for (oc = unloaded_objects; oc; prev = oc, oc = oc->next) {
    if (oc->referenced == 0) {
        if (prev == NULL) {
            unloaded_objects = oc->next;
        } else {
            prev->next = oc->next;
        }
        IF_DEBUG(linker, debugBelch("Unloading object file %s\n",
                                    oc->fileName));
        freeObjectCode(oc);
    } else {
        IF_DEBUG(linker, debugBelch("Object file still in use: %s\n",
                                    oc->fileName));
    }
}
---------------------------------------------------------------------------
Note that we iterate over oc->next in order to check every unloadable
object. If the object can be unloaded, we call freeObjectCode:
---------------------------------------------------------------------------
void freeObjectCode (ObjectCode *oc)
{
    ....
    stgFree(oc->fileName);
    stgFree(oc->archiveMemberName);
    stgFree(oc);
}
---------------------------------------------------------------------------
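So it would seem we free the object that oc points to, and then the
loop's update step (prev = oc, oc = oc->next) reads oc->next from the
block we just freed. That is a use-after-free, which would match
Valgrind's "Invalid read ... inside a block ... free'd" report and
could lead to very weird behavior.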
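
For illustration, here is a minimal sketch of a traversal that avoids reading a freed node. It is not the RTS code and not necessarily the fix that was committed: Node is a hypothetical stand-in for ObjectCode with only the two fields the loop touches, and free_unreferenced, build_list and list_length are made-up helper names. The idea is simply to copy the next pointer out before freeing, and to advance prev only onto nodes that stay in the list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RTS ObjectCode: only the two fields
   the unloading loop touches. This is NOT the real struct. */
typedef struct Node {
    int referenced;
    struct Node *next;
} Node;

/* Unlink and free every node with referenced == 0.
   The successor is copied into `next` before the node is freed, and
   `prev` only ever advances onto nodes that stay in the list, so no
   freed memory is read. Returns the number of nodes freed. */
static int free_unreferenced(Node **head)
{
    Node *prev = NULL;
    Node *next = NULL;
    int freed = 0;

    for (Node *oc = *head; oc != NULL; oc = next) {
        next = oc->next;            /* read the link before any free */
        if (oc->referenced == 0) {
            if (prev == NULL) {
                *head = next;       /* removing the current head */
            } else {
                prev->next = next;  /* bypass the node being removed */
            }
            free(oc);               /* oc is not touched after this */
            freed++;
        } else {
            prev = oc;              /* oc survives, so it may become prev */
        }
    }
    return freed;
}

/* Helper for experiments: build a list whose nodes carry the given
   referenced flags, in the same order as the array. */
static Node *build_list(const int *flags, int n)
{
    Node *head = NULL;
    for (int i = n - 1; i >= 0; i--) {  /* push front, back to front */
        Node *node = malloc(sizeof *node);
        assert(node != NULL);
        node->referenced = flags[i];
        node->next = head;
        head = node;
    }
    return head;
}

/* Helper: count the nodes currently in the list. */
static int list_length(const Node *head)
{
    int n = 0;
    for (; head != NULL; head = head->next) {
        n++;
    }
    return n;
}
```

Building a list with flags 0,1,0,0,1,0 and calling free_unreferenced(&head) frees four nodes and leaves the two referenced ones linked together. Running the same experiment under Valgrind should show no invalid reads, since neither oc->next nor prev is taken from a freed block.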
Ryan, can you do one final thing? When you run that program, be sure
to specify `+RTS -Dl` (must be linked with -debug.) This will enable
all the debug output where the linker is concerned. There will be a
few hundred lines just for initialization (based on my machine.) If my
theory is correct, you'll probably see stuff like 'Unloading object
file ...' right as the invalid read/segfault occurs.
On Sun, Sep 1, 2013 at 11:28 PM, Ryan Newton wrote:
> Ah, yes I see. Well, giving it the proper arguments when running via valgrind puts me back to an "Invalid read" segfault.
-- Regards, Austin - PGP: 4096R/0x91384671

Ryan, can you do one final thing? When you run that program, be sure to specify `+RTS -Dl` (must be linked with -debug.) This will enable all the debug output where the linker is concerned. There will be a few hundred lines just for initialization (based on my machine.) If my theory is correct, you'll probably see stuff like 'Unloading object file ...' right as the invalid read/segfault occurs.
Hi Austin, I did this, and it produced a 97MB text file of debug output, the tail end of which was: *initLinker: idempotent return* *lookupSymbol: value of stg_gc_unpt_r1 is 0x485570* *`stg_gc_unpt_r1' resolves to 0x485570Reloc: P = 0x40b510f3 S = 0x485570 A = 0xfffffffffffffffc* *relocations for section 3 using symtab 8* *Rel entry 0 is raw( (nil) 0x800000001 (nil)) lookupSymbol: looking up base_ControlziApplicative_zdfApplicativeIO3_info* *initLinker: start* *initLinker: idempotent return* *lookupSymbol: value of base_ControlziApplicative_zdfApplicativeIO3_info is 0x40b51058* *`base_ControlziApplicative_zdfApplicativeIO3_info' resolves to 0x40b51058Reloc: P = 0x40b51100 S = 0x40b51058 A = (nil)* *resolveObjs: done* *lookupSymbol: looking up f* *initLinker: start* *initLinker: idempotent return* *lookupSymbol: value of f is 0x440330c0* *initLinker: start* *initLinker: idempotent return* *unloadObj: Test.o* *Checking whether to unload Test.o* *Unloading object file Test.o* And that's when it segfaulted (notusing valgrind). If it is of any use, here is the full output, which fortunately compresses down to 4.4MB: http://www.cs.indiana.edu/~rrnewton/temp/linker_unload_debug_output.txt.bz2 Best, -Ryan P.S. 
Here is the equivalent output from the same thing being run under valgrind:
initLinker: idempotent return
lookupSymbol: value of base_ControlziApplicative_zdfApplicativeIO3_info is 0x4c15058
`base_ControlziApplicative_zdfApplicativeIO3_info' resolves to 0x4c15058Reloc: P = 0x4c15100 S = 0x4c15058 A = (nil)
resolveObjs: done
lookupSymbol: looking up f
initLinker: start
initLinker: idempotent return
lookupSymbol: value of f is 0x4c0f0c0
initLinker: start
initLinker: idempotent return
unloadObj: Test.o
Checking whether to unload Test.o
Unloading object file Test.o
==9030== Invalid read of size 8
==9030==    at 0x492502: checkUnload (CheckUnload.c:286)
==9030==    by 0x476580: GarbageCollect (GC.c:666)
==9030==    by 0x46ADCD: scheduleDoGC (Schedule.c:1652)
==9030==    by 0x46B976: performGC_ (Schedule.c:2551)
==9030==    by 0x46B9AE: performMajorGC (Schedule.c:2565)
==9030==    by 0x4043E1: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload2)
==9030==  Address 0x95c4580 is 80 bytes inside a block of size 120 free'd
==9030==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==9030==    by 0x4656D5: stgFree (RtsUtils.c:107)
==9030==    by 0x45DDF4: freeObjectCode (Linker.c:2087)
==9030==    by 0x4924CF: checkUnload (CheckUnload.c:295)
==9030==    by 0x476580: GarbageCollect (GC.c:666)
==9030==    by 0x46ADCD: scheduleDoGC (Schedule.c:1652)
==9030==    by 0x46B976: performGC_ (Schedule.c:2551)
==9030==    by 0x46B9AE: performMajorGC (Schedule.c:2565)
==9030==    by 0x4043E1: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload2)
==9030==

That's the bug. Fix coming!
Simon
On 02/09/13 05:46, Austin Seipp wrote:
I think I see the problem, but maybe I'm just tired and shooting in the dark.
The only place checkUnload really iteratively calls free is in CheckUnload.c (I say 'iteratively' because the fact that you're touching/freeing blocks inside already-freed blocks makes me suspicious). The relevant code is:
---------------------------------------------------------------------------
  // Look through the unloadable objects, and any object that is still
  // marked as unreferenced can be physically unloaded, because we
  // have no references to it.
  prev = NULL;
  for (oc = unloaded_objects; oc; prev = oc, oc = oc->next) {
      if (oc->referenced == 0) {
          if (prev == NULL) {
              unloaded_objects = oc->next;
          } else {
              prev->next = oc->next;
          }
          IF_DEBUG(linker, debugBelch("Unloading object file %s\n",
                                      oc->fileName));
          freeObjectCode(oc);
      } else {
          IF_DEBUG(linker, debugBelch("Object file still in use: %s\n",
                                      oc->fileName));
      }
  }
---------------------------------------------------------------------------
Note that we iterate over oc->next in order to check every unloadable object. If the object can be unloaded, we call freeObjectCode:
---------------------------------------------------------------------------
void freeObjectCode (ObjectCode *oc)
{
    ....
    stgFree(oc->fileName);
    stgFree(oc->archiveMemberName);
    stgFree(oc);
}
---------------------------------------------------------------------------
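So it would seem we free the object we are pointing to during the traversal, and the loop's update step (prev = oc, oc = oc->next) then reads oc->next out of the block that was just freed. This is probably bad and could lead to very weird behavior.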
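For illustration only, here is a minimal sketch of a traversal that avoids touching a node after it has been freed. It is not the fix that went into GHC. The ObjectCode struct is a hypothetical cut-down stand-in for the real one, freeObjectCode is simplified to plain free(), and addObject and sweepUnloaded are names made up for this example. The idea is to read oc->next into a local before freeing oc, and to advance prev only when the node survives.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the RTS ObjectCode struct. */
typedef struct ObjectCode_ {
    char *fileName;
    int referenced;
    struct ObjectCode_ *next;
} ObjectCode;

static ObjectCode *unloaded_objects = NULL;

/* Simplified stand-in for freeObjectCode: releases the node itself,
   so oc must not be dereferenced after this call. */
static void freeObjectCode(ObjectCode *oc)
{
    free(oc->fileName);
    free(oc);
}

/* Demo helper (not from the RTS): push an object on the front of the list. */
static void addObject(const char *name, int referenced)
{
    ObjectCode *oc = malloc(sizeof *oc);
    assert(oc != NULL);
    oc->fileName = malloc(strlen(name) + 1);
    assert(oc->fileName != NULL);
    strcpy(oc->fileName, name);
    oc->referenced = referenced;
    oc->next = unloaded_objects;
    unloaded_objects = oc;
}

/* Free every unreferenced object and return how many were freed. */
static int sweepUnloaded(void)
{
    int freed = 0;
    ObjectCode *prev = NULL, *next = NULL;

    for (ObjectCode *oc = unloaded_objects; oc != NULL; oc = next) {
        next = oc->next;            /* saved before oc can be freed */
        if (oc->referenced == 0) {
            if (prev == NULL) {
                unloaded_objects = next;
            } else {
                prev->next = next;
            }
            freeObjectCode(oc);
            freed++;                /* prev stays put: oc is gone */
        } else {
            prev = oc;              /* only surviving nodes become prev */
        }
    }
    return freed;
}
```

Running something like this under valgrind with a mix of referenced and unreferenced entries should not report the "Invalid read ... inside a block ... free'd" error, since no field of a node is read after the node is released.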
Ryan, can you do one final thing? When you run that program, be sure to specify `+RTS -Dl` (must be linked with -debug.) This will enable all the debug output where the linker is concerned. There will be a few hundred lines just for initialization (based on my machine.) If my theory is correct, you'll probably see stuff like 'Unloading object file ...' right as the invalid read/segfault occurs.
On Sun, Sep 1, 2013 at 11:28 PM, Ryan Newton wrote:
Ah, yes I see. Well, giving it the proper arguments when running via valgrind puts me back to an "Invalid read" segfault. I confirmed that the linker_unload executable itself is 64 bit:
$ file linker_unload
linker_unload: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
==72103== Command: ./linker_unload /home/beehive/ryan_scratch/ghc-working/libraries/base/dist-install/build/libHSbase-4.7.0.0.a /home/beehive/ryan_scratch/ghc-working/libraries/ghc-prim/dist-install/build/libHSghc-prim-0.3.1.0.a /home/beehive/ryan_scratch/ghc-working/libraries/integer-gmp/dist-install/build/libHSinteger-gmp-0.5.1.0.a
==72103==
==72103== Invalid read of size 8
==72103==    at 0x479F9F: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==  Address 0xf45ed70 is 80 bytes inside a block of size 120 free'd
==72103==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==72103==    by 0x479F9E: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==
On Sun, Sep 1, 2013 at 11:01 PM, Austin Seipp wrote:
Oops, should have said this: if you check out the Makefile for testsuite/tests/rts, at the very bottom, you'll see the linker_unload target. When run, the executable needs some arguments so it knows what to try and load:
---
./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)
---
So you also need to provide the right arguments. Sorry about that!
On Sun, Sep 1, 2013 at 9:54 PM, Ryan Newton wrote:
Hi Austin,
Should have said -- this is 64-bit RHEL 6 (my academic department's standardized configuration).
$ uname -a
Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Weirdly it seems to have a different behavior when run by "make" and by hand. When I run the make command you provided it segfaults with error code 2:
cd . && $MAKE -s --no-print-directory linker_unload
linker_unload.run.stdout 2>linker_unload.run.stderr
Wrong exit code (expected 0 , actual 2 )
Stdout:
Stderr:
make[1]: *** [linker_unload] Segmentation fault (core dumped)
*** unexpected failure for linker_unload(normal)
Unexpected results from:
TEST="linker_unload"
But then when I run it by hand with "./linker_unload" or "valgrind ./linker_unload" I get an unknown symbol error with exit code 1:
==70613==
linker_unload: Test.o: unknown symbol `base_GHCziNum_zdfNumInt_closure'
linker_unload: resolveObjs failed
==70613==
==70613== HEAP SUMMARY:
-Ryan
On Sun, Sep 1, 2013 at 10:46 PM, Austin Seipp wrote:
I have also not seen this test fail on amd64/Linux since Simon committed it. From the valgrind output, it looks like your machine is 32bit, correct Ryan? Edward told me yesterday on IRC he saw this fail on 64bit Linux, so I'm a little confused.
Can you please try this?
$ cd testsuite/tests/rts
$ make TEST="linker_unload" EXTRA_HC_OPTS="-debug"
$ valgrind ./linker_unload
This will link you with a debug copy of the RTS, so Valgrind/GDB can relate errors back to the relevant source code. Perhaps this will help shed light on your problem.
On Sun, Sep 1, 2013 at 9:39 PM, Edward Z. Yang wrote:
However, as far as I can tell, it is not 100% reproducible. In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63, this test does not fail.
Edward
participants (4)
- Austin Seipp
- Edward Z. Yang
- Ryan Newton
- Simon Marlow