
Hi Ian,
I'm building ghc-6.6.20070314 using the unregisterized ghc-6.4.2.
(BTW, the unregisterized 6.4.2 seems quite reliable. I was able to
build happy-1.15 and alex-2.0.1 without any problem.)
I configured 6.6.20070314 for debugging by putting
GhcUnregisterised=YES
GhcWithNativeCodeGen=NO
GhcWithInterpreter=NO
SplitObjs=NO
GhcWithSMP=NO
SRC_HC_OPTS+=-debug -L/usr/local/lib
GhcRTSWays=debug
GhcRtsHcOpts+=-optc-DDEBUG
GhcRtsCcOpts+=-optc-g
EXTRA_LD_OPTS=-L/usr/local/lib -lbfd -liberty
in mk/build.mk. With debugging turned on, the new 6.6
no longer hangs when compiling rts/Linker.c; instead, the new
6.6 ghc-inplace fails on the first file it tries to compile, Adjustor.c.
I was running "top -u" in another window and when the build
failed the memory used went unprintably high (i.e., "top" couldn't
print it, nor could you print what I said when I noticed this).
I was able to re-run the failed compilation under gdb. I added
"-v" to ghc's command line. The big slowdown seemed to occur while
ghc-inplace was generating the command line for the c compiler.
This is when I interrupted it. From gdb:
(gdb) run -B/tmp/ghc -optc-O -optc-Wall -optc-W -optc-Wstrict-prototypes -optc-Wmissing-prototypes -optc-Wmissing-declarations -optc-Winline -optc-Waggregate-return -optc-Wbad-function-cast -optc-I../includes -optc-I. -optc-Iparallel -optc-DCOMPILING_RTS -optc-fomit-frame-pointer -optc-optc-g -optc-DNOSMP -optc-I/usr/local/include -optc-fno-strict-aliasing -H16m -O -debug -L/usr/local/lib -optc-O2 -optc-DDEBUG -optc-DNOSMP -static -I/usr/local/include -I. -#include HCIncludes.h -fvia-C -dcmm-lint -v -c Adjustor.c -o Adjustor.o
Starting program: /tmp/ghc/compiler/stage1/ghc-6.6.20070314 -B/tmp/ghc -optc-O -optc-Wall -optc-W -optc-Wstrict-prototypes -optc-Wmissing-prototypes -optc-Wmissing-declarations -optc-Winline -optc-Waggregate-return -optc-Wbad-function-cast -optc-I../includes -optc-I. -optc-Iparallel -optc-DCOMPILING_RTS -optc-fomit-frame-pointer -optc-optc-g -optc-DNOSMP -optc-I/usr/local/include -optc-fno-strict-aliasing -H16m -O -debug -L/usr/local/lib -optc-O2 -optc-DDEBUG -optc-DNOSMP -static -I/usr/local/include -I. -#include HCIncludes.h -fvia-C -dcmm-lint -v -c Adjustor.c -o Adjustor.o
Glasgow Haskell Compiler, Version 6.6.20070314, for Haskell 98,
compiled by GHC version 6.4.2
Using package config file: /tmp/ghc/driver/package.conf.inplace
wired-in package base not found.
wired-in package rts mapped to rts-1.0
wired-in package haskell98 not found.
wired-in package template-haskell not found.
Hsc static flags: -funregisterised -static -static
Created temporary directory: /tmp/ghc27577_0
*** C Compiler:
gcc -x c Adjustor.c -o /tmp/ghc27577_0/ghc27577_0.s -v -S -Wimplicit -O -D__GLASGOW_HASKELL__=606 -DNO_REGS -DUSE_MINIINTERPRETER -O -Wall -W -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Winline -Waggregate-return -Wbad-function-cast -I../includes -I. -Iparallel -DCOMPILING_RTS -fomit-frame-pointer -optc-g -DNOSMP -I/usr/local/include -fno-strict-aliasing -O2 -DDEBUG -DNOSMP -I /usr/local/include -I . -I /tmp/ghc/includes -fwrapv
Using built-in specs.
Configured with: FreeBSD/amd64 system compiler
Thread model: posix
gcc version 3.4.6 [FreeBSD] 20060305
/usr/libexec/cc1 -quiet -v -I../includes -I. -^C
Program received signal SIGINT, Interrupt.
0x00000000015b56f3 in findMBlockMap (p=0x3a242700000) at MBlock.c:68
68 for( i = 0; i < mblock_map_count; i++ )
(gdb) bt
#0 0x00000000015b56f3 in findMBlockMap (p=0x3a242700000) at MBlock.c:68
#1 0x00000000015b574b in markHeapAlloced (p=0x3a242700000) at MBlock.c:98
#2 0x00000000015b59df in getMBlocks (n=262145) at MBlock.c:280
#3 0x00000000015ac617 in allocMegaGroup (n=262145) at BlockAlloc.c:174
#4 0x00000000015ac3e0 in allocGroup (n=67108865) at BlockAlloc.c:72
#5 0x000000000159e24a in allocate (n=34359738372) at Storage.c:504
#6 0x000000000159e439 in allocatePinned (n=34359738372) at Storage.c:593
#7 0x00000000015a1376 in newPinnedByteArrayzh_fast ()
#8 0x000000000159d3e2 in StgRun (f=0x15a1330

On Thu, Mar 15, 2007 at 11:13:02AM -0400, Gregory Wright wrote:
> #6 0x000000000159e439 in allocatePinned (n=34359738372) at Storage.c:593
> #7 0x00000000015a1376 in newPinnedByteArrayzh_fast ()
> #8 0x000000000159d3e2 in StgRun (f=0x15a1330, basereg=0x3a2) at StgCRun.c:93
> #9 0x00000000015990f3 in schedule (mainThread=0x2164080, initialCapability=0x3a2) at Schedule.c:932
>
> Looks like someone is asking for too much memory (n=34359738372)!
>
> I've taken a cursory look at this, but I wanted to send a note in case you know what is wrong off the top of your head.
>
> I'll be away next week so I won't be able to easily test things on my amd64 box. I will be able to look at code, if you can point me to the right places. (Should I be looking at 6.4.2/6.6 differences in Storage.c or Schedule.c?)

Schedule.c doesn't look like the problem. What's happening is the scheduler (Schedule.c) is running some Haskell code (StgRun, which calls Haskell code in a loop) and the "Haskell" code it is calling now is the newPinnedByteArray# primop, defined in PrimOps.cmm. newPinnedByteArray# is then calling the RTS allocation functions.

OK, this is while running the stage1 GHC, right? So we have the 6.6 Haskell code linked with the 6.4.2 RTS. At first glance it doesn't look as if the appropriate bits of the RTS have changed significantly, though.

34359738372 = 2^35 + 4, so it seems likely someone is really trying to ask for 4 words and something goes wrong somewhere.

I think the first thing to do is to see if newPinnedByteArrayzh_fast is being passed plausible values. The easiest way is probably to set a breakpoint in gdb on newPinnedByteArrayzh_fast (Having "GhcRtsHcOpts += -keep-hc-files" in mk/build.mk will probably help so you can look at PrimOps.hc; unfortunately we don't seem to set hcsuf - we probably should. You might also want to check that the RTS wasn't compiled with optimisation on. Note that this is in the 6.4 tree, not the 6.6 one!)

If you do make any changes to the RTS code or compilation options then you'll have to run make and make install in 6.4.2's rts/, then delete 6.6-branch's compiler/stage1/ghc-6.6* and run make stage=1 in compiler/.

If the problem isn't the first time that newPinnedByteArrayzh_fast is called then see the "Going back in time" section of
http://hackage.haskell.org/trac/ghc/wiki/DebuggingGhcCrashes
for how to get to the right one easily.

If the right value comes in at the start of newPinnedByteArrayzh_fast then stepping through with gdb and printing all the intermediate values should show where it goes wrong. If the wrong value goes in then you'll have to try and work out where it came from.

The +RTS -Di flag (I think: the one for the interpreter) will show you what Haskell functions are being called, which may help.

Thanks
Ian
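
As a sanity check on the numbers in the backtrace, here is a minimal Python sketch. It only does arithmetic on values copied from the gdb output above. The word size (8 bytes), block size (4096 bytes) and megablock size (1 MiB) are assumptions about this amd64 build, not something stated in the thread, and the helper names are made up for illustration. The megablock figure is a plain ceiling division, so treat that comparison as a rough check only, since the RTS may reserve part of each megablock for bookkeeping.

```python
# Values copied from the gdb backtrace in this thread.
WORDS_REQUESTED = 34359738372   # allocate(n=...), counted in words
BLOCKS_REQUESTED = 67108865     # allocGroup(n=...)
MBLOCKS_REQUESTED = 262145      # allocMegaGroup(n=...) / getMBlocks(n=...)

# Assumptions (not stated in the thread): amd64 with 8-byte words,
# 4096-byte blocks and 1 MiB megablocks.
WORD_BYTES = 8
BLOCK_BYTES = 4096
MBLOCK_BYTES = 1 << 20


def ceil_div(a, b):
    """Integer division rounding up (hypothetical helper)."""
    return -(-a // b)


def split_halves(n):
    """Return (high 32 bits, low 32 bits) of a 64-bit value."""
    return n >> 32, n & 0xFFFFFFFF


if __name__ == "__main__":
    bytes_requested = WORDS_REQUESTED * WORD_BYTES

    # The request is exactly 2^35 + 4, as noted in the reply above.
    print("is 2^35 + 4:", WORDS_REQUESTED == 2**35 + 4)

    # Bit pattern of the request and its two 32-bit halves.
    print("hex:", hex(WORDS_REQUESTED))
    print("high/low 32-bit halves:", split_halves(WORDS_REQUESTED))

    # Size of the request under the assumed word size.
    print("GiB requested:", bytes_requested / 2**30)

    # Do the block and megablock counts in the backtrace follow
    # from the word count under the assumed sizes?
    print("blocks match:",
          ceil_div(bytes_requested, BLOCK_BYTES) == BLOCKS_REQUESTED)
    print("mblocks match:",
          ceil_div(bytes_requested, MBLOCK_BYTES) == MBLOCKS_REQUESTED)
```

Under those assumptions the three counts in frames #2 to #5 are consistent with one another, which suggests the bad value is already present by the time allocate is called rather than being introduced further down the allocator. The bit pattern is a plausible 4 in the low half with a stray bit set above it; that is only an observation, not a diagnosis.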

Hi Ian,

On Mar 15, 2007, at 12:21 PM, Ian Lynagh wrote:
> I think the first thing to do is to see if newPinnedByteArrayzh_fast is being passed plausible values. The easiest way is probably to set a breakpoint in gdb on newPinnedByteArrayzh_fast (Having "GhcRtsHcOpts += -keep-hc-files" in mk/build.mk will probably help so you can look at PrimOps.hc; unfortunately we don't seem to set hcsuf - we probably should. You might also want to check that the RTS wasn't compiled with optimisation on. Note that this is in the 6.4 tree, not the 6.6 one!)

The 6.4.2 compiler was built by the hc-build script, which uses
GhcWithInterpreter=NO
GhcWithNativeCodeGen=NO
SplitObjs=NO
GhcLibWays=
in build.mk, so I assume it has optimization on. Can I simply add
GhcRtsHcOpts += -O0
or should I change SRC_HC_OPTS with
SRC_HC_OPTS += -O0
in the build.mk of the 6.4.2 tree? I'm also assuming that I can just rebuild 6.4.2 with optimization off on the target amd64 box and that I can still use my original .hc files from the i386 machine. Is that true?

> If you do make any changes to the RTS code or compilation options then you'll have to run make and make install in 6.4.2's rts/, then delete 6.6-branch's compiler/stage1/ghc-6.6* and run make stage=1 in compiler/.

Oakey-dokey.

> Thanks
> Ian

Best Wishes,
Greg

On Thu, Mar 15, 2007 at 05:15:02PM -0400, Gregory Wright wrote:
> in build.mk, so I assume it has optimization on. Can I simply add
> GhcRtsHcOpts += -O0
> or should I change SRC_HC_OPTS with
> SRC_HC_OPTS += -O0
> in the build.mk of the 6.4.2 tree? I'm also assuming that I can just rebuild 6.4.2 with optimization off on the target amd64 box and that I can still use my original .hc files from the i386 machine. Is that true?

Ah, yes, I'd forgotten it was an hc build, so you already have .hc files. They'll do fine - it's just useful sometimes to be able to refer to the same source code that gdb knows about.

I think GhcRtsHcOpts += -O0 should do, but you can check it's actually being passed when you rerun make (you'll need to delete the PrimOps object files first).

On the other hand, if you mean you're going to do another complete build then you may as well make everything non-optimised now rather than wishing you'd done so later.

Thanks
Ian

Hi Ian,

On Mar 15, 2007, at 6:06 PM, Ian Lynagh wrote:
> On Thu, Mar 15, 2007 at 05:15:02PM -0400, Gregory Wright wrote:
>> in build.mk, so I assume it has optimization on. Can I simply add
>> GhcRtsHcOpts += -O0
>> or should I change SRC_HC_OPTS with
>> SRC_HC_OPTS += -O0
>> in the build.mk of the 6.4.2 tree?

<snip>

> On the other hand, if you mean you're going to do another complete build then you may as well make everything non-optimised now rather than wishing you'd done so later.

Yes, I was just going to do a complete rebuild with everything non-optimized. It doesn't take too long on the new 2.4 GHz dual two-core Opteron box :-) (Although I am careful to turn off SMP and not try a parallel make, greed being one of the deadly sins.)

BW,
Greg