
Hi Ian,
I have made some more progress on understanding the build
failure on FreeBSD/amd64. I could use a check on my understanding
of the problem, though.
The setup: I have an unregisterized ghc-6.4.2 successfully built
on FreeBSD/amd64. It was bootstrapped from .hc files compiled
on FreeBSD/i386. I am attempting to use this compiler (the ghc-inplace
from the unregisterized build, not a fully installed compiler) to
build a recent ghc-6.6 branch from darcs (20070314).
The build of ghc-6.6-20070314 fails when compiling rts/Linker.c.
The failure is mostly reproducible (more about that below).
It's also worth remembering that when I tried to build an unregisterized
ghc-6.6 on FreeBSD/amd64 using .hc files from ghc-6.6 built on
FreeBSD/i386, I had a crash at the same place, while trying to build
rts/Linker.c.
The failure comes from trying to allocate a huge amount of memory.
newPinnedByteArrayzh_fast is called with a giant argument, 0x4000000010.
So it looks like we're after 16 bytes, but the upper 32 bits have some
junk in them.
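To make the arithmetic explicit, here is a minimal sketch that splits the 64-bit argument into its two 32-bit halves. The helper names low32 and high32 are made up for illustration and are not part of the RTS.

```c
#include <stdint.h>

/* Hypothetical helpers, not RTS code. */

/* Low 32 bits: the size that was presumably intended. */
uint32_t low32(uint64_t w)
{
    return (uint32_t)(w & 0xFFFFFFFFu);
}

/* High 32 bits: expected to be zero for a small allocation request. */
uint32_t high32(uint64_t w)
{
    return (uint32_t)(w >> 32);
}
```

For the value seen in the debugger, low32(0x4000000010) is 0x10 (16 bytes) and high32(0x4000000010) is 0x40, which is the junk.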
The above was the state of things just over a week ago. Since then,
I've worked to track down whether the bug is in the ghc-6.4.2 runtime
or in the ghc-6.6 code. The compiler that fails is the 6.6 stage1
compiler, which is a 6.6 compiler linked with the runtime from the
unregisterized 6.4.2 (let me know if I'm wrong about that).
I have rebuilt the unregisterized 6.4.2 with optimization turned off.
I haven't got any more information this way; it seems the problem is
really on the 6.6 side. Here is my reasoning:
I run the 6.6 compiler under the debugger:
greenhouse-george> gdb /tmp/ghc/compiler/stage1/ghc-6.6.20070314
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) dir /tmp/ghc-6.4.2/ghc/rts
Source directories searched: /tmp/ghc-6.4.2/ghc/rts:$cdir:$cwd
(gdb) b newPinnedByteArrayzh_fast
Breakpoint 1 at 0x163cbb4
(gdb) run -B/tmp/ghc -v -optc-O -optc-Wall -optc-W -optc-Wstrict-prototypes -optc-Wmissing-prototypes -optc-Wmissing-declarations -optc-Winline -optc-Waggregate-return -optc-Wbad-function-cast -optc-I../includes -optc-I. -optc-Iparallel -optc-DCOMPILING_RTS -optc-fomit-frame-pointer -optc-I/usr/local/include -optc-fno-strict-aliasing -H16m -O -optc-O2 -static -I/usr/local/include -I. -#include HCIncludes.h -fvia-C -dcmm-lint -c Linker.c -o Linker.o
Starting program: /tmp/ghc/compiler/stage1/ghc-6.6.20070314 -B/tmp/ghc -v -optc-O -optc-Wall -optc-W -optc-Wstrict-prototypes -optc-Wmissing-prototypes -optc-Wmissing-declarations -optc-Winline -optc-Waggregate-return -optc-Wbad-function-cast -optc-I../includes -optc-I. -optc-Iparallel -optc-DCOMPILING_RTS -optc-fomit-frame-pointer -optc-I/usr/local/include -optc-fno-strict-aliasing -H16m -O -optc-O2 -static -I/usr/local/include -I. -#include HCIncludes.h -fvia-C -dcmm-lint -c Linker.c -o Linker.o
Breakpoint 1, 0x000000000163cbb4 in newPinnedByteArrayzh_fast ()
By playing around with it, I have isolated, more or less, when the
failure occurs. In this run, it was after 973 calls to
newPinnedByteArrayzh_fast.
If I quit gdb and re-run it, the number of calls is consistently the
same. However, from one system boot to the next, it varies a bit.
Yesterday morning, I had to skip 976 calls.
(gdb) c 973
Will ignore next 972 crossings of breakpoint 1. Continuing.
Glasgow Haskell Compiler, Version 6.6.20070314, for Haskell 98,
compiled by GHC version 6.4.2
Using package config file: /tmp/ghc/driver/package.conf.inplace
wired-in package base not found.
wired-in package rts mapped to rts-1.0
wired-in package haskell98 not found.
wired-in package template-haskell not found.
Hsc static flags: -static -static
Created temporary directory: /tmp/ghc1073_0
*** C Compiler:
gcc -x c Linker.c -o /tmp/ghc1073_0/ghc1073_0.s -v -S -Wimplicit -O -D__GLASGOW_HASKELL__=606 -O -Wall -W -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Winline -Waggregate-return -Wbad-function-cast -I../includes -I. -Iparallel -DCOMPILING_RTS -fomit-frame-pointer -I/usr/local/include -fno-strict-aliasing -O2 -I /usr/local/include -I . -I /tmp/ghc/includes -fwrapv
Breakpoint 1, 0x000000000163cbb4 in newPinnedByteArrayzh_fast ()
OK, I should be at the call which is going to blow up:
(gdb) bt
#0 0x000000000163cbb4 in newPinnedByteArrayzh_fast ()
#1 0x00000000016377ea in StgRun (f=0x163cbb0

Hi Greg,
Good analysis so far. I think you're close to this one.
Based on what you said, I looked at Compat.Unicode and there is indeed a type error in this foreign call:
foreign import ccall unsafe "u_gencat" wgencat :: CInt -> Int
The return type should be CInt, not Int. Try changing that and see if it helps. You might need to add some fromIntegrals.
Cheers, Simon
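As an illustration of the class of bug described above (not a claim about what actually causes this failure), the sketch below models a 64-bit return register after a C function has returned a 32-bit int. It assumes that only the low half of the register is meaningful, and all three function names are hypothetical. Reading the whole register corresponds to declaring the result as Int on a 64-bit platform; keeping the low 32 bits and then widening corresponds to declaring CInt and applying fromIntegral.

```c
#include <stdint.h>

/* Hypothetical model: low 32 bits hold the int result,
   high 32 bits hold whatever happened to be left there. */
uint64_t fake_return_register(int32_t result, uint32_t junk)
{
    return ((uint64_t)junk << 32) | (uint32_t)result;
}

/* Mismatched declaration: the whole 64-bit register is taken as the result. */
int64_t read_as_int64(uint64_t reg)
{
    return (int64_t)reg;
}

/* Matching declaration: keep the low 32 bits, then sign-extend to 64 bits. */
int64_t read_as_int32_then_widen(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}
```

With a result of 16 and junk of 0x40, read_as_int64 yields 0x4000000010 while read_as_int32_then_widen yields 16.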

On Thu, Mar 29, 2007 at 10:40:32AM +0100, Simon Marlow wrote:
Based on what you said, I looked at Compat.Unicode and there is indeed a type error in this foreign call:
foreign import ccall unsafe "u_gencat" wgencat :: CInt -> Int
The return type should be CInt, not Int. Try changing that and see if it helps. You might need to add some fromIntegrals.
Even if it's not the problem, it's certainly a bug. HEAD and the 6.6 branch should now be fixed (both the compat copy and the base copy).
Thanks
Ian

Hi Simon,
On Mar 29, 2007, at 5:40 AM, Simon Marlow wrote:
Hi Greg,
Good analysis so far. I think you're close to this one.
Thank you for checking over what I've done thus far.
Based on what you said, I looked at Compat.Unicode and there is indeed a type error in this foreign call:
foreign import ccall unsafe "u_gencat" wgencat :: CInt -> Int
The return type should be CInt, not Int. Try changing that and see if it helps. You might need to add some fromIntegrals.
I noticed this too yesterday and tried correcting it this morning. Changing the return type from Int to CInt didn't help. The problem will no doubt be subtle, yet entirely obvious in retrospect. I'll try to get some time to work on it in the next few days. It's still a bit puzzling why this problem affects FreeBSD, and apparently not Linux. Differences in header files?
Cheers, Simon
Best Wishes, Greg
participants (3): Gregory Wright, Ian Lynagh, Simon Marlow