alignment and the evil gc assertion failure

So a long time ago (I think when 6.10 first came out, the problem didn't happen with the previous version, and I think 6.10 changed how the FFI used alignment) I filed a ghc ticket about a gc assertion failure. Unfortunately it was so hard to reproduce and reduce to a manageable example that I wound up thinking I had fixed it and closing it as user error. However, I'm pretty sure I tracked down what the problem was since I changed something and I haven't had that crash since.

To recap, the crash is an assertion failure in the gc, I've seen two positions:

    seq: internal error: ASSERTION FAILED: file rts/dist/build/sm/Evac_thr.c, line 298
    seq: internal error: ASSERTION FAILED: file rts/dist/build/sm/Evac_thr.c, line 369

The problem was in the marshalling of a certain struct. The struct has 3 Color fields and two char fields. The Colors are simply triples of chars. I'd been using an alignment macro I've seen around:

    #let alignment t = "%lu", (unsigned long)offsetof(struct {char x__; t (y__); },y__)

Since my struct is made out of chars, this macro returns an alignment of 1. However, an alignment of 1 seems to be what leads to the GC crash. After setting the alignment to 4 I've never had the crash again.

So... what's going on here? Is the macro wrong? Alignment 1 seems correct for something built of chars... or does a 'struct { char, char, char }' embedded in another struct turn it into alignment 4 somehow?

Alignment seems like a particularly problematic part of the FFI, which is in all other respects very easy to use. It's low level, poorly understood (well, by me at least), not super well documented, and one little mistake can either be harmless, or lead to a *very* hard to track down bug.

If the alignment macro is correct, can it be built into hsc2hs? Copy and pasting some magic I saw on the net into every hsc file doesn't give me a good feeling. If it's incorrect, what would a correct one be? One odd thing about that macro is that if you mis-spell a field name, gcc gets a bus error (OS X 10.5.8, gcc 4.0.1, does anyone else see this?)

Or is the alignment correct and it really is a ghc bug? If so, how can I help track it down? The best I can think of is to make a small program with that same data structure, pass it around some, and then generate and collect a lot of garbage, but this bug has been really hard to pin down in the past, change one little thing and it disappears only to pop up again in 6 months.
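One way to check what the macro reports for a struct built only of chars is to run the same offsetof trick in plain C and compare it with C11's _Alignof. The following is only a sketch: the type names Color and Style, their field names, and the helper print_alignments are hypothetical stand-ins for the struct described in the message, not the original code.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* printf */

/* Same trick as the hsc2hs macro: in struct { char; t; } the compiler pads
   after the char only as far as t's alignment requires, so the offset of
   the second member is the alignment of t. */
#define ALIGNMENT_OF(t) offsetof(struct { char x__; t y__; }, y__)

/* Hypothetical stand-ins for the types described in the message. */
typedef struct { unsigned char r, g, b; } Color;            /* triple of chars */
typedef struct { Color fg, bg, border; char a, b; } Style;  /* 3 Colors + 2 chars */

/* Print the trick's answer next to _Alignof and sizeof for comparison. */
void print_alignments(void) {
    printf("Color: trick=%zu _Alignof=%zu sizeof=%zu\n",
           (size_t)ALIGNMENT_OF(Color), (size_t)_Alignof(Color), sizeof(Color));
    printf("Style: trick=%zu _Alignof=%zu sizeof=%zu\n",
           (size_t)ALIGNMENT_OF(Style), (size_t)_Alignof(Style), sizeof(Style));
    printf("int:   trick=%zu _Alignof=%zu sizeof=%zu\n",
           (size_t)ALIGNMENT_OF(int), (size_t)_Alignof(int), sizeof(int));
}
```

Calling print_alignments() from a main function shows whether embedding the char triples changes anything on a given platform. The trick and _Alignof should agree with each other, but the C standard does not promise that the value for a struct of chars is 1; an ABI is free to be stricter, so the result is worth checking on each target rather than assuming.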

On 06/09/2010 19:03, Evan Laforge wrote:
> So a long time ago (I think when 6.10 first came out, the problem didn't happen with the previous version, and I think 6.10 changed how the FFI used alignment) I filed a ghc ticket about a gc assertion failure. Unfortunately it was so hard to reproduce and reduce to a manageable example that I wound up thinking I had fixed it and closing it as user error. However, I'm pretty sure I tracked down what the problem was since I changed something and I haven't had that crash since.
>
> To recap, the crash is an assertion failure in the gc, I've seen two positions:
>
>     seq: internal error: ASSERTION FAILED: file rts/dist/build/sm/Evac_thr.c, line 298
>     seq: internal error: ASSERTION FAILED: file rts/dist/build/sm/Evac_thr.c, line 369
>
> The problem was in the marshalling of a certain struct. The struct has 3 Color fields and two char fields. The Colors are simply triples of chars. I'd been using an alignment macro I've seen around:
>
>     #let alignment t = "%lu", (unsigned long)offsetof(struct {char x__; t (y__); },y__)
>
> Since my struct is made out of chars, this macro returns an alignment of 1. However, an alignment of 1 seems to be what leads to the GC crash. After setting the alignment to 4 I've never had the crash again.
I've poked around and apart from noticing that we're allocating a bit too much memory sometimes (but not in the alignment==1 case), I didn't find anything wrong. GHC will always give you memory that is at least word-aligned (4 or 8 bytes) anyway. Assertions in the RTS are only activated when you compile with -debug. Do you get a crash without -debug? What are the exact results you get with 6.12.3, or better still HEAD if you're able to test that?
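The point about word-aligned memory can be illustrated with the usual power-of-two alignment arithmetic: an address that is a multiple of the word size is also a multiple of 1, 2 and 4, so a reported alignment of 1 asks for nothing that word-aligned memory doesn't already provide. This is a generic sketch with hypothetical function names, not code taken from the GHC RTS.

```c
#include <assert.h>   /* assert */
#include <stdbool.h>  /* bool */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */

/* True if addr is a multiple of align; align must be a power of two. */
bool is_aligned(uintptr_t addr, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Round addr up to the next multiple of align; align must be a power of two. */
uintptr_t align_up(uintptr_t addr, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (uintptr_t)(align - 1)) & ~(uintptr_t)(align - 1);
}
```

With these helpers, align_up(addr, 1) returns addr unchanged for any address, and any address for which is_aligned(addr, 8) holds also satisfies is_aligned(addr, 4) and is_aligned(addr, 1).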
> So... what's going on here? Is the macro wrong? Alignment 1 seems correct for something built of chars... or does a 'struct { char, char, char }' embedded in another struct turn it into alignment 4 somehow?
>
> Alignment seems like a particularly problematic part of the FFI, which is in all other respects very easy to use. It's low level, poorly understood (well, by me at least), not super well documented, and one little mistake can either be harmless, or lead to a *very* hard to track down bug.
>
> If the alignment macro is correct, can it be built into hsc2hs?
It looks good enough to me, and it uses the same trick that GHC's configure script uses to compute the required alignment for the basic types. We could certainly add this as one of the built-in macros in hsc2hs - a patch would help though!

Cheers,
Simon
> Copy and pasting some magic I saw on the net into every hsc file doesn't give me a good feeling. If it's incorrect, what would a correct one be? One odd thing about that macro is that if you mis-spell a field name, gcc gets a bus error (OS X 10.5.8, gcc 4.0.1, does anyone else see this?)
>
> Or is the alignment correct and it really is a ghc bug? If so, how can I help track it down? The best I can think of is to make a small program with that same data structure, pass it around some, and then generate and collect a lot of garbage, but this bug has been really hard to pin down in the past, change one little thing and it disappears only to pop up again in 6 months.
Participants (2): Evan Laforge, Simon Marlow