
On 16/07/2010 12:36, Axel Simon wrote:
Dear Haskell maintainers,
I've progressed a little and found that the problem is down to accessing global variables that are declared in dynamic libraries. In a nutshell, this doesn't work, as the addresses of these global variables are all wrong when ghci is executing the code. So, I think I hit:
http://hackage.haskell.org/trac/ghc/ticket/781
I was able to work around this problem by compiling the C modules with -fPIC. This bug is pretty bad, I'd say. I've added myself to its CC list.
Urgh. It's a nasty bug, but not one that we can fix, because it's an artifact of the small memory model used on x86_64. The only fix is to use -fPIC. It might be possible to use -fPIC either by default, or perhaps just for .c files and when compiling data references from FFI declarations in Haskell code; that's something we could look into. We might want -fPIC on by default anyway if we switch to using dynamic linking by default (but we're not yet sure what ramifications that will have).
Cheers, Simon
Cheers, Axel
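The workaround described above, compiling the C modules with -fPIC, can be written down in a Cabal package description through the cc-options field. This is only a minimal sketch: it assumes the C sources are built by Cabal as part of a library stanza, and the rest of the package description is omitted.

```
library
  -- assumption: the C files are listed under c-sources in this stanza
  cc-options: -fPIC
```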
On 14.07.2010, at 16:51, Axel Simon wrote:
Hi all,
I'm trying to debug a segfault relating to the memory management in Gtk2Hs. Rather than make you read the ticket http://hackage.haskell.org/trac/gtk2hs/ticket/1183 , I'll describe the problem:
- compiler 6.12.1 or 6.12.3
- darcs head of Gtk2Hs with #define DEBUG instead of #undef DEBUG in gtk/Graphics/UI/Gtk/General/hsthread.c
- platform Ubuntu Linux, x86-64
- to reproduce: cd gtk2hs/gtk/demo/hello and run ghci World.hs and type 'main'
A window with the "Hello World" button appears. After a few seconds, the GC runs and the finaliser of the GtkButton is run since the Haskell program no longer holds a reference to that object (only the GtkWindow in C land has).
Thus, the GC calls a C function gtk2hs_g_object_unref_from_mainloop which is supposed to enqueue the object into a global data structure from which objects are later taken and g_object_unref is called on them.
This global data structure is protected by a mutex, which is acquired using g_static_mutex_lock:
void gtk2hs_g_object_unref_from_mainloop(gpointer object) {
  int mutex_locked = 0;
  if (threads_initialised) {
#ifdef DEBUG
    printf("acquiring lock to add a %s object at %lx\n",
           g_type_name(G_OBJECT_TYPE(object)), (unsigned long) object);
    printf("value of lock function is %lx\n",
           (unsigned long) g_thread_functions_for_glib_use.mutex_lock);
#endif
    g_rand_new();
#if defined( WIN32 )
    EnterCriticalSection(&gtk2hs_finalizer_mutex);
#else
    g_static_mutex_lock(&gtk2hs_finalizer_mutex);
#endif
    mutex_locked = 1;
  }
  [..]
The program prints:
acquiring lock to add a GtkButton object at 22d8020
value of lock function is 0
zsh: segmentation fault  ghci World
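For readers unfamiliar with the pattern, the enqueue-under-a-mutex scheme described above can be sketched with plain pthreads. This is not the Gtk2Hs code: every name here (enqueue_for_unref, drain_queue, count_release, QUEUE_CAPACITY) is hypothetical, the fixed-size array stands in for whatever data structure the library really uses, and count_release merely stands in for g_object_unref.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the Gtk2Hs source. */
#define QUEUE_CAPACITY 64

static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static void *pending[QUEUE_CAPACITY];
static size_t pending_count = 0;

/* Counts released objects; a stand-in for g_object_unref. */
size_t released_total = 0;
void count_release(void *object) {
    (void) object;
    released_total++;
}

/* Called from the finalizer: record the object for later release.
   Returns 1 on success, 0 if the queue is full. */
int enqueue_for_unref(void *object) {
    int ok = 0;
    pthread_mutex_lock(&queue_mutex);
    if (pending_count < QUEUE_CAPACITY) {
        pending[pending_count++] = object;
        ok = 1;
    }
    pthread_mutex_unlock(&queue_mutex);
    return ok;
}

/* Called from the main loop: release every queued object and empty
   the queue. Returns the number of objects released. */
size_t drain_queue(void (*release)(void *)) {
    void *batch[QUEUE_CAPACITY];
    size_t n, i;

    /* Copy the queue out while holding the lock. */
    pthread_mutex_lock(&queue_mutex);
    n = pending_count;
    for (i = 0; i < n; i++) {
        batch[i] = pending[i];
    }
    pending_count = 0;
    pthread_mutex_unlock(&queue_mutex);

    /* Release outside the lock so a callback may enqueue again. */
    for (i = 0; i < n; i++) {
        release(batch[i]);
    }
    return n;
}
```

A finalizer would call enqueue_for_unref(obj), and the main loop would periodically call drain_queue(count_release), or pass the real release function instead.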
Now the debugging weirdness starts. Whatever I do, I cannot get gdb to find the symbol gtk2hs_g_object_unref_from_mainloop.
Since the function above is contained in a C file that comes with our Haskell library, I tried to add "cc-options: -g" and "cc-options: -ggdb -O0", but maybe somewhere symbols are stripped. So I added the bogus function call to "g_rand_new()" which is not called anywhere else and gdb stops as follows:
acquiring lock to add a GtkButton object at 2105020
value of lock function is 0
[Switching to Thread 0x7ffff41ff710 (LWP 15735)]

Breakpoint 12, 0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
This all seems reasonable, but:
(gdb) bt
#0 0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
#1 0x00000000419b3792 in ?? ()
#2 0x00007ffff678f078 in ?? ()
i.e. the calling context is broken. I'm very, very sure that the caller is indeed the above-mentioned function, since g_rand_new isn't called anywhere in my Haskell program (and otherwise the calling context would be sane). I'm also passing the address of gtk2hs_g_object_unref_from_mainloop as FinalizerPtr to all my ForeignPtrs, so there is no inlining going on.
Back to the culprit, the call to g_static_mutex_lock. This is a macro that expands to
*g_thread_functions_for_glib_use.mutex_lock
where g_thread_functions_for_glib_use is a global variable that contains a lot of function pointers. At the breakpoint, it contains this:
(gdb) print g_thread_functions_for_glib_use
$33 = {mutex_new = 0x7ffff0cd9820,
  mutex_lock = 0x7ffff6c8b3c0 <__pthread_mutex_lock>,
  mutex_trylock = 0x7ffff0cd97b0,
  mutex_unlock = 0x7ffff6c8ca00 <__pthread_mutex_unlock>,
  mutex_free = 0x7ffff0cd9740,
  [..]

So the call to g_mutex_lock should call the function __pthread_mutex_lock, but it calls NULL.
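To illustrate why a zero in that slot is fatal: a lock call of this kind reads a function pointer out of a global table and jumps through it, so if the code sees NULL where the pointer should be, the call lands at address 0. The following is a minimal sketch with hypothetical names (thread_functions, good_table, zero_table, lock_via_table); it is not the GLib definition, and the NULL check is added only to make the failure observable without crashing.

```c
#include <stddef.h>

/* Hypothetical stand-in for a global table of thread functions. */
struct thread_functions {
    void (*mutex_lock)(int *mutex);
    void (*mutex_unlock)(int *mutex);
};

/* Toy lock and unlock operations on an int used as a flag. */
static void demo_lock(int *mutex)   { *mutex = 1; }
static void demo_unlock(int *mutex) { *mutex = 0; }

/* A correctly populated table, and one that reads as all zeros,
   which is what the code sees when it looks at the wrong address. */
struct thread_functions good_table = { demo_lock, demo_unlock };
struct thread_functions zero_table = { NULL, NULL };

/* Lock through the table. Returns 1 if the lock function was called,
   0 if the slot is NULL. Without the check, the second case would be
   a call through a null pointer, i.e. a segmentation fault. */
int lock_via_table(const struct thread_functions *table, int *mutex) {
    if (table->mutex_lock == NULL) {
        return 0;
    }
    table->mutex_lock(mutex);
    return 1;
}
```

Calling lock_via_table(&good_table, &m) sets m to 1, while lock_via_table(&zero_table, &m) returns 0 and leaves m untouched.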
I hoped that writing this email would give me a bit more insight into the problem, but for now I suspect that something overwrites either the stack or the code of the function.
On the same platform, the compiled version prints:
acquiring lock to add a GtkButton object at 1b05820
value of lock function is 7f7adcabd3c0
within mutex: adding finalizer to a GtkButton object!
On Mac OS or i386, using ghci or ghc, version 6.10.4, it works as well. Now for the fun bit: on i386 using ghci version 6.12.1 it works too.
So it's an x86-64 and ghc 6.12.1 bug. According to Christian Maeder who submitted the ticket, the problem persists in 6.12.3.
Any hints and help appreciated,
Cheers, Axel