
I am seeing some strange behavior with GHC 7.6.3 on Ubuntu 14 LTS when using the FFI, and I am looking for ideas on what could be going on.

Fundamentally, adding wait calls (delays coded in the C) changes the behavior of the C: returned status codes have proper values when there are delays, and come back as errors when there are none. Yet the same calls produce proper behavior on the Aardvark's serial bus, return proper data, etc. Only the status gets messed up.

The module calls a thin custom C layer over the Aardvark C layer, which dynamically loads a dll and makes calls into it. The thin layer just makes the use of c2hs easier. It is always possible there is some kind of memory issue, but there is no pattern to the mishap; it is random. Adding delays just changes the probability of a bad status.

I made a C version of my application calling the same custom C layer, and there are no problems. This suggests the problem is with the FFI. Because the failures are not general - they target one particular value - and seem to be affected by timing, it makes me wonder if there is some subtle Haskell runtime issue. Could the garbage collector be interacting with things?

Does anyone have an idea what kind of things to look for?
Mike

DLL loader:

static void *_loadFunction (const char *name, int *result) {
    static DLL_HANDLE handle = 0;
    void * function = 0;

    /* Load the shared library if necessary */
    if (handle == 0) {
        u32 (*version) (void);
        u16 sw_version;
        u16 api_version_req;

        _setSearchPath();
        handle = dlopen(SO_NAME, RTLD_LAZY);
        if (handle == 0) {
#if API_DEBUG
            fprintf(stderr, "Unable to load %s\n", SO_NAME);
            fprintf(stderr, "%s\n", dlerror());
#endif
            *result = API_UNABLE_TO_LOAD_LIBRARY;
            return 0;
        }

        version = (void *)dlsym(handle, "c_version");
        if (version == 0) {
#if API_DEBUG
            fprintf(stderr, "Unable to bind c_version() in %s\n", SO_NAME);
            fprintf(stderr, "%s\n", dlerror());
#endif
            handle = 0;
            *result = API_INCOMPATIBLE_LIBRARY;
            return 0;
        }

        sw_version      = (u16)((version() >>  0) & 0xffff);
        api_version_req = (u16)((version() >> 16) & 0xffff);
        if (sw_version < API_REQ_SW_VERSION ||
            API_HEADER_VERSION < api_version_req)
        {
#if API_DEBUG
            fprintf(stderr, "\nIncompatible versions:\n");

            fprintf(stderr, "  Header version  = v%d.%02d  ",
                    (API_HEADER_VERSION >> 8) & 0xff,
                    API_HEADER_VERSION & 0xff);
            if (sw_version < API_REQ_SW_VERSION)
                fprintf(stderr, "(requires library >= %d.%02d)\n",
                        (API_REQ_SW_VERSION >> 8) & 0xff,
                        API_REQ_SW_VERSION & 0xff);
            else
                fprintf(stderr, "(library version OK)\n");

            fprintf(stderr, "  Library version = v%d.%02d  ",
                    (sw_version >> 8) & 0xff,
                    (sw_version >> 0) & 0xff);
            if (API_HEADER_VERSION < api_version_req)
                fprintf(stderr, "(requires header >= %d.%02d)\n",
                        (api_version_req >> 8) & 0xff,
                        (api_version_req >> 0) & 0xff);
            else
                fprintf(stderr, "(header version OK)\n");
#endif
            handle = 0;
            *result = API_INCOMPATIBLE_LIBRARY;
            return 0;
        }
    }

    /* Bind the requested function in the shared library */
    function = (void *)dlsym(handle, name);
    *result = function ? API_OK : API_UNABLE_TO_LOAD_FUNCTION;
    return function;
}

...
Because the failures are not general in that they target one particular value, and seem to be affected by time, it makes me wonder if there is some subtle Haskell run time issue. Like, could the garbage collector be interacting with things?
Does anyone have an idea what kind of things to look for?
Sure - not that I have worked out in any detail how this would do what you're seeing, but it's easy to do and often enough works.

Compile with RTS options enabled and invoke with RTS option -V0. That will disable the runtime internal timer, which uses signals. The flood of signals from this source can interrupt functions that aren't really designed to deal with that, because in a more normal context they don't have to.

Donn

Donn,
Thanks, this solved the problem.
I would like to know more about what the signals are doing, and what am I giving up by disabling them?
My hope is I can then go back to the dll expert and ask why this is causing their library a problem and try to see if they can solve the problem from their end, etc.
Mike
On Aug 12, 2014, at 11:04 PM, Donn Cave wrote:

[ ... re -V0 ]

[ ... re -V0 ]
Thanks, this solved the problem.
I would like to know more about what the signals are doing, and what am I giving up by disabling them?
My hope is I can then go back to the dll expert and ask why this is causing their library a problem and try to see if they can solve the problem from their end, etc.
I'm disgracefully ignorant about that. When I've been forced to run this way, it doesn't seem to do any very obvious immediate harm to the application at all, but I could be missing long-term effects.

The problem with the library might be easy to fix, and in principle it's sure worth looking into - while the GHC runtime delivers signals on an exceptionally massive scale, there are plenty of normal UNIX applications that use signals, maybe timers just like this for example, and it's easy to set up a similar test environment using setitimer(2) to provide the signal bombardment. (I believe GHC actually uses SIGVTALRM rather than SIGALRM, but I don't think it will make any difference.)

But realistically, in the end sometimes we can't get a fix for it, so it's interesting to know how -V0 works out as a work-around. I hope you will keep us posted.

Donn

Donn,

I was able to duplicate my problem in C using SIGVTALRM.

Can someone explain the impact of using -V0? What does it do to performance, etc.?

Mike

Sent from my iPad
On Aug 13, 2014, at 9:56 AM, Donn Cave wrote:

[ ... re -V0 ]
participants (2)

- Donn Cave
- Michael Jones