I spent some time hacking around on this from a library perspective when I had to interoperate with a bunch of Objective C on a 64-bit Mac. Many of the core library functions you have to call through the FFI pass around pairs of Int32s as a struct, which is small enough under the x86-64 ABI to get shoehorned into a single register, and which the Haskell FFI can't marshal by value. Since I was programmatically cloning Objective C APIs via Template Haskell, I couldn't use the usual clunky C shims.
What I ended up doing was calling through libffi directly, with a fair amount of work to cache the result of ffi_prep_cif for each signature so it wasn't redone on every call.
It worked reasonably well for my purposes, but my need for it vanished and I abandoned the code in the middle of refactoring it for grander things.
So if nothing else, you can at least take this as a vote of confidence that your idea isn't crazy. =)
I'd also be happy to answer questions if you get stuck or need help.
-Edward