Curious Windows GHCi linker behaviour .o vs. .dll

Hello *,

I assume this is a well known issue to MSYS2/Windows developers, so I hope somebody may be able to provide more insight for me to better understand the underlying problem of

  https://github.com/haskell/time/issues/2

So the prototype for tzset() is simply

  void tzset(void);

and it's defined in `msvcrt.dll` as far as I can tell.

Consider the following trivial program:

  module Main where

  foreign import ccall unsafe "time.h tzset" c_tzset :: IO ()

  main :: IO()
  main = c_tzset

When compiled with GHC 7.8.3, the resulting executable works and has the following tzset-symbols:

  $ nm tz.o | grep tzset
                   U tzset
  $ nm tz.exe | grep tzset
  000000000050e408 I __imp_tzset
  00000000004afc40 T tzset

However, when loaded into GHCi, the RTS linker fails to find `tzset`:

  $ ghci tz.hs
  WARNING: GHCi invoked via 'ghci.exe' in *nix-like shells (cygwin-bash, in particular)
           doesn't handle Ctrl-C well; use the 'ghcii.sh' shell wrapper instead
  GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
  Loading package ghc-prim ... linking ... done.
  Loading package integer-gmp ... linking ... done.
  Loading package base ... linking ... done.
  [1 of 1] Compiling Main             ( tz.hs, interpreted )

  ByteCodeLink: can't find label
  During interactive linking, GHCi couldn't find the following symbol:
    tzset
  ...

However, when I prefix a `_` to the symbol-name in the FFI import, i.e.

  foreign import ccall unsafe "time.h tzset" c_tzset :: IO ()

Now, GHCi happily loads the module and is apparently able to resolve the `tzset` symbol:

  $ ghci tz.hs
  WARNING: GHCi invoked via 'ghci.exe' in *nix-like shells (cygwin-bash, in particular)
           doesn't handle Ctrl-C well; use the 'ghcii.sh' shell wrapper instead
  GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
  Loading package ghc-prim ... linking ... done.
  Loading package integer-gmp ... linking ... done.
  Loading package base ... linking ... done.
  [1 of 1] Compiling Main             ( tz.hs, interpreted )
  Ok, modules loaded: Main.
  *Main>

Moreover, compiling and running the program still works, and the additional underscore is visible in `nm` as well:

  $ nm tz.o | grep tzset
                   U _tzset
  $ nm tz.exe | grep tzset
  000000000050e558 I __imp__tzset
  00000000004b8050 T _tzset

What's going on here? Why does one need to add an artificial underscore to FFI imported symbols for GHCi to resolve symbols? Is this a bug?

Cheers,
  hvr

On Sat, Oct 11, 2014 at 9:24 AM, Herbert Valerio Riedel wrote:
> Moreover, compiling and running the program still works, and the additional underscore is visible in `nm` as well:

Sounds like ghci's linker doesn't resolve weak symbols?

--
brandon s allbery kf8nh                         sine nomine associates
allbery.b@gmail.com                             ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

On 10/11/2014 9:24 PM, Herbert Valerio Riedel wrote:
> Consider the following trivial program:
>
>   module Main where
>
>   foreign import ccall unsafe "time.h tzset" c_tzset :: IO ()
>
>   main :: IO()
>   main = c_tzset
>
> [...]
>
> However, when loaded into GHCi, the RTS linker fails to find `tzset`:
>
>   $ ghci tz.hs
>   [...]
>   ByteCodeLink: can't find label
>   During interactive linking, GHCi couldn't find the following symbol:
>     tzset

Strange, I tried it under HaskellPlatform-2014.2, it works, I didn't see the failure. And I tried it in both Windows cmd and msys2 shell.

> However, when I prefix a `_` to the symbol-name in the FFI import, i.e.
>
>   foreign import ccall unsafe "time.h tzset" c_tzset :: IO ()

I guess it should read:

  foreign import ccall unsafe "time.h _tzset" c_tzset :: IO ()

It works too.

Actually both _tzset and tzset exist in include/time.h, only tzset is old style name. They will be linked as the same function __imp__tzset.

--
cg

On 2014-10-11 at 17:04:57 +0200, cg wrote:

[...]

>> [...]
>> ByteCodeLink: can't find label
>> During interactive linking, GHCi couldn't find the following symbol:
>>   tzset
>
> Strange, I tried it under HaskellPlatform-2014.2, it works, I didn't see the failure. And I tried it in both Windows cmd and msys2 shell.

Well, I basically used a MSYS2 environment setup according to

  https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation/Windows

>> However, when I prefix a `_` to the symbol-name in the FFI import, i.e.
>>
>>   foreign import ccall unsafe "time.h tzset" c_tzset :: IO ()
>
> I guess it should read:
>
>   foreign import ccall unsafe "time.h _tzset" c_tzset :: IO ()
>
> It works too.

Yes, sorry, I forgot to add that leading underscore :-/

> Actually both _tzset and tzset exist in include/time.h, only tzset is old style name. They will be linked as the same function __imp__tzset.

What do you mean by "old style"? And more importantly, what foreign-import line shall be used that works both on Windows and non-Windows platforms, compiled as well as interpreted in GHCi?

Note also that I reduced the original problem to a much smaller repro-case here, the time-library actually has an additional redirection: The `tzset()` call is made inside a C function in `cbits/HsTime.c` which in turn is then foreign-imported. So in this case, the GHCi linker fails to resolve the correctly referenced `tzset()`. To me this sounds more and more like a serious bug in GHCi's linker.

PS: If I run ./validate on GHC HEAD, several of the GHCi testcases such as

  ghci/prog001  prog001 [bad stderr] (ghci)
  ghci/prog002  prog002 [bad stderr] (ghci)
  ghci/prog003  prog003 [bad stderr] (ghci)
  ghci/prog012  prog012 [bad stderr] (ghci)
  ghci/prog013  prog013 [bad stderr] (ghci)

fail for me due to not being able to load the `time` package (due to tzset).

Cheers,
  hvr

On 10/11/2014 11:44 PM, Herbert Valerio Riedel wrote:
> Well, I basically used a MSYS2 environment setup according to https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation/Windows

I reproduced the issue with ghc-7.8.3-x86_64. Are you using 64-bit ghc? If so, it looks like the issue is 64-bit only.

>> Actually both _tzset and tzset exist in include/time.h, only tzset is old style name. They will be linked as the same function __imp__tzset.
>
> What do you mean by "old style"? And more importantly, what foreign-import line shall be used that works both on Windows and non-Windows platforms, compiled as well as interpreted in GHCi?

I meant OLDNAME in MS's jargon, because they deprecate tzset [1], then call it 'old'. But it is still usable.

[1] http://msdn.microsoft.com/en-us/library/ms235451.aspx

--
cg

Hello!

On 2014-10-12 at 04:30:13 +0200, cg wrote:

[...]

> Are you using 64-bit ghc? If so, it looks like the issue is 64-bit only.

Indeed, I have only set up 64-bit Cygwin & MSYS2 environments so far.

>>> Actually both _tzset and tzset exist in include/time.h, only tzset is old style name. They will be linked as the same function __imp__tzset.
>>
>> What do you mean by "old style"? And more importantly, what foreign-import line shall be used that works both on Windows and non-Windows platforms, compiled as well as interpreted in GHCi?
>
> I meant OLDNAME in MS's jargon, because they deprecate tzset [1], then call it 'old'. But it is still usable.

Ok, thanks for the clarification, so I see there are actually two entangled issues here:

1) When coding directly against the MSVCRT, one is supposed to use the underscore-prefixed POSIX symbols, e.g. `_tzset()`. However, when targeting Cygwin, using the proper `tzset()` POSIX name is the recommended course of action. To this end, I've submitted

     https://github.com/haskell/time/pull/4

   (I hope that works for 32-bit MSYS2 environments as well)

   Personally, I think this was a very questionable decision on Microsoft's part, as this way you effectively destroy any chance to simply compile existing POSIX-compatible source code for no good reason...

2) The other issue seems to be that while linking a package using `tzset()` into a `.exe`, `tzset()` gets resolved just fine, however as soon as GHCi's linker is used to resolve `tzset()` contained in that package, it fails. At this point, I still consider this a bug. It was suggested by Brandon that GHCi's linker fails to resolve weak symbols.
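
For the first of the two issues, one possible shape for a single source file that picks the MSVCRT name on native Windows and the POSIX name elsewhere is a CPP-guarded import. This is only a minimal sketch, not necessarily what the pull request above does: it assumes GHC's predefined `mingw32_HOST_OS` macro is the right test for "linking against MSVCRT", and it says nothing about whether the GHCi linker issue itself is resolved. The printed message is there purely so the program has visible output.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

#if defined(mingw32_HOST_OS)
-- Native Windows build (assumption: linked against MSVCRT):
-- import the underscore-prefixed name.
foreign import ccall unsafe "time.h _tzset" c_tzset :: IO ()
#else
-- Everything else: import the plain POSIX name.
foreign import ccall unsafe "time.h tzset" c_tzset :: IO ()
#endif

main :: IO ()
main = do
  c_tzset                   -- re-read the time zone settings
  putStrLn "tzset called"   -- illustrative output only
```

Either branch exposes the same Haskell-side name `c_tzset`, so callers do not need to know which C symbol was selected.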

On Sun, Oct 12, 2014 at 6:11 AM, Herbert Valerio Riedel wrote:
> Personally, I think this was a very questionable decision on Microsoft's part, as this way you effectively destroy any chance to simply compile existing POSIX-compatible source code for no good reason...

POSIX doesn't specify asm or linker level symbols, only C API. Most Unix-like platforms have an underscore on the front of symbol names at link level, so that the API doesn't have to avoid random platform-specific register names or the assembler need to have magic prefixes on either symbols or register names.

So in fact, by adding the prefix underscore they are *more* compatible with Unix linkage, and presumably the FFI for Windows needs to start adding it the way the one for Unix does.

--
brandon s allbery kf8nh                         sine nomine associates
allbery.b@gmail.com                             ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
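
The distinction between a C-level name and a link-level symbol can be made concrete with a small, purely illustrative sketch. The `Convention` type and both helper functions are hypothetical names introduced here; the code makes no claim about which convention any particular platform or toolchain uses. The `__imp_` prefix is taken from the `nm` output quoted at the start of the thread.

```haskell
module Main where

-- | Hypothetical description of how a toolchain maps C identifiers
-- to link-level symbol names.
data Convention = NoPrefix | UnderscorePrefix
  deriving (Eq, Show)

-- | Link-level symbol for a C identifier under the given convention.
linkName :: Convention -> String -> String
linkName NoPrefix         name = name
linkName UnderscorePrefix name = '_' : name

-- | Import-table symbol for a link-level name, following the
-- "__imp_" pattern seen in the nm output earlier in the thread.
importName :: String -> String
importName sym = "__imp_" ++ sym

main :: IO ()
main = mapM_ putStrLn
  [ linkName NoPrefix "tzset"                 -- tzset
  , linkName UnderscorePrefix "tzset"         -- _tzset
  , importName (linkName NoPrefix "tzset")    -- __imp_tzset
  , importName (linkName NoPrefix "_tzset")   -- __imp__tzset
  ]
```

The last two lines reproduce the two import symbols reported by `nm tz.exe` in the original message, one for each spelling of the FFI import.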
participants (3): Brandon Allbery, cg, Herbert Valerio Riedel