Removal of #include <HsFFI.h> from template-hsc.h breaks largefile support on 32bit Linux

Hello all,
I am new here, but I want to report what I suspect may be a problem. I ran into it while using a third-party package from Hackage on a 32-bit Linux with ghc 7.4.1. I discovered that off_t fields in the .hsc files in the package were interpreted as 32-bit words. I suspected that 64-bit offsets should be used, because even 32-bit Linux has "largefile" support with 64-bit offsets.
I found that earlier versions of hsc2hs included HsFFI.h in the generated C code, and HsFFI.h in turn indirectly includes ghcautoconf.h, which contains

#define _FILE_OFFSET_BITS 64

So, if I build the .hsc files like this:

hsc2hs -i HsFFI.h filename.hsc

then off_t is 64-bit and the 64-bit file-manipulation syscalls are used. I did not check, but I think that earlier versions of hsc2hs produced largefile-aware code by default, because HsFFI.h was included in the generated code by default.
This is a simple test program:

==== checktypes.hsc ====
-- run like this: hsc2hs checktypes.hsc && runhaskell checktypes.hs
module Main where
#include <sys/types.h>
main = do putStrLn $ show (#size off_t)
========================

$ hsc2hs checktypes.hsc && runhaskell checktypes.hs
4
$ hsc2hs -i HsFFI.h checktypes.hsc && runhaskell checktypes.hs
8

As I understand, this situation means that while GHC itself and Haskell programs compiled by it are largefile-capable, any third-party modules that contain .hsc files are not. If I am right, this is probably not a good thing. Please can some guru take a look at this issue?

Thanks,
Eugene

On 15/02/2012 12:31, Eugene Crosser wrote:
[...]
Please can some guru take a look at this issue?
Guru at your service :-)

We discovered this during the 7.4 cycle: http://hackage.haskell.org/trac/ghc/ticket/2897#comment:12

Packages that were relying on `HsFFI.h` to define `_FILE_OFFSET_BITS` should no longer do this; instead they should use an appropriate autoconf script or some other method. See the `unix` package for an example of how to do this. It was really a mistake that it worked before.

Cheers, Simon
See also: http://www.haskell.org/pipermail/glasgow-haskell-users/2009-February/016606.... http://www.haskell.org/pipermail/cvs-ghc/2011-September/065848.html
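
P.S. The rough shape of the autoconf approach: probe for largefile support with autoconf's AC_SYS_LARGEFILE macro, which defines _FILE_OFFSET_BITS=64 where needed, record the result in a generated header, and include that header in every .hsc file before any system header. A minimal sketch (the file names here are made up, not the `unix` package's actual ones):

==== MyModule.hsc (sketch) ====
module MyModule where

-- MyPackageConfig.h is generated by ./configure; it must come before
-- any system header so that _FILE_OFFSET_BITS takes effect:
#include "MyPackageConfig.h"
#include <sys/types.h>
===============================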

Hello Simon, thanks for your attention :)

On 02/16/2012 04:25 PM, Simon Marlow wrote:
Packages that were relying on `HsFFI.h` to define `_FILE_OFFSET_BITS` should no longer do this; instead they should use an appropriate autoconf script or some other method. [...]
But that means that the "C build environment" has to be constructed independently for each module (that needs it), and consequently is not guaranteed to match the compiler's environment. Would it be better (more consistent) to propagate GHC's (or another compiler's) environment by default, along the lines of comment #16? To cite Duncan, "each Haskell implementation has its own C environment, and hsc2hs must use that same environment or it will produce incorrect results."

Just a thought, and, as I said, I am not really qualified to argue...

Eugene

I have similar issues to this in jhc due to its pervasive caching of compilation results. Basically, I must keep track of any potentially ABI-changing flags, ensure they are consistently passed to every compilation unit, and include them in the signature hash along with the file contents. I make sure to always pass such flags on the command line to all the tools; for instance, -D_FILE_OFFSET_BITS=64 gets passed to both gcc and hsc2hs, rather than relying on a config file.

John
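
P.S. Concretely, that means invocations along these lines (the file names are illustrative):

$ hsc2hs -D_FILE_OFFSET_BITS=64 Foo.hsc
$ gcc -D_FILE_OFFSET_BITS=64 -c cbits/foo.c

Both tools then agree on the width of off_t, with no config file in the loop.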

On 16/02/2012 13:25, Eugene Crosser wrote:
[...]
Would it be better (more consistent) to propagate GHC's (or another compiler's) environment by default, along the lines of comment #16?
Well, the question of whether to use 64-bit file offsets or not really has nothing to do with GHC itself. The choice is made in the base package and is only visible via the definition of Foreign.C.Types.COff and through the unix package. In fact, there's nothing stopping your own package from using 32-bit file offsets if you want to.

The time you would want to be compatible is when you make your own FFI declarations that use Foreign.C.Types.COff. In that case you need to know that the base package is using _FILE_OFFSET_BITS=64 and do the same thing.

Cheers, Simon
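
P.S. For example, a binding along these lines (a sketch, not code from base or unix) only behaves as expected if the C environment it is compiled against uses the same _FILE_OFFSET_BITS setting as base:

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt, COff)

-- With _FILE_OFFSET_BITS=64 in effect on the C side, "lseek" resolves
-- to the 64-bit variant, matching base's 64-bit COff; without it, the
-- C function and this signature disagree about the width of off_t.
foreign import ccall unsafe "unistd.h lseek"
    c_lseek :: CInt -> COff -> CInt -> IO COff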

It isn't local to a file though, because it changes the ABI. For instance,

void foo(off_t *x);

will blow up if called from a file with a differently sized off_t.

John

On 17/02/12 19:36, John Meacham wrote:
It isn't local to a file though, because it changes the ABI. For instance,
void foo(off_t *x);
will blow up if called from a file with a differently sized off_t.
But we're talking about Haskell code here, not C code. There's no way for something to "blow up"; the typechecker will catch any discrepancies. Perhaps I don't understand what problem you're thinking of - can you give more detail?

Cheers, Simon

On Fri, Feb 17, 2012 at 2:12 PM, Simon Marlow wrote:
[...]
Perhaps I don't understand what problem you're thinking of - can you give more detail?
Someone writes a C function that returns an off_t * which is foreign imported by a Haskell program using Ptr COff; the Haskell program then writes to the output pointer with the COff Storable instance. However, the imported function was compiled without 64-bit off_t's, so it only allocated 32 bits to be written into, and some other memory gets overwritten with garbage.

64-bit off_t's change the ABI of called C functions much like passing -m32 or -mrtd does, so they should be considered an all-or-nothing sort of thing. In particular, when ghc compiles C code, it should make sure it does so with the same ABI flags as the rest of the thing being compiled.

John
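
P.S. In code, the scenario looks something like this (the C function here is hypothetical):

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign (Ptr, poke)
import Foreign.C.Types (COff)

-- Hypothetical C side, compiled WITHOUT _FILE_OFFSET_BITS=64, so its
-- off_t is 4 bytes wide:
--   off_t *current_offset(void);  /* pointer into C-owned storage */
foreign import ccall unsafe "current_offset"
    c_currentOffset :: IO (Ptr COff)

clobber :: IO ()
clobber = do
    p <- c_currentOffset
    -- If base was built with _FILE_OFFSET_BITS=64, COff is 8 bytes here,
    -- so this poke writes 8 bytes into a 4-byte slot: the typechecker is
    -- satisfied, but adjacent C memory is silently overwritten.
    poke p 0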

I don't know enough to understand whether the "hard" troubles described by John Meacham are real, but I think that even if they are not, the current situation violates the "principle of least surprise" for the author of a module. Such an author may be unaware of the need to take special steps to make his module use the same file API as the built-in language functions, and that will lead to situations where the modules bundled with the compiler can deal with >4Gb files but the new module cannot. It may be even worse, for instance, with respect to reentrancy of the code: the author of the module may not realize that he needs to take special steps, or e.g. the handling of errno will be broken in a multithreaded environment. In the extreme case, even the opposite situation is possible: the module's autoconf may detect some obscure incompatible feature of the system that the base build did not know about (even if this is a purely theoretical danger).

I think it would be the right thing to provide the author of an external module with a "baseline" C environment by default, compatible with the environment under which the modules bundled with the compiler were built. And on top of that, allow them to use autoconf/whatever else to deviate from that if they need to.

Eugene

On Sat, Feb 18, 2012 at 15:15, Eugene Crosser wrote:
I think it would be the right thing to provide the author of an external module with a "baseline" C environment by default, compatible with the environment under which the modules bundled with the compiler were built. [...]
Agreed. There's a reason that languages with lots of experience with extensions (e.g. Perl, Python) make all the details of the environment the runtime was built under available to extensions/native-code modules: you might be running a binary distribution on a system that is compatible with, but not natively the same as, the runtime's build environment, so using autoconf-type stuff to determine the native environment will lead to the extension having an inefficient, or at worst subtly (or not so subtly, as with 32- vs. 64-bit issues) incompatible, link to the runtime. You *really* don't want the runtime to be marshaling 32-bit values based on how it was built while the module uses autoconf to determine that the native value size is 64 bits and treats the marshaled value as such, or vice versa.

(In fact, just based on its name, I would have assumed that the point of HsFFI.h is to ensure hsc2hs-ed (that is, FFI-using) modules get the types right, so making hsc2hs not use HsFFI.h makes little sense on its face.)

-- brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

On 17/02/2012 22:51, John Meacham wrote:
[...]
64-bit off_t's change the ABI of called C functions much like passing -m32 or -mrtd does, so they should be considered an all-or-nothing sort of thing.
So I'm not sure I agree with this. If someone is writing some C code and some Haskell code that calls it, it is their responsibility to make sure they get it right. Furthermore, GHC should not be imposing any choices on end users - what if you need to call an API that uses the 32-bit off_t, for example? We've had all kinds of trouble in the past with GHC's configuration details leaking into users' code, which is why we try to be as clean as possible now.

There are no ABI issues here - it's just a matter of the contract between the Haskell code and the C code, which is completely under the control of the user. The only place where the choices we make in GHC affect the user's code is the definition of the COff type. We should make it clear in the documentation for COff which version of off_t it corresponds to (or maybe even have two versions), but IMO that's all we should do.

Cheers, Simon
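
P.S. One thing a package author can do to police that contract is compare the two sides at hsc2hs time. A minimal sketch (not something base or unix actually does):

==== checksize.hsc (sketch) ====
module Main where

#include <sys/types.h>

import Foreign (sizeOf)
import Foreign.C.Types (COff)

main :: IO ()
main =
    if sizeOf (undefined :: COff) == (#size off_t)
        then putStrLn "off_t width agrees with base's COff"
        else error "off_t width mismatch between hsc2hs's C environment and base"
================================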
participants (4): Brandon Allbery, Eugene Crosser, John Meacham, Simon Marlow