`binary` serializing datatypes deriving Generic wrong on arm (32bit) with GHC HEAD

Hi! while trying to make sure cross compilation with Template Haskell works properly with 8.4, I ran into the following situation: When serializing data types, e.g. `Name OccName NameFlavour` in the transmission of Template Haskell Splice results from a 32bit arm device to the x86_64 host ghc. We expect to see: ``` .- 0 (first constructor) .- 1 .- 4 (fifth constructor) .- 4 .- 3 v v v v v \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOHf \EOT \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTmain \NUL\NUL\NUL\NUL\NUL\NUL\NUL\ETXTmp '--' '----------------------------------' '---------------------------------' NameSpace PkgName ModName '-------------------------------' '--------------------------------------------------------------------------------' OccName NameG NameSpace PkgName ModName :: NameFlavour '-----------------------------------------------------------------------------------------------------------------------' Name OccName NameFlavour :: Name ``` However, the `NameSpace` on the 32bit arm ends up being 8 bytes. Even though the full `Namespace` data type can be fully serialized in a single byte. The `binary` package tries to compute the size it needs for a generic data type, using the following logic (from binary/src/Data/Binary/Generic.hs): ``` class SumSize f where sumSize :: Tagged f Word64 newtype Tagged (s :: * -> *) b = Tagged {unTagged :: b} instance (SumSize a, SumSize b) => SumSize (a :+: b) where sumSize = Tagged $ unTagged (sumSize :: Tagged a Word64) + unTagged (sumSize :: Tagged b Word64) instance SumSize (C1 c a) where sumSize = Tagged 1 ``` Thus for a simple sum type `data X = A | B` we should get a `sumSize` of 2. The arm32 device however ends up getting 2^33, because `sumSize :: Tagged a Word64` and `sumSize :: Tagged b Word64` each end up being 2^32. With some help from the nice folks in #ghc, I was able to conjure up the following condensed test case: ``` {-# LANGUAGE DeriveGeneric, KindSignatures, PolyKinds, CPP, ScopedTypeVariables, TypeOperators, TypeSynonymInstances, FlexibleInstances #-} {-# OPTIONS_GHC -O2 #-} import GHC.Generics import Data.Word import Debug.Trace data X = A | B deriving (Show, Generic) main :: IO () main = print (sumSize :: Tagged (Rep X)) -- like traceShowId, but allows us to prepend a message. t :: Show a => String -> a -> a #if TRACE t msg x = traceShow (msg ++ show x) x #else t _ = id #endif class SumSize f where sumSize :: Tagged f newtype Tagged (s :: * -> *) = Tagged {unTagged :: WORD} deriving Show instance (SumSize a, SumSize b) => SumSize (a :+: b) where sumSize = t "SumSize (a :+: b): " $ Tagged $ unTagged (t "a :+: b => sumSize :: Tagged a: " $ sumSize :: Tagged a) + unTagged (t "a :+: b => sumSize :: Tagged b: " $ sumSize :: Tagged b) instance SumSize (C1 c a) where sumSize = t "SumSize (C1 c a): " $ Tagged 1 instance SumSize a => SumSize (M1 D c a) where sumSize = t "SumSize (M1 D c a): " $ Tagged . unTagged $ (sumSize :: Tagged a) ``` compiling this with `-DWORD=Word32 -DTRACE=1` yields the correct result (=2), with `-DWORD=Word64 -DTRACE=0` as well. With `-DWORD=Word64 -DTRACE=2` the wrong result (=2^33) Optimization flags seem not to play any role when everything is in a single module (as the test case). As such I have attached the `-ddump-simple -dsuppress-all` files for the Word64 and Word32 with TRACE=1, as well as the diff between the Word32 and Word64 dump. The output with WORD=Word32, TRACE=1 is: ``` "SumSize (C1 c a): Tagged {unTagged = 1}" "a :+: b => sumSize :: Tagged a: Tagged {unTagged = 4294967296}" "a :+: b => sumSize :: Tagged b: Tagged {unTagged = 4294967296}" "SumSize (a :+: b): Tagged {unTagged = 8589934592}" "SumSize (M1 D c a): Tagged {unTagged = 8589934592}" Tagged {unTagged = 8589934592} ``` with WORD=Word64, TRACE=1 is: ``` "SumSize (C1 c a): Tagged {unTagged = 1}" "a :+: b => sumSize :: Tagged a: Tagged {unTagged = 1}" "a :+: b => sumSize :: Tagged b: Tagged {unTagged = 1}" "SumSize (a :+: b): Tagged {unTagged = 2}" "SumSize (M1 D c a): Tagged {unTagged = 2}" Tagged {unTagged = 2} ``` Any help with this would be greatly appreciated! Cheers, Moritz PS: I'm not absolutely sure, but this might also be related to https://ghc.haskell.org/trac/ghc/ticket/13513

Given the following sample program: ``` import Data.Word f :: a -> a f x = x {-# NOINLINE f #-} x, y, x', y' :: Word64 x = 1 y = 1 x' = 4294967296 + 2 -- upper:1 + lower:2 y' = 4294967296 + 2 -- upper:1 + lower:2 main :: IO () main = let z = f x + f y z' = f x' + f y' in do { print (z == x + y) ; print z ; print (z' == x' + y') ; print z' } ``` This produces: ``` True 2 True 8589934596 ``` when compiled with `-O0` for me. and ``` False 8589934592 True 8589934596 ``` when compiled with `-O1` for me. Thus, if we start out with two Word64 that fit into the lower byte, we end up with the sum of both in the upper byte (with `-O1`). The difference between -O0 and -O1 is that -O1 goes through the primOps, and as such we end up with code like: ``` %26 = tail call i64 @hs_word64ToInt64(i64 2) %27 = tail call i64 @hs_word64ToInt64(i64 4294967296) %28 = tail call i64 @hs_plusInt64(i64 %27, i64 %26) %29 = tail call i64 @hs_int64ToWord64(i64 %28) ``` after which interestingly the result is correct. However the subsequent invocation of `@base_GHCziWord_W64zh_con_info`, seems to pick the wrong bytes for when reconstructing the Word64. If anyone got any idea, I'd be happy to know. Otherwise I guess I'd have to start adding debug information into the rts? Cheers, Moritz
On Nov 28, 2017, at 12:52 PM, Moritz Angermann
wrote: Hi!
while trying to make sure cross compilation with Template Haskell works properly with 8.4, I ran into the following situation:
When serializing data types, e.g. `Name OccName NameFlavour` in the transmission of Template Haskell Splice results from a 32bit arm device to the x86_64 host ghc.
We expect to see: ``` .- 0 (first constructor) .- 1 .- 4 (fifth constructor) .- 4 .- 3 v v v v v \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOHf \EOT \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTmain \NUL\NUL\NUL\NUL\NUL\NUL\NUL\ETXTmp '--' '----------------------------------' '---------------------------------' NameSpace PkgName ModName '-------------------------------' '--------------------------------------------------------------------------------' OccName NameG NameSpace PkgName ModName :: NameFlavour '-----------------------------------------------------------------------------------------------------------------------' Name OccName NameFlavour :: Name ```
However, the `NameSpace` on the 32bit arm ends up being 8 bytes. Even though the full `Namespace` data type can be fully serialized in a single byte.
The `binary` package tries to compute the size it needs for a generic data type, using the following logic (from binary/src/Data/Binary/Generic.hs):
``` class SumSize f where sumSize :: Tagged f Word64
newtype Tagged (s :: * -> *) b = Tagged {unTagged :: b}
instance (SumSize a, SumSize b) => SumSize (a :+: b) where
sumSize = Tagged $ unTagged (sumSize :: Tagged a Word64) + unTagged (sumSize :: Tagged b Word64)
instance SumSize (C1 c a) where sumSize = Tagged 1 ```
Thus for a simple sum type `data X = A | B` we should get a `sumSize` of 2. The arm32 device however ends up getting 2^33, because `sumSize :: Tagged a Word64` and `sumSize :: Tagged b Word64` each end up being 2^32.
With some help from the nice folks in #ghc, I was able to conjure up the following condensed test case:
``` {-# LANGUAGE DeriveGeneric, KindSignatures, PolyKinds, CPP, ScopedTypeVariables, TypeOperators, TypeSynonymInstances, FlexibleInstances #-} {-# OPTIONS_GHC -O2 #-}
import GHC.Generics
import Data.Word import Debug.Trace
data X = A | B deriving (Show, Generic)
main :: IO () main = print (sumSize :: Tagged (Rep X))
-- like traceShowId, but allows us to prepend a message. t :: Show a => String -> a -> a #if TRACE t msg x = traceShow (msg ++ show x) x #else t _ = id #endif
class SumSize f where sumSize :: Tagged f
newtype Tagged (s :: * -> *) = Tagged {unTagged :: WORD} deriving Show
instance (SumSize a, SumSize b) => SumSize (a :+: b) where sumSize = t "SumSize (a :+: b): " $ Tagged $ unTagged (t "a :+: b => sumSize :: Tagged a: " $ sumSize :: Tagged a) + unTagged (t "a :+: b => sumSize :: Tagged b: " $ sumSize :: Tagged b)
instance SumSize (C1 c a) where sumSize = t "SumSize (C1 c a): " $ Tagged 1
instance SumSize a => SumSize (M1 D c a) where sumSize = t "SumSize (M1 D c a): " $ Tagged . unTagged $ (sumSize :: Tagged a)
```
compiling this with `-DWORD=Word32 -DTRACE=1` yields the correct result (=2), with `-DWORD=Word64 -DTRACE=0` as well. With `-DWORD=Word64 -DTRACE=2` the wrong result (=2^33)
Optimization flags seem not to play any role when everything is in a single module (as the test case).
As such I have attached the `-ddump-simple -dsuppress-all` files for the Word64 and Word32 with TRACE=1, as well as the diff between the Word32 and Word64 dump.
The output with WORD=Word32, TRACE=1 is: ``` "SumSize (C1 c a): Tagged {unTagged = 1}" "a :+: b => sumSize :: Tagged a: Tagged {unTagged = 4294967296}" "a :+: b => sumSize :: Tagged b: Tagged {unTagged = 4294967296}" "SumSize (a :+: b): Tagged {unTagged = 8589934592}" "SumSize (M1 D c a): Tagged {unTagged = 8589934592}" Tagged {unTagged = 8589934592} ```
with WORD=Word64, TRACE=1 is: ``` "SumSize (C1 c a): Tagged {unTagged = 1}" "a :+: b => sumSize :: Tagged a: Tagged {unTagged = 1}" "a :+: b => sumSize :: Tagged b: Tagged {unTagged = 1}" "SumSize (a :+: b): Tagged {unTagged = 2}" "SumSize (M1 D c a): Tagged {unTagged = 2}" Tagged {unTagged = 2} ```
Any help with this would be greatly appreciated!
Cheers, Moritz
PS: I'm not absolutely sure, but this might also be related to https://ghc.haskell.org/trac/ghc/ticket/13513
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
participants (1)
-
Moritz Angermann