
#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: 14677 | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- (''N.B.'' I am writing this up from memory, and cannot verify it just now, maybe someone can lend a hand, otherwise I'll do it ASAP!) Here is a way how untagged pointers to strict data can be created in banged (strict) constructor fields. This reproduction recipe **depends on the patch from #14677 applied**. We have 3 modules `A`, `B` and `C`: {{{#!hs module A where data A = X | Y | Z a = Z }}} {{{#!hs module B where import A newtype B = B A b = B a }}} {{{#!hs {-# language MagicHash #-} module C where import A import B import GHC.Exts data C = C !B c = C b main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0, b is softlink print (I# (dataToTag# c)) -- prints 0: not entered yet print (case c of C b' -> I# (dataToTag# b')) -- prints 0? print (case c of C (B a') -> I# (dataToTag# a')) -- prints 3 }}} ------------------- == Why this happens `B.b` is a newtype to `A.a` so one would expect that both alias the same memory location (a ''hardlink'' in filesystem parlance). But currently reexports are implemented with a special type of closure `IND_STATIC` (a ''softlink'') which needs to be entered to obtain the actual (tagged pointer). The `IND_STATIC` closure's pointer is never tagged (otherwise it would never be entered, instead interpreted as a honest-to-goodness `A.A`, which causes the symptoms seen in #14677). With #14677 applied to GHC, the unfolding of `B.b` is correctly read when compiling `C` (with `-O1` and better) and thus the compiler knows that it should be a tagged pointer value. Thus the construction of `C.c` shortcuts the entering of `B.b` when filling the strict field, and (because `B.b` being a softlink, thus untagged) the field ends up carrying a 0 tag. ------------------------- == How can this be fixed? I see two possibilities one conservative and one invasive. === Conservative When seeing a coercion unfolding of a tagged value being used to initialise a strict field, do not skip the evaluatedness check, but cater for the possibility of an `IND_STATIC` closure. Check the closure type, and if confirmed, steal the pointee and use that. === Invasive Get rid of the `IND_STATIC` closures altogether. For ''intra-module'' softlinks we can have proper hardlinks (assembler `.equiv` directives, or LLVM `alias`es). ''Inter-module'' softlinks can also be eliminated by linker scripts. This would however cause more build artifacts, so I don't know how hairy it would turn out. OTOH, it would reduce binary size by eliminating indirection closures and potential dereferencing code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler