[GHC] #15155: How untagged pointers sneak into banged fields

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: 14677 | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- (''N.B.'' I am writing this up from memory, and cannot verify it just now, maybe someone can lend a hand, otherwise I'll do it ASAP!) Here is a way how untagged pointers to strict data can be created in banged (strict) constructor fields. This reproduction recipe **depends on the patch from #14677 applied**. We have 3 modules `A`, `B` and `C`: {{{#!hs module A where data A = X | Y | Z a = Z }}} {{{#!hs module B where import A newtype B = B A b = B a }}} {{{#!hs {-# language MagicHash #-} module C where import A import B import GHC.Exts data C = C !B c = C b main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0, b is softlink print (I# (dataToTag# c)) -- prints 0: not entered yet print (case c of C b' -> I# (dataToTag# b')) -- prints 0? print (case c of C (B a') -> I# (dataToTag# a')) -- prints 3 }}} ------------------- == Why this happens `B.b` is a newtype to `A.a` so one would expect that both alias the same memory location (a ''hardlink'' in filesystem parlance). But currently reexports are implemented with a special type of closure `IND_STATIC` (a ''softlink'') which needs to be entered to obtain the actual (tagged pointer). The `IND_STATIC` closure's pointer is never tagged (otherwise it would never be entered, instead interpreted as a honest-to-goodness `A.A`, which causes the symptoms seen in #14677). With #14677 applied to GHC, the unfolding of `B.b` is correctly read when compiling `C` (with `-O1` and better) and thus the compiler knows that it should be a tagged pointer value. Thus the construction of `C.c` shortcuts the entering of `B.b` when filling the strict field, and (because `B.b` being a softlink, thus untagged) the field ends up carrying a 0 tag. ------------------------- == How can this be fixed? I see two possibilities one conservative and one invasive. === Conservative When seeing a coercion unfolding of a tagged value being used to initialise a strict field, do not skip the evaluatedness check, but cater for the possibility of an `IND_STATIC` closure. Check the closure type, and if confirmed, steal the pointee and use that. === Invasive Get rid of the `IND_STATIC` closures altogether. For ''intra-module'' softlinks we can have proper hardlinks (assembler `.equiv` directives, or LLVM `alias`es). ''Inter-module'' softlinks can also be eliminated by linker scripts. This would however cause more build artifacts, so I don't know how hairy it would turn out. OTOH, it would reduce binary size by eliminating indirection closures and potential dereferencing code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I'm lost in a maze of twisty passages. We have * #14677 Code generator does not correctly tag a pointer. I'm not sure if this is a serious bug, or just the loss of a potential optimisation. (I think the latter.) * #14626 No need to enter a scrutinised value. Apparently blocked on #14677 * #14373 Introduce PTR-tagging for big constructor families. This has Phab:D4267. I think it may be blocked (in some way) on #15155. * #15155 (this ticket). I'm not sure how these tickets inter-relate beyond the blocking notes I've put here. Which is most urgent? Gabor, are you actively working on any of them? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:1 simonpj]:
I'm lost in a maze of twisty passages. We have
Sorry, but this was how the story unfolded with time. Believe me, I am also starting to get lost. Will try to clarify the relations, to my best knowledge, below.
* #14677 Code generator does not correctly tag a pointer. I'm not sure
if this is a serious bug, or just the loss of a potential optimisation. (I think the latter.) Performance-only bug. Leads to unnecessarily generated code and extra runtime because closures are entered redundantly.
* #14626 No need to enter a scrutinised value. Apparently blocked on
#14677 Does not seem to block #14373 but could make it more effective.
* #14373 Introduce PTR-tagging for big constructor families. This has
Phab:D4267. I think it may be blocked (in some way) on #15155. All the above is about making pointer tagging more expressive and reducing closure entries. The very bottom of this stack is * #13861 Take more advantage of STG representation invariance (follows up #9291)
* #15155 (this ticket).
I'm not sure how these tickets inter-relate beyond the blocking notes
I've put here. Which is most urgent? Gabor, are you actively working on any of them? I think #15155 (this ticket) is the one that needs to be resolved first, so that #14677 can land. Then #14626 and #14373 can be resolved, unless new problems surface. #13861 has a proof-of-concept implementation, which will need some love, but with extended PTR-tagging (#14373) it'd cover more cases. Since I dreamed up the "conservative" solution just last night, I have no fix for this one yet, but could start with an implementation in early June latest. (In January I implemented intra-module "hardlinks" in the NCG.) I think it would be nice to have a discussion about `IND_STATIC` closures too, resp. their elimination. Overall I think it is worrisome that strict fields get initialised with non-tagged pointers, even if the compiler statically knows that the corresponding data is in WHNF. TL/DR: This is a stack of mostly dependent tickets (newer ones need to be resolved before older ones), each unlocking a new optimisation in its own right. Hopefully no more monsters lurking. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): I didn't really follow the description here. Naively I'd expect `c` to compile into a CAF, like `c = case b of b' -> C b'` and then we'd be fine. If the compiler is assuming that `b` is already a value and thus avoiding evaluating it, that would be a false assumption. Where is the false assumption being made? I think the `IND_STATIC` stuff is a red herring. There's a top-level binding `b = a` which is compiled into an `IND_STATIC` as an optimisation, but it could also be compiled into a CAF, this is just a back-end code- generation choice. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CodeGen -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): While investigating I came across http://blog.omega-prime.co.uk/2011/07/06 /the-sad-state-of-symbol-aliases from Max Bolingbroke. Looks like linux does not support aliasing of an external symbol. That is sad. Need to re- check linker script definitions. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: 13027 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by heisenbug): * owner: (none) => heisenbug * related: => 13027 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: 13027 7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by heisenbug): * related: 13027 => 13027 7308 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by heisenbug): * related: 13027 7308 => #13027 #7308 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): While building ghc stage2 I've seen well over a thousand of these indirections being generated. I am working on a scheme which would allow putting all representationally equivalent (e.g. those wrapped by `cast`) bindings into an equivalence class and refer to the whole thing (by the non-casted representant) from the untyped stages (i.e. STG) on. With looming code-reuse techniques like ''GND'' (and I am looking at you `DerivingVia`) there will be much more coercing and casting going on, so I think this will become even more urgent in the future. Besides it would eliminate name table entries from `.so` files etc., speeding up (dynamic) linking a bit and saving even more space. Optionally the name canonicalisation could happen at the point where the assembly file is emitted. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Wow. I think we have discovered that it'll be really hard to ensure that a banged data constructor field always contains a correctly-tagged pointer to an evaluated object. See https://ghc.haskell.org/trac/ghc/ticket/15696#comment:36 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): heisenbug: are you still interested in this ticket? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:12 simonpj]:
heisenbug: are you still interested in this ticket?
Yes, I am totally interested that it doesn't get killed :-) Feel free to close as fixed-by, though. I ''think'' I have a plan to enforce following invariant: - An IND_STATIC closure should never point to a tagged constructor. For this one needs to track which local (module) names point to some other module's exported name, peeking through casts (i.e. representational equalities). This needs to go into a knot tying state in the backend (or perhaps STG) codegens. Should be mostly mechanical. Then the local references should statically dereference the local name and emit the other module's imported binding. Reading the assembly will be slightly harder, as some names will reflect the representational equalities explicitly. I am not sure that the above invariant is enough, considering the presence of BLACKHOLEs etc. But my gut feeling is that such evaluation-predicated closure kinds should never arise pointing to statically evaluated data. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * cc: simonmar (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): But see comment:11. I now think that it's a lost cause to insist that a strict constructor field is always a correctly-tagged pointer to an evaluated data value. In which case the whole IND_STATIC change becomes moot. I think we should work on the other tickets identified in commment:1. Are you interested in pursuing any of them? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:15 simonpj]:
...
I think we should work on the other tickets identified in commment:1. Are you interested in pursuing any of them?
I am at Haskell eXchange in 2 days, so we can discuss this at length there ;-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): Just to write down what @simonpj and I discussed earlier today: it *would* be possible to ensure that a strict field of a constructor always points to a tagged value, by employing the same method we now use for `dataToTag#` (see Phab:D5201), namely have the code generator inject an eval for each strict field when building the constructor. The eval could be omitted in cases where the codegen can see that the value for the field is already a tagged value, these cases could include: * it's a constructor we just built * it's a case binder * the value was read from another strict constructor However, the extra eval would likely remain in many cases. But evals are cheap - just a tag test in the common case. Whether this ends up being a win or not is hard to say, it would be a good experiment to do though. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I agree with comment:17. But we also agreed that our short term plan is * Do ''not'' require the invariant that every strict constructor field must be a correctly tagged pointer to a data value. Fewer invariants means less to think about - and this is a delicate one! Moreover, it's not clear whether efficiency would be better or worse with it. But yes, it'd be good to try it out. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15155: How untagged pointers sneak into banged fields -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: heisenbug Type: bug | Status: new Priority: normal | Milestone: 8.10.1 Component: Compiler | Version: 8.4.2 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 14677 Related Tickets: #13027 #7308 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): I have a merge request: https://gitlab.haskell.org/ghc/ghc/merge_requests/10 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15155#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC