[GHC] #12935: Object code produced by GHC is non-deterministic

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- This is a continuation of #4012, which made interface files deterministic. Now someone needs to make object code deterministic. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * version: 8.0.1 => 8.0.2 @@ -1,2 +1,3 @@ - This is a continuation of #4012, which made interface files deterministic. - Now someone needs to make object code deterministic. + This is a continuation of #4012, which was resolved in 8.0.2 by making + interface files deterministic. Now someone needs to make object code + deterministic. New description: This is a continuation of #4012, which was resolved in 8.0.2 by making interface files deterministic. Now someone needs to make object code deterministic. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * cc: nomeata (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by jb55): * cc: jb55 (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by ravi_n): What's involved in fixing this? Is it just plugging away at a list of known sources of non-determinism or is it more complicated than that? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): It's not an entirely trivial task. We use uniques quite often to name symbols in code produced by the native codegen. Fixing this without compromising compiler performance is rather tricky. I don't know if niteria has thought much about how to approach this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nh2): * Attachment "objdump-headers-1.txt" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nh2): * Attachment "objdump-headers-1.2.txt" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nh2): * Attachment "objdump-headers-2.txt" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I have attached some files, `objdump-headers-1.txt` and `objdump- headers-1.txt` which are the outputs of `objdump --all-headers Example.o` as produced by `ghc --make -j1` of a project of mine. Have a look at them in a diff viewer. At the top you will find probably what was talked about before: Uniques making it into symbol names, like `cZoQ_info` vs `cQJU_info`. But further down you see {{{ 0000000000002f10 R_X86_64_PC32 .data.rel.ro+0x00000000000000e8 }}} vs {{{ 0000000000002f10 R_X86_64_PC32 .data.rel.ro+0x00000000000000f0 }}} which is the result of some values in the `RELOCATION RECORDS FOR [.data.rel.ro]` having their positions swapped: {{{ 0000000000000100 R_X86_64_64 monozmtraversablezm1zi0zi2zi1zm4aNtRX1lKHJF1rNRp4LmtV_DataziMonoTraversable_zdfMonoFoldableZMZN_closure 0000000000000108 R_X86_64_64 monozmtraversablezm1zi0zi2zi1zm4aNtRX1lKHJF1rNRp4LmtV_DataziMonoTraversableziUnprefixed_any_closure }}} vs {{{ 0000000000000100 R_X86_64_64 monozmtraversablezm1zi0zi2zi1zm4aNtRX1lKHJF1rNRp4LmtV_DataziMonoTraversableziUnprefixed_any_closure 0000000000000108 R_X86_64_64 monozmtraversablezm1zi0zi2zi1zm4aNtRX1lKHJF1rNRp4LmtV_DataziMonoTraversable_zdfMonoFoldableZMZN_closure }}} These are the same things, they have exactly the same symbol names, just their order swapped. How might this happen? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nh2): * cc: nh2 (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2):
It's not an entirely trivial task. We use uniques quite often to name symbols in code produced by the native codegen. Fixing this without compromising compiler performance is rather tricky.
@bgamari Question: Why cannot the uniques be predetermined by a seed that's dependent on the module name? My understanding so far is this: We need an infinite stream of unused strings to generate symbols. Parallel compilation with `-j` and "partial" compilation of the projects (resumes, incremental compilation), means some number or pseudorandom generator has a different initial state when we come by the file next time. So couldn't we initialise the seed of that pseudorandom generator as `hash(module name)`, so that each time a module is compiled, it always generates the same uniques in the same order (as within a module there is no parallel compilation)? And when a module name changes, we have to recompile anyway and it's probably OK if the object code changes when that happens. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nh2): * cc: niteria (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: 4012 | Blocking: Related Tickets: #12262 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nh2): * blockedby: => 4012 * related: => #12262 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * related: #12262 => * blockedby: 4012 => Comment:
So couldn't we initialise the seed of that pseudorandom generator as hash(module name), so that each time a module is compiled, it always generates the same uniques in the same order
Yes, I believe we could. The real issue is that (I suspect) this will be a rather major surgery of the backend. We produce symbol names and labels in many places and often we only have access to a unique supply. Threading the necessary state around will likely be a non-trivial exercise. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by niteria): @nh: What about collisions? Uniques are 64 bits on 64 bit systems. 8 bits are taken for the tag, that leaves us with 56 bits. You can expect collisions after `2^26` uniques are allocated. That's `67108864`. It's a pretty small number. We've definitely seen cases of running out of 24 bits for larger projects. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): @niteria: First, I think there's a misunderstanding here. Hashes would be used only for hashing the module name, to determine another N-bit prefix similiar as for the tag (for example, you could have 8 bits for the tag, 28 bits for the hash of the module name, and 28 bits for the actual uniques). Within this sub-namespace (per-module) uniques would continue to be allocated in a (+1) fashion, counting up from 0 to `2^26-1`. So there would never be the risk of a "uniques collision" as you describe, only the risk of a "module name hash colision". Second, to avoid module name hashes colliding: You just check them for uniqueness. This is easy, because you can do it ahead of time, before the build, where you already know all modules in the project. If they collide, you just use the next bigger integer. The point of that module name hashing (under the assumption that we have only 64 bits) is not to guarantee separate namespaces in all cases , just to make it very likely while also being stable against adding a new module to the project (you could also just enumerate the modules from 1 to N, but then if you add one in the middle, all modules get new subnamespaces assigned, and with name hashing that doesn't happen this way). If there's a collision, that's not the end of the world or a critical error, it just means you'll get different uniques next time (relatively unlikely with `2^28`). And you still have the property that given the same project with same modules and same compiler, you're guaranteed to be fully deterministic. Just not between changes that add a module. If we want to guarantee to be fully deterministic for individual modules even across additions of modules, all uniques have to be prefixed with the full module name (which might also be a good option). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I'm looking into doing this. First a remark: Instead of hashing for extra stability, we could also just use the increasing `[1 of 3] Compiling A` numbers for unique namespacing. But I noticed that the order of those is currently not stable: In a project with modules `A, B, C, Main`, `A` imports nothing, `B` and `C` both import `A`, `Main` imports `C` and `D`, sometimes you will get a different output of which of these modules is `[3 of 4]`. Example: {{{ % touch B.hs && /raid/src/ghc/git-8.2.1/inplace/bin/ghc-stage2 --make Main.hs [3 of 4] Compiling B ... }}} {{{ % touch C.hs && /raid/src/ghc/git-8.2.1/inplace/bin/ghc-stage2 --make Main.hs [3 of 4] Compiling C ... }}} That is because in `GhcMake.upsweep'` the `mod_index` is just counting up `+1`, but over `sccs :: [SCC ModSummary]` the order of which is somehow influenced by the file mtime (I haven't found yet how the mtime makes it to be a more significant field than the module name). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): OK, the reason for the ordering being dependent on mtime is #7231: `mg = stable_mg ++ unsable_mg` was implemented to fix `GHCi erroneously unloads modules after a failed :reload`. https://ghc.haskell.org/trac/ghc/ticket/7231#comment:2 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4388 Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * differential: => Phab:D4388 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4388 Wiki Page: | -------------------------------------+------------------------------------- Comment (by shlevy): D4388 fixes some of this, by removing non-determinism in cost centre symbol names (which caused linking errors when doing distributed builds with nix). A unique was replaced with a 0-based index that increments in source location order for a given module, cost centre name, and cost centre type. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12935: Object code produced by GHC is non-deterministic
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D4388
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#12935: Object code produced by GHC is non-deterministic -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4388 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): As I mentioned over on #14769, I don't think relying on the compilation order of modules for determinism is the right way to do this. The rationale is that the compilation order can change for other reasons: the user may invoke GHC on a different set of modules, without changing anything about the source code or dependencies of any particular module. The determinism property we want is that the output depends only on * The source code of a module, * Its dependencies, * compiler flags that affect the generated code -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12935#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC