Making compilation results deterministic (#4012)

Hello,
For the past couple of weeks I've been working on making compilation
results deterministic.
What I'm focusing on right now is interface file determinism; I don't
care about binaries being deterministic.
I'd like to give a status update and ask for some advice, since I'm running
into issues that I don't have a good way of solving.
The first question anyone might ask is how nondeterminism crept into
the compiler in the first place. If we're compiling with a single thread, there's no reason
for the computation to proceed in a non-deterministic way. I'm fairly certain
that the issue originates from lazy loading of interface files. The relevant
function is here:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/typeche....
What happens is that if you already have an interface file for a target
you're trying to build, the computation will proceed differently.
Why does lazy loading matter? As the interface file is loaded it needs to
be type-checked, and that means it needs to pull some Uniques from a global
UniqSupply. It does that in a different order, resulting in a different Unique
assignment. As far as I can tell, lazy loading is required for performance,
so I abandoned the idea of fixing it. I haven't looked at parallel
compilation yet, but I'd expect it to result in different Unique assignments
as well.
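To make this concrete, here is a tiny standalone sketch (it uses a plain IORef counter, not GHC's real UniqSupply API, and all the names in it are made up) of how the order in which names are demanded decides which uniques they get:

import Data.IORef

-- A toy stand-in for a global unique supply: a counter handed out on
-- demand. Whatever is forced first gets the smaller numbers.
newSupply :: IO (IO Int)
newSupply = do
  ref <- newIORef 0
  return (atomicModifyIORef' ref (\n -> (n + 1, n)))

main :: IO ()
main = do
  -- Run 1: the "interface file" names are demanded before the local ones.
  next1 <- newSupply
  iface1 <- mapM (const next1) ["M.foo", "M.bar"]
  local1 <- mapM (const next1) ["x", "y"]
  print (iface1, local1)   -- ([0,1],[2,3])
  -- Run 2: same names, demanded in the other order; every unique shifts.
  next2 <- newSupply
  local2 <- mapM (const next2) ["x", "y"]
  iface2 <- mapM (const next2) ["M.foo", "M.bar"]
  print (iface2, local2)   -- ([2,3],[0,1])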
I believe up to this point we're ok. Uniques are non-deterministic, but it
shouldn't be a big deal. Uniques should be opaque enough not to affect the
order of computation, for example the order in which binds are considered. But they
aren't.
Uniques are used in different ways throughout the compiler and they end up
reordering things:
1) They have an `Ord` instance:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/basicTy...
.
So far, the places it impacts the most are those that use
`stronglyConnCompFromEdgedVertices`, because Unique is used as the Node key
and the result depends on the order in which Nodes are considered. Some
examples:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/simplCo...
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/rename/...
(because Ord for Name uses Unique
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/basicTy...
)
I've tried to see what removing it would entail, and the changes would be
far-reaching: https://phabricator.haskell.org/P62.
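To illustrate the ordering dependence with something self-contained, here is a sketch using stronglyConnComp from Data.Graph (a close relative of GHC's own function, not the GHC code itself); the graph and node names below are invented:

import Data.Graph (stronglyConnComp, flattenSCC)

-- The same dependency graph presented with its nodes in two different
-- orders; in GHC the presentation order ultimately comes from Unique
-- values, which can vary from run to run.
graphA, graphB :: [(String, Int, [Int])]
graphA = [("f", 1, []), ("g", 2, [])]
graphB = [("g", 2, []), ("f", 1, [])]

main :: IO ()
main = do
  -- The grouping into SCCs is identical, but the order of the returned
  -- list follows the presentation order, so the two results differ.
  print (concatMap flattenSCC (stronglyConnComp graphA))
  print (concatMap flattenSCC (stronglyConnComp graphB))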
2) VarEnv and NameEnv are implemented in terms of UniqFM, which is just
Data.IntMap with the Unique integer values as keys.
The way this bites us is that when UniqFMs get converted to a list, the
elements end up sorted on Unique value. This problem is more widespread than
the `stronglyConnCompFromEdgedVertices` issue; there's even a place that
implicitly depends on it:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/nativeG...
.
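As a toy illustration of the point (plain Data.IntMap standing in for UniqFM; the binder names are invented), the list order is decided entirely by the key values, so a different Unique assignment reorders everything downstream of `toList`:

import qualified Data.IntMap.Strict as IM

-- Simulate the same two binders receiving different Uniques on two
-- different runs; toList hands them back in key order either way.
bindersSeen :: [(Int, String)] -> [String]
bindersSeen = map snd . IM.toList . IM.fromList

main :: IO ()
main = do
  print (bindersSeen [(1, "wild"), (2, "ds")])  -- ["wild","ds"]
  print (bindersSeen [(2, "wild"), (1, "ds")])  -- ["ds","wild"]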
I've tried to fix it by making `toList` return the elements in the order of
insertion (https://phabricator.haskell.org/P63), but that turned out to
have a significant cost: my unscientific benchmark on aeson and text showed a
10% compilation-time increase. It may be possible to do this less
expensively; I've tried a couple of approaches, but all of them resulted
in the same 10% increase. I've also considered splitting UniqFM into two different
types, one that keeps the insertion order and one that doesn't support `toList`, but I
suspect that the cut would not be clean, so I haven't tried that.
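For reference, one possible shape of such an insertion-order-preserving map (a sketch of the general idea only, not the code from P63): keep an insertion stamp next to every value and sort on it when listing. The extra word per entry and the sort on the way out are roughly where the cost comes from.

import qualified Data.IntMap.Strict as IM
import Data.List (sortOn)

-- An IntMap that remembers insertion order: the Int field counts
-- insertions, and each stored value carries its insertion stamp.
-- (Re-inserting an existing key refreshes its stamp.)
data OrdUniqFM a = OrdUniqFM !Int (IM.IntMap (Int, a))

emptyOUFM :: OrdUniqFM a
emptyOUFM = OrdUniqFM 0 IM.empty

addToOUFM :: OrdUniqFM a -> Int -> a -> OrdUniqFM a
addToOUFM (OrdUniqFM n m) key v = OrdUniqFM (n + 1) (IM.insert key (n, v) m)

-- Listing sorts on the stamp, not on the (non-deterministic) key.
toListOUFM :: OrdUniqFM a -> [(Int, a)]
toListOUFM (OrdUniqFM _ m) =
  [ (k, v) | (k, (_, v)) <- sortOn (fst . snd) (IM.toList m) ]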
In some cases we got away with ordering things by OccName where needed (
https://phabricator.haskell.org/D1073, https://phabricator.haskell.org/D1192),
but OccNames don't have to be unique in every case, and if we tried to make
them unique that would make them longer and probably result in an even greater
slowdown.
The instance I've looked at most recently doesn't look like it can be solved by
sorting on OccName.
The code that triggers the problem (simplified from haskell-src-exts):
data Decl l = Boring l
  deriving (Eq)

data Binds l
  = BDecls l [Decl l]    -- ^ An ordinary binding group
  | IPBinds l [IPBind l] -- ^ A binding group for implicit parameters
  deriving (Eq)

data IPBind l = Boring2 l
  deriving (Eq)
The end result is:
4449fe3f8368a2c47b2499a1fb033b6a
$fEqBinds_$c==$Binds :: Eq l => Binds l -> Binds l -> Bool
{- Arity: 1, HasNoCafRefs, Strictness:

2015-09-14 19:04 GMT+02:00 Bartosz Nitka
[...] Uniques are non-deterministic [...]
Just out of curiosity: Why is this the case? Naively, I would assume that you can't say much about the value returned by getKey, but at least I would think that in repeated program runs, the same sequence of values would be produced. Well, unless somehow the values depend on pointer values, which will screw this up because of ASLR.

You’ve described the problem well. Indeed I think you should make a wiki page to articulate the problem, and point to this email (perhaps among others) for more detail. It’s so easy to lose track of email, and so helpful to have a wiki page that always reflects the latest understanding.
You say a little bit about the problem you are trying to solve, but not enough. (You can do this on the wiki page.) For example:
· You say you don’t care about binaries. (Good, but why?)
· Do you care about multi-threaded GHC? (I think no)
· Do you care about what happens if you recompile GHC, say with different optimisation settings? (I think no) That would affect order of evaluation, and hence the order of allocation of uniques.
· Do you care about recompiling the same source file with different environments, e.g. different compiler flags or changes in imported interface files? (I think no; these changes should require recompilation)
So can you characterise exactly what you DO care about? It might be something like “when using GHC as a library, with a single “session”, and I recompile an unchanged source file in an unchanged environment I want to get the same result”. But I think even that is wrong.
The reason I’m confused is when you say “What happens is that if you already have an interface file for a target you're trying to build the computation will proceed differently”. But what do you mean by “you already have an interface file”? In batch mode, we always load interface files; and with the same source and flags we almost certainly load them in the same order. So perhaps you mean in –make mode?
I’ll hypothesise that you mean
· In –make mode, with unchanged source I want to get the same output from compiling M.hs if M’s imported interface files have not changed.
But even then I’m confused. Under those circumstances, why are we recompiling at all?
Do you see what I mean about characterising the problem?
Depending on exactly what the problem is, one “solution” might be to not use –make mode.
But I think I’ve gone as far as I can without a clearer understanding of the problem. (I suggest responding by writing the wiki page, and sending a link.)
On the UniqFM question, in a finite map, the way in which keys are compared really, really should not matter. When you turn a UniqFM into a list you may need to canonicalise the list in some way. But we shouldn’t mess up finite maps themselves in service of this goal. Better to focus on the canonicalisation process, which as you point out may be hard or even impossible as things stand.
Many thanks
Simon

On 16 Sep 2015, at 10:43, Simon Peyton Jones wrote:
I’ll hypothesise that you mean
· In –make mode, with unchanged source I want to get the same output from compiling M.hs if M’s imported interface files have not changed.
But even then I’m confused. Under those circumstances, why are we recompiling at all?
My understanding is that currently, if you build a Haskell project from clean sources with ghc --make, then wipe all the .o/.hi files, and rebuild again from clean, with all the same flags and environment, you are unlikely to end up with identical binaries for either the .o or .hi files.
This lack of binary reproducibility is a performance problem within a larger build system (of which the Haskell components are only a part): if the larger build system sees that the Haskell .hi or .o has apparently changed (even though the sources+flags have not changed at all), then many other components that depend on them may be triggered for an unnecessary rebuild.
Regards,
Malcolm

| My understanding is that currently, if you build a Haskell project
| from clean sources with ghc --make, then wipe all the .o/.hi files,
| and rebuild again from clean, with all the same flags and environment,
| you are unlikely to end up with identical binaries for either the .o
| or .hi files
Is that right Bartosz? If that's the goal, then can we please say that explicitly on the wiki page?
Let's hypothesise that it is the goal. Then I don't understand why, in a single-threaded GHC, you would get non-det results. Presumably not from lazy-interface-file-loading. After all, the same things would be loaded in the same order, no?
Before we can propose a solution we need to understand the goal; and having understood the goal we need to understand why the results are non-det right now. Otherwise we risk fixing the wrong problem.
Anyway, one step at a time :-) Goal first; then a concrete example that demonstrates the problem; then a hypothesis of what is going wrong...
Simon

On 16/09/2015 12:18, Simon Peyton Jones wrote:
| My understanding is that currently, if you build a Haskell project
| from clean sources with ghc --make, then wipe all the .o/.hi files,
| and rebuild again from clean, with all the same flags and environment,
| you are unlikely to end up with identical binaries for either the .o
| or .hi files
Is that right Bartosz? If that's the goal, then can we please say that explicitly on the wiki page?
Let's hypothesise that it is the goal. Then I don't understand why, in a single-threaded GHC, you would get non-det results. Presumably not from lazy-interface-file-loading. After all, the same things would be loaded in the same order, no?
Bartosz is going to write a wiki page and answer your earlier questions, but I'll try to go into a bit more detail about this point. We don't fully understand why we get non-deterministic results from a single-threaded GHC with a clean build; however, there are things that will change from run to run and might influence compilation, e.g. the contents of directories on disk can change and the names of temporary files will change. We know for sure that having an old .hi file from a previous compilation causes uniques to change, because GHC reads the .hi file and assigns some uniques (yet this should clearly not affect the compilation results).
Perhaps we could fix these things, but it's fragile, and furthermore we want to handle multithreaded compilation and --make, both of which make it much harder. So we concluded that it was probably futile to aim for fully deterministic compilation by making the uniques the same every time, and that instead we should try to make compilation deterministic in the face of non-deterministic uniques. This also turns out to be really hard, hence Bartosz's long email about the problems and the things he tried.
We don't currently have a good way to reproduce the problem from a completely clean build; however, it's easy to reproduce by doing two builds and leaving the .hi files from the first build in place while removing the .o files. You could also reproduce it easily by randomizing, in some way, the order in which uniques are generated.
Cheers
Simon

| We don't currently have a good way to reproduce the problem from a
| completely clean build, however it's easy to reproduce by doing two
| builds and leaving the .hi files from the first build in place while
| removing the .o files
Indeed. But is determinism in the face of .hi file changes Part of the Goal?
| So we concluded that it was probably futile to aim for
| fully-deterministic compilation by making the uniques the same every
| time, and instead we should try to make compilation deterministic in the
| face of non-deterministic uniques.
Well, that is bound to be fragile too. IF the uniques could be made deterministic, for the use-cases that are part of The Goal, then it would be a simple solution.
Determinism in the face of non-det uniques: I suggest NOT monkeying with UniqFM or Data.Map. Rather, I think you should focus on every use of ufmToList. There are a lot of these, and many of them will be totally fine to be non-deterministic. On a case-by-case basis (fragile) I guess you can find some canonical ordering. That might be hard in the case, say, of identical OccNames with different uniques (e.g. ds_343, ds_324). I suppose that in every such case we'd have to look for some other way to canonicalise. Ugh.
Still, one step at a time. Wiki page would be good.
Simon
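To make that suggestion concrete, here is a small sketch (the type and names below are invented, not GHC code) of canonicalising the output of a single ufmToList-style call by sorting on the OccName string rather than the Unique; the ds_343/ds_324 case shows exactly where it stops being enough:

import Data.List (sortOn)

-- A made-up binder type: a non-deterministic unique plus a stable,
-- source-derived occurrence name.
data Bndr = Bndr { bndrUnique :: Int, bndrOcc :: String }
  deriving Show

-- Canonicalise the result of one ufmToList-style call by a
-- deterministic key instead of relying on the map's Unique ordering.
canonicalise :: [Bndr] -> [Bndr]
canonicalise = sortOn bndrOcc

main :: IO ()
main =
  -- The two "ds" binders tie on the occurrence name, so their relative
  -- order is still decided by the input order (i.e. by Uniques).
  print (canonicalise [Bndr 343 "ds", Bndr 7 "map", Bndr 324 "ds"])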

On 17/09/2015 17:30, Simon Peyton Jones wrote:
| We don't currently have a good way to reproduce the problem from a
| completely clean build, however it's easy to reproduce by doing two
| builds and leaving the .hi files from the first build in place while
| removing the .o files
Indeed. But is determinism in the face of .hi file changes Part of the Goal?
Hopefully the wiki should clarify this. I've just edited it a bit to expand on the motivation and a few other details: https://ghc.haskell.org/trac/ghc/wiki/DeterministicBuilds
Cheers
Simon

2015-09-16 12:18 GMT+01:00 Simon Peyton Jones
| My understanding is that currently, if you build a Haskell project
| from clean sources with ghc --make, then wipe all the .o/.hi files,
| and rebuild again from clean, with all the same flags and environment,
| you are unlikely to end up with identical binaries for either the .o
| or .hi files
Is that right Bartosz? If that's the goal, then can we please say that explicitly on the wiki page?
Let's hypothesise that it is the goal. Then I don't understand why, in a single-threaded GHC, you would get non-det results. Presumably not from lazy-interface-file-loading. After all, the same things would be loaded in the same order, no?
Before we can propose a solution we need to understand the goal; and having understood the goal we need to understand why the results are non-det right now. Otherwise we risk fixing the wrong problem.
Anyway, one step at a time :-) Goal first; then a concrete example that demonstrates the problem; then a hypothesis of what is going wrong...
There are two related goals here that I'm trying to tackle at the same time: build speed and binary file compatibility. I've created a wiki page and added a concrete example of how the same environment leads to different interface files: https://ghc.haskell.org/trac/ghc/wiki/DeterministicBuilds#Aconcreteexample. Simon Marlow has already explained what goes wrong there: the difference is that the first build doesn't have any old interface files, while the second build from the same sources does. This happens often in practice in production environments when people switch branches. Another way to trigger it is to remove all the .o files and leave the .hi files before rebuilding.
| Depending on exactly what the problem is, one "solution" might be to not use –make mode.
This is not specific to --make mode. I've answered the other questions on the wiki.
| Anyway, one step at a time :-) Goal first; then a concrete example that demonstrates the problem; then a hypothesis of what is going wrong...
I'm sorry, I unloaded everything I've learned about this in one post without a clear narrative.
Cheers
participants (5)
- Bartosz Nitka
- Malcolm Wallace
- Simon Marlow
- Simon Peyton Jones
- Sven Panne