
Hello,
For the past couple of weeks I've been working on making compilation
results deterministic.
What I'm focusing on right now is the interface file determinism, I don't
care about binaries being deterministic.
I'd like to give a status update and ask for some advice, since I'm running
into issues that I don't have a good way of solving.
The first question anyone might ask is how did nondeterminism creep into
the compiler. If we're compiling with a single thread there's no reason for
the computation to proceed in non deterministic way. I'm fairly certain
that the issue originates from lazy loading of interface files. Relevant
function:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/typeche....
What happens is that if you already have an interface file for a target
you're trying to build the computation will proceed differently.
Why does lazy loading matter? As you load the interface file it needs to
get type-checked and that means it needs to pull some Uniques from a global
UniqSupply. It does that in different order resulting in different Unique
assignment. As far as I can tell, lazy loading is required for performance,
so I abandoned the idea of fixing it. I haven't looked at parallel
compilation yet, but I'd expect it to result in different Unique assignment
as well.
I believe up to this point we're ok. Uniques are non-deterministic, but it
shouldn't be a big deal. Uniques should be opaque enough to not affect the
order of computation, for example the order of binds considered. But they
aren't.
Uniques are used in different ways throughout the compiler and they end up
reordering things:
1) They have an `Ord` instance:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/basicTy...
.
So far the places it impacts the most are places that use
`stronglyConnCompFromEdgedVertices`, because Unique is used as a Node key
and the result depends on the order of Nodes being considered. Some
examples:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/simplCo...
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/rename/...
(because Ord for Name uses Unique
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/basicTy...
)
I've tried to see what removing it would entail and the changes would be
far reaching: https://phabricator.haskell.org/P62.
2) VarEnv, NameEnv are implemented in terms of UniqFM, which is just
Data.IntMap with keys being the Unique integer values.
The way this bites us is that when UniqFM's get converted to a list they
end up being sorted on Unique value. This problem is more widespread than
the `stronglyConnCompFromEdgedVertices` issue, there's even a place where
it's implicitly depended on:
https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/nativeG...
.
I've tried to fix it by making `toList` return the elements in the order of
insertion (https://phabricator.haskell.org/P63), but that turned out to
have significant cost. My unscientific benchmark on aeson and text showed
10% compilation time increase. It's possible it can be done in less
expensive way, I've tried a couple of approaches, but all of them resulted
in 10% time increase. I've also considered to split UniqFM to two different
types, one that keeps the ordering, and one that can't `toList`, but I
suspect that the cut will not be clean, so I haven't tried that.
In some cases we got away with ordering things by OccName where needed: (
https://phabricator.haskell.org/D1073, https://phabricator.haskell.org/D1192),
but OccName's don't have to be unique in every case and if we try to make
them unique that would make them longer and probably result in even greater
slowdown.
The instance I've recently looked at doesn't look like it can be solved by
sorting by OccName.
The code that triggers the problem (simplified from haskell-src-exts):
data Decl l = Boring l
deriving (Eq)
data Binds l
= BDecls l [Decl l] -- ^ An ordinary binding group
| IPBinds l [IPBind l] -- ^ A binding group for implicit parameters
deriving (Eq)
data IPBind l = Boring2 l
deriving (Eq)
The end result is:
4449fe3f8368a2c47b2499a1fb033b6a
$fEqBinds_$c==$Binds :: Eq l => Binds l -> Binds l -> Bool
{- Arity: 1, HasNoCafRefs, Strictness: