Re: [GHC] #12485: -package-db flags now need to be sorted by dependency order

30 Aug 2016

#12485: -package-db flags now need to be sorted by dependency order
-------------------------------------+-------------------------------------
        Reporter:  niteria           |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.0.2
       Component:  Package system    |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  phab:D2450
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by ezyang):

 OK. So the way I'll structure this is first describe some workarounds to
 work with the current behavior, and then assuming those workarounds don't
 work / are undesirable I'll try to comment on how we can make this work.

 > The way the packages are built is with Setup.hs and a separate
 > package.db for each package.

 One thing you can do in this situation is use ghc-pkg recache to create a
 merged package database and pass that off to GHC. So you'd swizzle all the
 text files into a directory, make the db, and you'd be off to the races.
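
 To make that workaround concrete, here is a rough Python sketch. It
 assumes each per-package database is a directory of `*.conf` registration
 files and that `ghc-pkg` is on the PATH. The function name, the
 `run_recache` switch and the choice to treat a file-name collision as an
 error are all made up for illustration; they are not part of any GHC
 tooling.

```python
import shutil
import subprocess
from pathlib import Path


def merge_package_dbs(source_dbs, merged_db, run_recache=True):
    """Copy every *.conf registration file from source_dbs into merged_db.

    Returns the names of the copied files, in copy order. Raises ValueError
    if a file with the same name is already present in merged_db, because
    one registration would otherwise silently overwrite another.
    """
    merged = Path(merged_db)
    merged.mkdir(parents=True, exist_ok=True)
    copied = []
    for db in source_dbs:
        for conf in sorted(Path(db).glob("*.conf")):
            target = merged / conf.name
            if target.exists():
                raise ValueError(f"duplicate registration file: {conf.name}")
            shutil.copy2(conf, target)
            copied.append(conf.name)
    if run_recache:
        # Regenerate package.cache so the merged directory is usable as
        # a single package database.
        subprocess.run(
            ["ghc-pkg", "recache", f"--package-db={merged}"], check=True
        )
    return copied
```

 You would then hand GHC a single `-package-db` flag pointing at the merged
 directory, instead of one flag per package.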

 > This is an undocumented change in behaviour in the very least. The
 > manual didn't state that they can be in any order, but also didn't put any
 > constraints on order.

 To be fair, the manual does state that package databases are ordered, and
 that packages closer to the top will shadow those below them
 (https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/packages.htm...
 #package-databases). It doesn't explicitly state that every substack
 should be well-formed, but this is a constraint that `ghc-pkg` checks (if
 you're not forcing it to register). (FWIW, Harendrar posted a Diff
 https://phabricator.haskell.org/D2464 which should improve the docs here
 further.)

 > This sounds like an implementation detail informing the specification.
 > Is there a fundamental reason why the flags have to be ordered?

 OK, let me explain the shadowing situation in more detail, and also how
 the package database handling has changed in the recent few releases.

 There is a very important correctness constraint GHC enforces on the
 package databases that it reads in, which is there should not be two
 distinct packages with the same "key" (what constitutes a key has changed
 over time). This is pretty important because if two unit ids are equal,
 GHC assumes they are type equal: if there are two distinct packages (which
 could define totally different types and functions) with the same key, GHC
 will mix them up and generate code that almost certainly will segfault.

 In GHC 7.8 and earlier, the key was just the package name plus version. So
 it was not that uncommon to have two package databases which defined the
 same package name and version. To keep things safe, GHC shadowed packages,
 throwing out packages with any conflicting source package IDs. Every
 package was also associated with an installed package id, which was
 derived directly off of the ABI of a package. When two installed package
 ids coincided, it was always safe to pick one (the later one) because the
 coinciding installed package id meant that the ABIs matched, so there'd be
 no problem reusing it.
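
 As a toy model of the 7.8-era rule just described (this is not GHC's
 actual code; the `Pkg` record and the function name are invented for the
 example), you can think of the databases as being folded together in
 order. A later package with the same source package ID but a different
 installed package id shadows the earlier one, while a later package with
 an identical installed package id simply takes its place:

```python
from collections import namedtuple

# Hypothetical record: a source package ID (name plus version) and an
# installed package id, which in this era was derived from the ABI.
Pkg = namedtuple("Pkg", ["source_id", "ipid"])


def merge_with_shadowing(db_stack):
    """Fold a list of databases (earliest first) into one visible set.

    Returns (visible, shadowed): visible maps each source package ID to
    the package that wins, shadowed lists the packages thrown out because
    a later database had the same source package ID with a different
    installed package id.
    """
    visible = {}
    shadowed = []
    for db in db_stack:
        for pkg in db:
            prev = visible.get(pkg.source_id)
            if prev is not None and prev.ipid != pkg.ipid:
                # Same name and version, different ABI: drop the old one.
                shadowed.append(prev)
            # Same ipid means same ABI, so taking the later copy is safe.
            visible[pkg.source_id] = pkg
    return visible, shadowed
```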

 OK, so this business where you can't have two packages with the same
 source package ID was the source of Cabal hell (cabal-install was not
 clever enough to put every package in a separate db) so in 7.10 we
 introduced "package keys", which were a bit more fine-grained than source
 package ids and what we used for type equality, linker symbols, etc. IPIDs
 continued to be derived off ABI hashes. Package keys didn't really
 necessitate any changes in how shadowing worked, since there still was a
 separate notion for IPIDs.

 At some point in the GHC 8.0 release cycle, SPJ was wondering why we need
 both package keys and IPIDs. At the same time, work on cabal new-build was
 afoot, which eschewed the use of ABI hashes for IPIDs (since they couldn't
 be computed before we actually built the package; new-build needs to
 compute the IDs ahead of time so that it can determine if the particular
 build it needs is already built.) So in GHC 8.0 we unified IPIDs and
 package keys.

 OK, and now we get to the set of commits which broke your package
 database setup. So,
 when IPIDs don't track ABI hashes anymore, it's a bit more difficult to
 say what ABI a package depends on: after all, we record dependencies as
 IPIDs, not ABIs (maybe we should have recorded ABIs of the deps!) So, I
 needed to find a new algorithm which:

 1. Maintained the safety invariant: that we never try to load two
 distinct packages with the same IPID (previously package key, previously
 source package id), and never pair a package with a copy of a dependency
 other than the one it was compiled against.

 2. Preserved the old shadowing behavior when IPIDs conflict as two
 package databases are merged together: we need to prefer the later one. (I
 didn't want to implement this but bootstrapping stopped working without
 it. I believe the issue is that the boot libraries distributed with GHC
 don't come with hashes, so when we rebuild those libraries to boot, we end
 up picking the same name. Better use the new one!)

 3. Preserved the old behavior where a package in a later database could
 override one from an earlier database, as long as the ABIs matched,
 without breaking a pile of packages that depend on it.

 So... I sinned, and assumed that as we added databases to our stack, the
 database would continue to be well-formed. Which has ruined your day!
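
 To show what that assumption amounts to, here is a small Python sketch
 (purely illustrative, not the algorithm in GHC; the function name and the
 dict-of-lists representation are my own). Each database maps an IPID to
 the IPIDs it depends on, and index 0 is the first `-package-db` flag, i.e.
 the bottom of the stack. The check reports every dependency that is not
 satisfied by the databases added so far:

```python
def dangling_deps(db_stack):
    """Check that every prefix of the stack is closed under dependencies.

    db_stack is a list of dicts, bottom of the stack first; each dict maps
    an IPID to the list of IPIDs it depends on. Returns a list of
    (db_index, ipid, missing_dep) triples, empty if every sub-stack is
    well-formed.
    """
    seen = set()
    problems = []
    for idx, db in enumerate(db_stack):
        # Packages in the same database may depend on each other.
        seen.update(db)
        for ipid, deps in db.items():
            for dep in deps:
                if dep not in seen:
                    problems.append((idx, ipid, dep))
    return problems
```

 With `lib = {"lib-1.0-x": []}` and `app = {"app-1.0-y": ["lib-1.0-x"]}`,
 the stack `[lib, app]` passes, while `[app, lib]` reports that
 `app-1.0-y` depends on something not yet in the stack. That is the sense
 in which the flags now have to be in dependency order.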

 Having written this, I don't think my suggested fix will keep GHC
 bootstrapping as it is today. It's a bit unavoidable: if the build system
 picks a deterministic id for a boot library, and you then immediately
 bootstrap with that GHC, it will pick the *same* id. So you *need* some
 form of shadowing.

 Maybe what you just want is another mode for GHC to make it process
 package databases differently? (This is why #12518 seems relevant.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12485#comment:9
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler