
#12485: -package-db flags now need to be sorted by dependency order
-------------------------------------+-------------------------------------
        Reporter:  niteria           |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.0.2
       Component:  Package system    |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  phab:D2450
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by ezyang):

OK. So the way I'll structure this is to first describe some workarounds for the current behavior, and then, assuming those workarounds don't work or are undesirable, I'll try to comment on how we can make this work.

> The way the packages are built is with Setup.hs and a separate
> package.db for each package.

One thing you can do in this situation is use ghc-pkg recache to create a merged package database and pass that off to GHC. So you'd swizzle all the text files into a directory, make the db, and you'd be off to the races.
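
To make the merged-database workaround concrete, here is a rough sketch in Python (used only as neutral scripting glue). It assumes each per-package database is a directory of `*.conf` text files; the helper names `merge_conf_files` and `recache_command` are made up for illustration, and the choice that a later database overwrites an earlier file of the same name is an assumption of the sketch, not something ghc-pkg does for you.

```python
import shutil
from pathlib import Path


def merge_conf_files(db_dirs, merged_dir):
    """Copy every *.conf file from each database directory into merged_dir.

    Assumption: each database is a directory of *.conf text files. If two
    databases contain a file with the same name, the one listed later wins.
    Returns the sorted list of file names now present in merged_dir.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    for db in db_dirs:
        for conf in sorted(Path(db).glob("*.conf")):
            shutil.copyfile(conf, merged / conf.name)
    return sorted(p.name for p in merged.glob("*.conf"))


def recache_command(merged_dir):
    """Build the ghc-pkg invocation that regenerates the db's binary cache."""
    return ["ghc-pkg", "recache", f"--package-db={merged_dir}"]
```

After copying, you would run the command from `recache_command` (for example via `subprocess.run(..., check=True)`) and then hand GHC a single `-package-db` flag pointing at the merged directory, so flag ordering stops mattering.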
> This is an undocumented change in behaviour in the very least. The manual didn't state that they can be in any order, but also didn't put any constraints on order.
To be fair, the manual does state that package databases are ordered, and that packages closer to the top will shadow those below them (https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/packages.htm... #package-databases). It doesn't explicitly state that every substack should be well-formed, but this is a constraint that `ghc-pkg` checks (if you're not forcing it to register). (FWIW, Harendrar posted a Diff https://phabricator.haskell.org/D2464 which should improve the docs here further.)
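
To illustrate what "every substack should be well-formed" means, here is a small model in Python. It is only a sketch of the constraint, not ghc-pkg's actual check: databases are plain dicts from a package id to the ids it depends on, the stack is listed bottom first, and the function name and example ids are hypothetical.

```python
def first_ill_formed_prefix(db_stack):
    """Find the first database that breaks well-formedness of the stack.

    db_stack: list of databases, bottom of the stack first. Each database
    is a dict mapping a package id to the list of ids it depends on.

    Returns (index, package, missing_dep) for the first package whose
    dependency is not visible in its own database or any database below
    it, or None if every prefix of the stack is well-formed.
    """
    visible = set()
    for index, db in enumerate(db_stack):
        visible |= set(db)
        for pkg, deps in sorted(db.items()):
            for dep in deps:
                if dep not in visible:
                    return (index, pkg, dep)
    return None
```

With three hypothetical databases `base`, `lib` (depends on `base`) and `app` (depends on `lib`), the order `[base, lib, app]` passes, while `[base, app, lib]` is reported as ill-formed at `app`, because `lib` is only added above it. That is the sense in which the flags have to be sorted by dependency order.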
> This sounds like an implementation detail informing the specification. Is there a fundamental reason why the flags have to be ordered?
OK, let me explain the shadowing situation in more detail, and also how the package database handling has changed over the last few releases.

There is a very important correctness constraint GHC enforces on the package databases that it reads in: there should not be two distinct packages with the same "key" (what constitutes a key has changed over time). This is pretty important because if two unit ids are equal, GHC assumes they are type equal: if there are two distinct packages (which could define totally different types and functions) with the same key, GHC will mix them up and generate code that almost certainly will segfault.

In GHC 7.8 and earlier, the key was just the package name plus version. So it was not that uncommon to have two package databases which defined the same package name and version. To keep things safe, GHC shadowed packages, throwing out packages with any conflicting source package IDs. Every package was also associated with an installed package id (IPID), which was derived directly off of the ABI of a package. When two installed package ids coincided, it was always safe to pick one (the later one) because the coinciding installed package id meant that the ABIs matched, so there'd be no problem reusing it.

OK, so this business where you can't have two packages with the same source package ID was the source of Cabal hell (cabal-install was not clever enough to put every package in a separate db), so in 7.10 we introduced "package keys", which were a bit more fine-grained than source package ids and were what we used for type equality, linker symbols, etc. IPIDs continued to be derived off ABI hashes. Package keys didn't really necessitate any changes in how shadowing worked, since there still was a separate notion for IPIDs.

At some point in the GHC 8.0 release cycle, SPJ was wondering why we need both package keys and IPIDs.
At the same time, work on cabal new-build was afoot, which eschewed the use of ABI hashes for IPIDs (since they couldn't be computed before we actually built the package; new-build needs to compute the IDs ahead of time so that it can determine if the particular build it needs is already built). So in GHC 8.0 we unified IPIDs and package keys.

OK, and now we get to the set of commits which broke your package database setup. When IPIDs don't track ABI hashes anymore, it's a bit more difficult to say what ABI a package depends on: after all, we record dependencies as IPIDs, not ABIs (maybe we should have recorded ABIs of the deps!). So I needed to find a new algorithm which:

1. Maintained the safety invariant: we never try to load two distinct packages with the same IPID (previously package key, previously source package id), and never use the wrong copy of a package together with a package that was compiled against a different copy.

2. Preserved the old shadowing behavior when the IPID conflicted as two package databases merged together: we need to prefer the latest one. (I didn't want to implement this, but bootstrapping stopped working without it. I believe the issue is that the boot libraries distributed with GHC don't come with hashes, so when we rebuild those libraries to boot, we end up picking the same name. Better use the new one!)

3. Preserved the old behavior where you could override a package from an earlier database, as long as the ABIs matched, without breaking a pile of packages.

So... I sinned, and assumed that as we added databases to our stack, the database would continue to be well-formed. Which has ruined your day!

Having written this, I don't think my suggested fix will keep GHC bootstrapping as it is today. It's a bit unavoidable: if the build system picks a deterministic id for a boot library and you then immediately bootstrap with that GHC, it will pick the *same* id. So you *need* some form of shadowing.
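
For what it's worth, the three requirements above can be modelled in a few lines of Python. This is a toy sketch under my own assumptions, not the code in GHC: `Pkg`, `merge_stack` and the `abi` field are hypothetical names, databases are processed bottom first, and each newly added database is trusted to have been built against what is visible at that point (which is exactly the well-formedness assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pkg:
    ipid: str          # installed package id, used as the unique key
    abi: str           # ABI hash recorded for this copy
    deps: tuple = ()   # IPIDs this package was compiled against


def merge_stack(db_stack):
    """Merge databases bottom-first and return the surviving {ipid: Pkg}.

    A later copy of an IPID always replaces an earlier one (requirement 2).
    If the ABIs match, dependents of the earlier copy are kept (requirement
    3). If they differ, everything in the earlier databases that depends,
    directly or transitively, on the replaced copy is dropped, so nothing
    is used against the wrong copy (requirement 1).
    """
    merged = {}
    for db in db_stack:
        # IPIDs whose ABI changes when this database is layered on top.
        broken = {p.ipid for p in db
                  if p.ipid in merged and merged[p.ipid].abi != p.abi}
        # Close over reverse dependencies among already-merged packages.
        grew = True
        while grew:
            grew = False
            for ipid, pkg in merged.items():
                if ipid not in broken and any(d in broken for d in pkg.deps):
                    broken.add(ipid)
                    grew = True
        for ipid in broken:
            del merged[ipid]
        # Packages from the new database are added (or re-added) last.
        for p in db:
            merged[p.ipid] = p
    return merged
```

In this model, layering a `base` with the same ABI over an older one leaves its dependents alone, while layering a `base` with a different ABI silently removes them unless the new database also supplies rebuilt copies.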
Maybe what you just want is another mode for GHC to make it process package databases differently? (This is why #12518 seems relevant.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12485#comment:9
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler