Re: haddock-2.3.0 literate comments discarded from .lhs input

On Wed, 2009-05-27 at 15:10 +0100, Alistair Bayley wrote:
Andrea,
2009/3/19 Andrea Vezzosi:
It turns out that those variables are there to allow relocation, in fact $topdir is expanded by Distribution.Simple.GHC.getInstalledPackages, it seems that $httptopdir has been overlooked. I'd be tempted to say that it's ghc-pkg dump/describe responsibility to expand those vars instead, like it does for ghc-pkg field.
Do you (or anyone else) intend to work on this? If not, I'd like to fix it, but I'll need some guidance. Like, is Distribution.Simple.GHC.getInstalledPackages where the variable expansion code should go, or should it be somewhere else?
I don't think we should be hacking around this in Cabal without any discussion with the ghc folks on what is supposed to be there, what variables are allowed. We need a clear spec on what variables tools are expected to handle and how they are to be interpreted. The output of ghc-pkg describe/dump is not just for ghc to define and play around with. It's supposed to be defined by the Cabal spec.
Supporting relocatable sets of packages is a good idea. We should aim to have something that is usable by each compiler, not just ghc, so interpreting paths relative to ghc's libdir doesn't seem ideal.
How about this: a way to specify paths in the package registration info that are relative to the location of the package db they are in. That makes sense beyond just ghc and would even allow other sets of relocatable packages, not just those installed with ghc.
Then perhaps as a compat hack we should get Cabal to handle older ghc versions that do use these funny vars.
Duncan

It turns out that those variables are there to allow relocation, in fact $topdir is expanded by Distribution.Simple.GHC.getInstalledPackages, it seems that $httptopdir has been overlooked. I'd be tempted to say that it's ghc-pkg dump/describe responsibility to expand those vars instead, like it does for ghc-pkg field.
Agreed on ghc-pkg doing the translation. Via commandline options, or via environment vars (one might be tempted to manage the bindings in ghc-pkg's database itself, even). The lack of support for this hampers the usability of ghc-pkg and the database it is responsible for.
We need a clear spec on what variables tools are expected to handle and how they are to be interpreted.
Currently, there seem to be $topdir and $httptopdir. Given the split between GHC and HP, it might be useful to have an additional $hptopdir, or just a general mechanism for variables in ghc-pkg's database (I recall being disappointed when what looked like environment variables were unaffected by environment settings..). The info is somewhat distributed:
http://darcs.haskell.org/ghc/utils/ghc-pkg/Main.hs
http://darcs.haskell.org/ghc/compiler/main/Packages.lhs
http://darcs.haskell.org/ghc/compiler/main/SysTools.lhs [Note topdir]
Supporting relocatable sets of packages is a good idea. We should aim to have something that is usable by each compiler, not just ghc, so interpreting paths relative to ghc's libdir doesn't seem ideal.
GHC makes no reference to libdir, it simply talks about a $topdir (where it would like to store things it needs) and $httptopdir (where haddocks might be found).
How about this: a way to specify paths in the package registration info that are relative to the location of the package db they are in.
ahem. That sounds like a backwards step, being dependent on two locations instead of one. Before the HP, windows GHCs could be relocated without needing to update the ghc-pkg database, even if some packages were installed outside GHC's $topdir. With your variant, just about any change would need updating.
Assuming that the parts are independently located by whatever the OS packaging conventions say, and can be independently relocated otherwise, it seems simpler to continue with the variable scheme, but with improved support and documentation for it.
Claus
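A minimal sketch of the name-independent variable mechanism suggested above, assuming the bindings are available as a simple association list. `Bindings` and `expandVars` are hypothetical names, not part of ghc-pkg, and the example paths are made up:

```haskell
import Data.Char (isAlphaNum)

-- | Hypothetical table of variable bindings, e.g. read from a file
-- kept alongside the package database (an assumption, not ghc-pkg's
-- actual behaviour).
type Bindings = [(String, String)]

-- | Replace every @$name@ whose name is bound. Unbound variables are
-- left untouched so that a missing binding stays visible in the output.
expandVars :: Bindings -> String -> String
expandVars env = go
  where
    go [] = []
    go ('$' : rest) =
      let (name, after) = span isAlphaNum rest
      in case lookup name env of
           Just val | not (null name) -> val ++ go after
           _                          -> '$' : go rest
    go (c : cs) = c : go cs
```

With only `topdir` bound, a path starting with `$topdir` is expanded while one starting with `$httptopdir` is passed through unchanged, which mirrors the "overlooked variable" situation described earlier in the thread.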

On Wed, 2009-05-27 at 19:47 +0100, Claus Reinke wrote:
We need a clear spec on what variables tools are expected to handle and how they are to be interpreted.
Currently, there seem to be $topdir and $httptopdir.
And I can't see a justification for there being two.
Given the split between GHC and HP, it might be useful to have an additional $hptopdir, or just a general mechanism for variables in ghc-pkg's database (I recall being disappointed when what looked like environment variables were unaffected by environment settings..).
I'd rather not create ad-hoc vars which everyone needs to know about for things like the platform.
Supporting relocatable sets of packages is a good idea. We should aim to have something that is usable by each compiler, not just ghc, so interpreting paths relative to ghc's libdir doesn't seem ideal.
GHC makes no reference to libdir, it simply talks about a $topdir (where it would like to store things it needs) and $httptopdir (where haddocks might be found).
Yes ok, on windows the topdir is the parent of dir containing ghc.exe.
How about this: a way to specify paths in the package registration info that are relative to the location of the package db they are in.
ahem. That sounds like a backwards step, being dependent on two locations instead of one.
I don't follow this. Which two?
Before the HP, windows GHCs could be relocated without needing to update the ghc-pkg database, even if some packages were installed outside GHC's $topdir.
I don't see how this is related to what the Windows installer for the HP is doing. Sure, since it's installing packages relative to ghc and we'd like the whole thing to be relocatable then it should use relative paths. I don't think anyone disputes that, the question is how to implement relative paths.
With your variant, just about any change would need updating.
I must be missing something. If you move package.conf and the packages in one go, then nothing needs changing as far as I can see.
Assuming that the parts are independently located by whatever the OS packaging conventions say, and can be independently relocated otherwise, it seems simpler to continue with the variable scheme, but with improved support and documentation for it.
My suggestion seems very simple! I'm clearly missing some problem which you can see.
To be clear, here's what I'm imagining:
blah/package.conf
blah/lib/foo-1.0/libfoo-1.0.a
and package.conf would contain foo-1.0 with paths looking like "$dbdir/lib/foo-1.0". That is, we interpret $dbdir (or whatever var name we agree on) as being "blah/" because that's the dir containing the db.
So crucially, it doesn't really matter where ghc.exe is. Assuming ghc can find the package conf then it can find all the files. So it'd let you create multiple relocatable package collections. If the primary package db is kept relative to ghc (eg in ghc's topdir) then the whole ghc installation including libs is relocatable.
Duncan
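A minimal sketch of the $dbdir idea above, assuming the variable only ever appears as a leading path component and using POSIX-style separators. `expandDbDir` is a hypothetical helper, not an existing Cabal or ghc-pkg function:

```haskell
import Data.List (stripPrefix)
import System.FilePath.Posix (takeDirectory, (</>))

-- | Interpret a leading @$dbdir@ as the directory that contains the
-- package db file. Paths that do not start with @$dbdir@ (for example
-- absolute paths) are returned unchanged.
expandDbDir :: FilePath -> FilePath -> FilePath
expandDbDir dbFile path =
  case stripPrefix "$dbdir" path of
    Just ""           -> dbDir
    Just ('/' : rest) -> dbDir </> rest
    _                 -> path
  where
    dbDir = takeDirectory dbFile
```

Moving the whole `blah/` tree only changes the `dbFile` argument; the registered string "$dbdir/lib/foo-1.0" itself stays the same.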

Currently, there seem to be $topdir and $httptopdir.
And I can't see a justification for there being two.
Each variable provides an indirection that decouples the installation from one source of _independent_ relocations (btw, I've always imagined that it is called 'http' instead of 'html' to allow for references to haskell.org when no local docs are installed, but it doesn't seem to work that way).
How about this: a way to specify paths in the package registration info that are relative to the location of the package db they are in.
ahem. That sounds like a backwards step, being dependent on two locations instead of one.
I don't follow this. Which two?
package db + package path: in the current system, you only have to update the package db if you move a package that isn't installed under the GHC tree; in your suggestion, you also have to update it if you move the package db/GHC itself while having non-core packages installed outside the GHC tree.
Before the HP, windows GHCs could be relocated without needing to update the ghc-pkg database, even if some packages were installed outside GHC's $topdir.
I don't see how this is related to what the Windows installer for the HP is doing. Sure, since it's installing packages relative to ghc and we'd like the whole thing to be relocatable then it should use relative paths. I don't think anyone disputes that, the question is how to implement relative paths.
I was just disambiguating which GHC installers I was referring to, since there are now two possibilities, with different properties.
With your variant, just about any change would need updating.
I must be missing something. If you move package.conf and the packages in one go, then nothing needs changing as far as I can see.
You seem to be assuming that everything is under a common root? That isn't the case for most unixes (different locations for bin/ doc/ lib/ .., docs installed or not), and even on windows, it stopped being the case with cabal insisting on 'Program Files/Haskell/...' as the default install.
Since ghc traditionally installs into 'c:/ghc/ghc-<version>' (on my system, at least, but I think that no-spaces-location was suggested by one of the GHC installers originally, and spaces in tool paths still confuse the GHC build system), I have two locations.
If I move GHC, nothing needs changing. If I move packages that didn't come with GHC, package.db needs updating. If the packages had been registered wrt a $cabaltopdir, no changes would be needed in either case.
In your suggestion, if I move GHC but not the packages, package.db needs updating, if I move the packages but not GHC, package.db needs updating, only if I move both, and by the same relative path, no update is needed.
Assuming that the parts are independently located by whatever the OS packaging conventions say, and can be independently relocated otherwise, it seems simpler to continue with the variable scheme, but with improved support and documentation for it.
My suggestion seems very simple! I'm clearly missing some problem which you can see.
To be clear, here's what I'm imagining:
blah/package.conf
blah/lib/foo-1.0/libfoo-1.0.a
That is everything under one tree, right? And since package.conf is GHC's register, GHC would have to be in that tree as well.
and package.conf would contain foo-1.0 with paths looking like "$dbdir/lib/foo-1.0". That is, we interpret $dbdir (or whatever var name we agree on) as being "blah/" because that's the dir containing the db.
So crucially, it doesn't really matter where ghc.exe is. Assuming ghc can find the package conf then it can find all the files. So it'd let you create multiple relocatable package collections. If the primary package db is kept relative to ghc (eg in ghc's topdir) then the whole ghc installation including libs is relocatable.
That is what GHC did on windows before cabal changed the package locations away to a path that neither GHC nor its build tools can use. Is that even possible on unix systems, with their various packaging and location traditions? And if Simon ever makes that breakthrough of binary compatibility at least between minor GHC versions, we can't have the libraries in the GHC directories, as they'd be shared between several GHCs.
Claus

On Thu, 2009-05-28 at 11:16 +0100, Claus Reinke wrote:
How about this: a way to specify paths in the package registration info that are relative to the location of the package db they are in.
ahem. That sounds like a backwards step, being dependent on two locations instead of one.
I don't follow this. Which two?
package db + package path: in the current system, you only have to update the package db if you move a package that isn't installed under the GHC tree; in your suggestion, you also have to update it if you move the package db/GHC itself while having non-core packages installed outside the GHC tree.
But if you're registering global packages that are installed outside of the GHC tree then you wouldn't register them using relative paths. I'm not saying everything must use relative paths.
With your variant, just about any change would need updating.
I must be missing something. If you move package.conf and the packages in one go, then nothing needs changing as far as I can see.
You seem to be assuming that everything is under a common root?
Well it is on Windows which is the main case where people want relocatable installations. If we wanted relocatable installations on Unix then it'd all have to be under one root too, eg /opt/whatever.
That isn't the case for most unixes (different locations for bin/ doc/ lib/ .., docs installed or not), and even on windows, it stopped being the case with cabal insisting on 'Program Files/Haskell/...' as the default install.
Sure, extra packages should not be installed in the ghc tree and so those should not use paths relative to the ghc location.
Since ghc traditionally installs into 'c:/ghc/ghc-<version>' (on my system, at least, but I think that no-spaces-location was suggested by one of the GHC installers originally, and spaces in tool paths still confuse the GHC build system), I have two locations.
If I move GHC, nothing needs changing. If I move packages that didn't come with GHC, package.db needs updating. If the packages had been registered wrt a $cabaltopdir, no changes would be needed in either case.
For some reason I really dislike the idea that we make up specific vars like $cabaltopdir for specific purposes. Perhaps that's just me. I want a general solution, not something that forces everyone to adopt conventions like installing everything in ~/.cabal/. That's just a sensible default, but the user rightly has full control over --prefix, --libdir etc etc.
In your suggestion, if I move GHC but not the packages, package.db needs updating,
No it does not. That would only be the case if you always registered things relative to ghc, but that'd be silly for things not actually installed in the ghc install tree.
if I move the packages but not GHC, package.db needs updating, only if I move both, and by the same relative path, no update is needed.
Are you suggesting that we need to be able to move core libs that are distributed with ghc, independently of where the ghc binary is?
Assuming that the parts are independently located by whatever the OS packaging conventions say, and can be independently relocated otherwise, it seems simpler to continue with the variable scheme, but with improved support and documentation for it.
My suggestion seems very simple! I'm clearly missing some problem which you can see.
To be clear, here's what I'm imagining:
blah/package.conf
blah/lib/foo-1.0/libfoo-1.0.a
That is everything under one tree, right?
Not necessarily. For the things in the same tree it'd be sensible to use relative paths. For things not in the same tree it'd be sensible to use absolute paths. This scheme also allows other sets of relocatable packages, so long as ghc gets told where to find the package.conf.
And since package.conf is GHC's register, GHC would have to be in that tree as well.
For core packages shipped with ghc/hp, yes.
and package.conf would contain foo-1.0 with paths looking like "$dbdir/lib/foo-1.0". That is, we interpret $dbdir (or whatever var name we agree on) as being "blah/" because that's the dir containing the db.
So crucially, it doesn't really matter where ghc.exe is. Assuming ghc can find the package conf then it can find all the files. So it'd let you create multiple relocatable package collections. If the primary package db is kept relative to ghc (eg in ghc's topdir) then the whole ghc installation including libs is relocatable.
That is what GHC did on windows before cabal changed the package locations away to a path that neither GHC nor its build tools can use.
Do you mean installing binaries in C:\Program Files\Haskell\bin by default? That decision was made by the Windows users. It's true that the GHC build system cannot work in a directory containing spaces, and that's probably too hard to fix. However using tools (eg happy, alex) that are in a dir containing spaces should not be nearly so hard to fix.
Is that even possible on unix systems, with their various packaging and location traditions?
I'm not sure what you're referring to.
And if Simon ever makes that breakthrough of binary compatibility at least between minor GHC versions, we can't have the libraries in the GHC directories, as they'd be shared between several GHCs.
I've never suggested they should be. Only things distributed with ghc/hp should be together in one relocatable tree. Everything else the user installs should go in the appropriate locations.
Duncan

But if you're registering global packages that are installed outside of the GHC tree then you wouldn't register them using relative paths. I'm not saying everything must use relative paths.
Please don't move your windmills while I'm fighting them!-) If you don't want to move from absolute paths for non-core packages, the current system should just work, right? I thought we were talking about (a) making ghc-pkg (optionally) instantiate any variables in its database in (all of) its command-line output and (b) allowing non-core packages to be relocated without having to update ghc-pkg's database.
For some reason I really dislike the idea that we make up specific vars like $cabaltopdir for specific purposes. Perhaps that's just me. I want a general solution, not something that forces everyone to adopt conventions like installing everything in ~/.cabal/. That's just a sensible default, but the user rightly has full control over --prefix, --libdir etc etc.
Personally, I only dislike the idea of hardcoding specific variable names in ghc-pkg, which is why I suggested a name-independent approach (I also dislike the current duplication of code in ghc-pkg/ghc api/..).
$cabaltopdir would just improve the handling of the default cabal install locations, without dictating where users say those default locations should be - and if users move specific packages/package parts to different absolute locations, those absolute locations would still have to appear in the package database, but I'd expect that to be an exception.
If common prefixes are abstracted out via variables, it would simply be easier to see that the majority of package parts are not randomly distributed over the available file systems, but related to the chosen default settings of the tool that installed them (that might involve communication between GHC and Cabal: GHC knows about its own dir, but would have to ask Cabal about its locations - or, better, Cabal could tell GHC about its locations once, when the user changes them).
I'm mostly seeing the windows perspective at the moment, btw, but even on unix, one might want to abstract out common prefixes, in case one decides to move packages from $HOME/ to system-wide prefixes, or from one system-wide prefix to another.
Perhaps the difference doesn't matter much, apart from readability: Let's say I wanted to move a GHC/Cabal/HP installation to a USB drive: moving GHC/corelibs is straightforward (it doesn't care under what drive name the USB drive gets mounted on the lecture theatre computer), but how would I move Cabal-installed non-core packages (not to mention Cabal itself?)? Is that use case documented in some faq?
If the extra package paths are absolute, it would involve something like search&replace on the concrete representation of the supposedly abstract package database, but as long as that representation is a simple text file, that might not be too troublesome; if the extra package paths are relative to a $cabaltopdir, it would involve telling GHC about the new location prefix whenever calling it directly (or telling Cabal about its new location, and Cabal passing that on when calling GHC).
That is what GHC did on windows before cabal changed the package locations away to a path that neither GHC nor its build tools can use.
Do you mean installing binaries in C:\Program Files\Haskell\bin by default? That decision was made by the Windows users.
s/the/some/ ;-) It is a reasonable default to expect, but if Cabal had ever asked me before starting to install things there, I'd have changed that default immediately. I was thinking more about things that would appear in package.conf:
C:\Program Files\Haskell\<package>\<ghc-version>
C:\Program Files\Haskell\doc\<package>
but it is the same difference: there are now two locations to consider even on windows (GHC/corelibs + Cabal/other packages), and that is probably how it should be.
It's true that the GHC build system cannot work in a directory containing spaces, and that's probably too hard to fix. However using tools (eg happy, alex) that are in a dir containing spaces should not be nearly so hard to fix.
Maybe so, but last time (end of January) I asked about the GHC build (in a space-free path) using tools where cabal installs them by default (with spaces in path), Simon M answered: "It's not practical in general to cope with spaces in paths in the build system. IIRC we tried to get this right once and gave up.". So if there is a tool path specific subset of the problem that could be solved more easily, it doesn't seem to help.
Is that even possible on unix systems, with their various packaging and location traditions?
I'm not sure what you're referring to.
Some unix branches seem to distinguish themselves merely by different package management/location. But apart from Mac frameworks, I'm not aware of any unix that would not expect libraries/binaries/docs to be installed in different locations (instead of common, language-specific roots, the common roots are purpose-specific). Relocation might not be as typical there, though (don't know how mounting external drives affects paths there, or whether users might want to relocate packages between different prefixes like /opt, /etc, /usr, .., or just from user-local to system-wide locations).
And if Simon ever makes that breakthrough of binary compatibility at least between minor GHC versions, we can't have the libraries in the GHC directories, as they'd be shared between several GHCs.
I've never suggested they should be. Only things distributed with ghc/hp should be together in one relocatable tree. Everything else the user installs should go in the appropriate locations.
Ok, then the question is just how to communicate those locations to GHC: monolithically, as absolute paths, or separated into common prefixes and relative paths (usually, only the prefixes would change on relocations). My preference would be separated, with ghc-pkg filling in the prefix variables' current values when asked to do so.
Claus

On Thu, 2009-05-28 at 14:12 +0100, Claus Reinke wrote:
But if you're registering global packages that are installed outside of the GHC tree then you wouldn't register them using relative paths. I'm not saying everything must use relative paths.
Please don't move your windmills while I'm fighting them!-)
If you don't want to move from absolute paths for non-core packages, the current system should just work, right?
Yes. Though it also allows for the possibility of relocatable sets of packages that are not installed relative to the compiler. But more importantly it's more general and simpler than the current '$topdir' that ghc uses.
I thought we were talking about
(a) making ghc-pkg (optionally) instantiate any variables in its database in (all of) its command-line output and
Yes, though I'm only asking for two vars (previously one), not an ad-hoc set of vars.
(b) allowing non-core packages to be relocated without having to update ghc-pkg's database.
In my suggested system this is possible if that set of packages use their own package db (containing relative paths). In your system it's possible by updating some var in a central registry and having that set of packages use paths relative to that var.
For some reason I really dislike the idea that we make up specific vars like $cabaltopdir for specific purposes. Perhaps that's just me. I want a general solution, not something that forces everyone to adopt conventions like installing everything in ~/.cabal/. That's just a sensible default, but the user rightly has full control over --prefix, --libdir etc etc.
Personally, I only dislike the idea of hardcoding specific variable names in ghc-pkg, which is why I suggested a name-independent approach (I also dislike the current duplication of code in ghc-pkg/ghc api/..).
$cabaltopdir would just improve the handling of the default cabal install locations, without dictating where users say those default locations should be - and if users move specific packages/package parts to different absolute locations, those absolute locations would still have to appear in the package database, but I'd expect that to be an exception.
So ghc's current system uses two vars, $topdir and $httptopdir. I'm proposing to replace those with a standardised ${pkgroot} and ${pkgrooturl} vars which are usable by all compilers and in more situations. You're proposing a central registry of vars and to have ghc-pkg (optionally) expand these vars which could be used anywhere in the installed package descriptions. Presumably you're also suggesting some mechanism to query and update this registry of variables. Is that a fair summary?
Let's say I wanted to move a GHC/Cabal/HP installation to a USB drive: moving GHC/corelibs is straightforward (it doesn't care under what drive name the USB drive gets mounted on the lecture theatre computer), but how would I move Cabal-installed non-core packages (not to mention Cabal itself?)? Is that use case documented in some faq?
Ok, so you want to construct a set of relocatable packages. This needs to be decided from the beginning when you compile said packages because otherwise packages can have paths baked into them. There are some restrictions on making relocatable packages, eg you can't set --libdir to an absolute path, it has to be relative to the --prefix. In addition to making the package relocatable, we would have to register the package into a package db that lives relative to the packages in question. This db would contain relative paths (using ${pkgroot}). Once this is done then the whole lot would be relocatable onto a USB drive or whatever. To use this set of packages you would need to specify --package-conf= to ghc, or --package-db= to cabal.
If the extra package paths are absolute, it would involve something like search&replace on the concrete representation of the supposedly abstract package database, but as long as that representation is a simple text file, that might not be too troublesome;
Aye, so if you want to be able to move them then it's better if they're relative.
if the extra package paths are relative to a $cabaltopdir, it would involve telling GHC about the new location prefix whenever calling it directly (or telling Cabal about its new location, and Cabal passing that on when calling GHC).
So that's the bit in your suggestion that corresponds to using --package-conf= in my suggestion. And it assumes that you don't need to set $cabaltopdir to two values simultaneously, eg if the machine you've moved it to on the USB stick also has cabal packages that it needs to use.
It's true that the GHC build system cannot work in a directory containing spaces, and that's probably too hard to fix. However using tools (eg happy, alex) that are in a dir containing spaces should not be nearly so hard to fix.
Maybe so, but last time (end of January) I asked about the GHC build (in a space-free path) using tools where cabal installs them by default (with spaces in path), Simon M answered: "It's not practical in general to cope with spaces in paths in the build system. IIRC we tried to get this right once and gave up.". So if there is a tool path specific subset of the problem that could be solved more easily, it doesn't seem to help.
Spaces in paths in general is indeed hard. The case where the build tree is in a path with no spaces but some of the external tools are is much easier. Simon was talking about the more general, harder case.
Is that even possible on unix systems, with their various packaging and location traditions?
I'm not sure what you're referring to.
Some unix branches seem to distinguish themselves merely by different package management/location. But apart from Mac frameworks, I'm not aware of any unix that would not expect libraries/binaries/docs to be installed in different locations (instead of common, language-specific roots, the common roots are purpose-specific).
Relocation might not be as typical there, though (don't know how mounting external drives affects paths there, or whether users might want to relocate packages between different prefixes like /opt, /etc, /usr, .., or just from user-local to system-wide locations).
It's not especially common after installation. The advantage of a prefix-independent binary install is that it makes installation easy, just a copy.
Duncan
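A minimal sketch of how the proposed ${pkgroot} and ${pkgrooturl} variables might be expanded, assuming both are derived from the directory containing the package db. `expandPkgRoot` is a hypothetical helper and the paths are invented; treating ${pkgrooturl} as a `file://` form of ${pkgroot} is an assumption, since the thread does not define it:

```haskell
import Data.List (stripPrefix)
import System.FilePath.Posix (takeDirectory)

-- | Substitute the two proposed variables wherever they occur in a
-- registered field. The package root is taken to be the directory
-- holding the package db file.
expandPkgRoot :: FilePath -> String -> String
expandPkgRoot dbFile = go
  where
    root = takeDirectory dbFile
    url  = "file://" ++ root  -- assumption: URL form of the same root

    go s
      | Just rest <- stripPrefix "${pkgrooturl}" s = url ++ go rest
      | Just rest <- stripPrefix "${pkgroot}" s    = root ++ go rest
    go (c : cs) = c : go cs
    go []       = []
```

Copying the tree to a USB drive changes only the location of the db file, so the same registered strings resolve to the new mount point without editing the db.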

If you don't want to move from absolute paths for non-core packages, the current system should just work, right?
Yes.
The current system being the $topdir one.
Though it also allows for the possibility of relocatable sets of packages that are not installed relative to the compiler. But more importantly it's more general and simpler than the current '$topdir' that ghc uses.
'it' now being the new system evolving in this thread, or have I missed anything?
(a) making ghc-pkg (optionally) instantiate any variables in its database in (all of) its command-line output and
Yes, though I'm only asking for two vars (previously one), not an ad-hoc set of vars.
(b) allowing non-core packages to be relocated without having to update ghc-pkg's database.
In my suggested system this is possible if that set of packages use their own package db (containing relative paths).
That is news to me - was that specified before this thread moved to ghc-users?
In your system it's possible by updating some var in a central registry and having that set of packages use paths relative to that var.
So, essentially, your system would have to keep a file listing the various package.conf locations (currently, GHC only knows about two: system/user, everything else would have to be passed on the commandline..). While my system would have to keep a file listing the variable bindings, so that tools processing the package db can instantiate the variables. I could see both approaches being useful, even together.
So ghc's current system uses two vars, $topdir and $httptopdir.
This is GHC's view of its database. It should be useable independently, via ghc-pkg and ghc api clients (such as GHC, GHCi, Haddock, ..) - all of which should be able to resolve the variable bindings, in the same way. Btw, it would really be nice if the package handling code was shared rather than duplicated.
I'm proposing to replace those with a standardised ${pkgroot} and ${pkgrooturl} vars which are usable by all compilers and in more situations.
Now you are talking about Cabal's view of its database. It doesn't have to expose the underlying implementation's view, especially since the other implementations organise their package handling differently. And why just two variables? Is $pkgroot about .hi files, .a/.so/.dll files, or about include files, or haddock indices, or ..? In windows, these tend to end in a common sub-hierarchy, but you're aiming for something general, right?
You're proposing a central registry of vars and to have ghc-pkg (optionally) expand these vars which could be used anywhere in the installed package descriptions. Presumably you're also suggesting some mechanism to query and update this registry of variables.
Is that a fair summary?
I think so. And you're proposing several separate registries (hasn't that been a Cabal problem in the past, even with just user and system to choose from?). Presumably you're also suggesting some mechanism to query and update the meta-registry of package database locations.
Claus
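A minimal sketch of the "several separate registries" reading, assuming the meta-registry is nothing more than an ordered list of package dbs and that each db resolves ${pkgroot} against its own location. `PackageDb`, `resolveIn` and `lookupLibDir` are hypothetical names and the paths are invented:

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe)
import System.FilePath.Posix (takeDirectory)

-- | Hypothetical in-memory view of one package db: where its file
-- lives, plus (package name, library dir) pairs as registered.
data PackageDb = PackageDb
  { dbFile    :: FilePath
  , dbEntries :: [(String, FilePath)]
  }

-- | Resolve a registered path against the db that contains it.
-- Paths without a leading ${pkgroot} are returned unchanged.
resolveIn :: PackageDb -> FilePath -> FilePath
resolveIn db path =
  case stripPrefix "${pkgroot}" path of
    Just rest -> takeDirectory (dbFile db) ++ rest
    Nothing   -> path

-- | Search a stack of package dbs in order and return the library
-- dir of the first matching package, if any.
lookupLibDir :: [PackageDb] -> String -> Maybe FilePath
lookupLibDir dbs pkg =
  listToMaybe
    [ resolveIn db dir
    | db <- dbs
    , (name, dir) <- dbEntries db
    , name == pkg
    ]
```

Each db carries its own root, so two relocatable sets can coexist without a shared variable having to take two values at once; the cost is that something must still supply the list of db locations.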

On Thu, 2009-05-28 at 23:40 +0100, Claus Reinke wrote:
If you don't want to move from absolute paths for non-core packages, the current system should just work, right?
Yes.
The current system being the $topdir one.
Yep. It works, it's just not nice, it's ghc-specific and only makes sense when ghc is installed in a prefix-independent way.
Though it also allows for the possibility of relocatable sets of packages that are not installed relative to the compiler. But more importantly it's more general and simpler than the current '$topdir' that ghc uses.
'it' now being the new system evolving in this thread, or have I missed anything?
The new system I've been proposing.
(a) making ghc-pkg (optionally) instantiate any variables in its database in (all of) its command-line output and
Yes, though I'm only asking for two vars (previously one), not an ad-hoc set of vars.
(b) allowing non-core packages to be relocated without having to update ghc-pkg's database.
In my suggested system this is possible if that set of packages use their own package db (containing relative paths).
That is news to me - was that specified before this thread moved to ghc-users?
It was in the first email that was cc'ed to ghc-users: How about this: a way to specify paths in the package registration info that are relative to the location of the package db they are in. That makes sense beyond just ghc, and even with ghc it would allow other sets of relocatable packages, not just those installed with ghc.
In your system it's possible by updating some var in a central registry and having that set of packages use paths relative to that var.
So, essentially, your system would have to keep a file listing the various package.conf locations (currently, GHC only knows about two: system/user, everything else would have to be passed on the commandline..). While my system would have to keep a file listing the variable bindings, so that tools processing the package db can instantiate the variables.
If you want multiple relocatable sets of packages that are immediately "available" in the environment.
I could see both approaches being useful, even together.
So ghc's current system uses two vars, $topdir and $httptopdir.
This is GHC's view of its database. It should be useable independently, via ghc-pkg and ghc api clients (such as GHC, GHCi, Haddock, ..) - all of which should be able to resolve the variable bindings, in the same way.
It's not usable independently, ghc does not always have a topdir. This makes life hard for tools. It's also not clear what topdir would mean in the context of other compilers.
Btw, it would really be nice if the package handling code was shared rather than duplicated.
It would be nice, yes.
I'm proposing to replace those with a standardised ${pkgroot} and ${pkgrooturl} vars which are usable by all compilers and in more situations.
Now you are talking about Cabal's view of its database.
Cabal does not own the package databases, however it does expect that they are in the format described by the Cabal spec, which places obligations on Haskell implementations to be somewhat package-aware.
It doesn't have to expose the underlying implementation's view, especially since the other implementations organise their package handling differently.
All compilers use the same information (it's in the Cabal spec). They do store it differently but they all identify the location of the information using a file path. That seems pretty universal, compared to $topdir.
And why just two variables? Is $pkgroot about .hi files, .a/.so/.dll files, or about include files, or haddock indices, or ..?
You only need one variable to identify the location of the installed package description. All relative paths can be constructed from that. The second variable is to allow for two representations of the same location, one as a native system path, the other as a URL. We do not need different variables for different kinds of files (except in as much as some fields use paths and some urls).
On Windows, these tend to end up in a common sub-hierarchy, but you're aiming for something general, right?
If you're making a relocatable package then these files will be in a common sub-hierarchy and you would use relative paths. If you're not making a relocatable package (eg following the Linux FHS) then you would not use relative paths. So that should be general. It does not remove any existing capability and it adds the ability to have relative paths for relocatable packages.

Perhaps what you're saying is that we should be able to take any package, whether it lives in a common sub-hierarchy or not, and relocate it. In general this is problematic since packages can embed paths, and if those paths are not relative to a common root then you have to specify them all (Cabal enables this by setting environment variables). Assuming that's ok, then even this rare use case is still possible, just by editing the package registration information. It doesn't need to be "simplified" by having one var per package entry.
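To make the relocatable versus non-relocatable distinction concrete, here is a rough sketch of how a registration tool might choose between the two forms. The `relativize` helper and the example paths are hypothetical, and `${pkgroot}` is simply the variable name proposed in this thread, not an existing feature.

```python
from pathlib import PurePosixPath

def relativize(path, pkgroot):
    """Rewrite path as ${pkgroot}/... when it lies under pkgroot, else keep it absolute."""
    try:
        rel = PurePosixPath(path).relative_to(PurePosixPath(pkgroot))
    except ValueError:
        # Not under the common root (e.g. an FHS-style install): leave it as is.
        return path
    return "${pkgroot}" if str(rel) == "." else "${pkgroot}/" + str(rel)
```

Paths under the common root become relative and so survive a move of the whole tree; anything outside it stays absolute, which matches the existing behaviour.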
You're proposing a central registry of vars and to have ghc-pkg (optionally) expand these vars which could be used anywhere in the installed package descriptions. Presumably you're also suggesting some mechanism to query and update this registry of variables.
Is that a fair summary?
I think so. And you're proposing several separate registries (hasn't that been a Cabal problem in the past, even with just user and system to choose from?).
Cabal can be instructed to use a specific package db, not just global and user.
Presumably you're also suggesting some mechanism to query and update the meta-registry of package database locations.
I wasn't actually proposing a meta-registry (ghc already supports this via an environment variable) but it is a possible extension which brings it closer to the capabilities of your proposal. My proposal is just to replace the use of $topdir with something that every compiler can implement and which tools can understand. The fact that it could be extended without having to modify the installed package description format or the tools which understand that format is a bonus. I would not argue particularly for or against such an extension, my main concern is for a simple clear spec and the sanity of tool authors. Duncan
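As an illustration of how several package dbs could each supply their own root, here is a small sketch. The environment variable mentioned above is presumably GHC_PACKAGE_PATH, a separator-delimited list of package database files; that identification is an assumption, since the message does not name it. The `package_db_roots` helper and the example paths are hypothetical, and this is not ghc's own parsing code.

```python
import posixpath

def package_db_roots(search_path, sep=":"):
    """Map each package db file named in search_path to its containing directory."""
    # Empty entries (such as one produced by a trailing separator) are skipped here.
    dbs = [entry for entry in search_path.split(sep) if entry]
    return [(db, posixpath.dirname(db)) for db in dbs]
```

Each directory in the result is what `$pkgroot` would stand for when reading the corresponding db, so no separate registry of variable values is needed.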

On Wed, 2009-05-27 at 21:17 +0100, Duncan Coutts wrote:
To be clear, here's what I'm imagining:
blah/package.conf
blah/lib/foo-1.0/libfoo-1.0.a
and package.conf would contain foo-1.0 with paths looking like "$dbdir/lib/foo-1.0". That is, we interpret $dbdir (or whatever var name we agree on) as being "blah/" because that's the dir containing the db.
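A minimal sketch of that interpretation, assuming `$dbdir` only ever appears at the start of a path. The `expand_dbdir` helper is hypothetical and the variable name is the placeholder used above, not an agreed one.

```python
import posixpath

def expand_dbdir(value, db_file):
    """Replace a leading $dbdir with the directory that contains db_file."""
    # Fall back to "." so a bare "package.conf" does not yield an absolute path.
    dbdir = posixpath.dirname(db_file) or "."
    if value == "$dbdir" or value.startswith("$dbdir/"):
        return dbdir + value[len("$dbdir"):]
    return value
```

With the layout above, `expand_dbdir("$dbdir/lib/foo-1.0", "blah/package.conf")` gives `blah/lib/foo-1.0`, and moving the whole `blah` directory needs no change to the db contents.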
Ian has convinced me that we do actually need two vars. We need one for file paths and one for urls. For example, consider

haddock-html: $pkgroot/doc/ghc-6.10.1/libraries/base

This is supposed to expand to a URL like

file:///usr/share/doc/ghc-6.10.1/libraries/base

or something similar on Windows. It's especially important that it is a file:// url on windows because normal windows paths are not absolute urls like unix ones are. Now if we've only got one var like $pkgroot then we cannot encode file:///usr/share/doc/ghc-6.10.1/libraries/base and also be able to interpret it relative for tools that grok the var. You could say:

haddock-html: file://$pkgroot/doc/ghc-6.10.1/libraries/base

but then tools that want to construct relative paths have to disentangle the file:// prefix.

So, we suggest that we have two vars, $pkgroot and $pkgrooturl. These are to be interpreted as the directory containing the package registration information, eg the package.conf file in the case of ghc. Hugs and nhc do not use package databases but they do use individual files for each package's registration info. So again $pkgroot(url) is just the dir containing the file.

Now tools may well be expected to understand these vars. We cannot always have hc-pkg expand it, because for hugs and nhc there is no hc-pkg, it's just the simple text files. A tool using the Cabal lib to read the set of installed packages could benefit from the var expansion but not one reading the files directly. As a convenience ghc-pkg field does do variable expansion, and that's probably the right tradeoff. Tools that parse the output of ghc-pkg dump/describe can be expected to do the var expansion (and of course they may want to see and construct relative paths). The only thing is that tools then need to know the path to use for the $pkgroot. In particular for the --global and --user packages which are not specified to hc-pkg by their path.

Why not continue to use $topdir and $httptopdir?
Because these things are not guaranteed to exist. They only make sense for a relocatable compiler installation. Users (especially distro packagers) may choose to do non-relocatable installations following the FHS spec. However the package file/db itself always exists. Also it's more general, it allows multiple relocatable sets of packages, each with their own package file/db, whereas $topdir is tied to the installation of the compiler itself. Duncan
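The two-variable scheme described in this message can be sketched roughly as follows. The `pkgroot_vars` and `expand` helpers, the `windows` flag and the example db locations are hypothetical; the sketch only shows why the native path and the URL form need separate variables, and is not the behaviour of any existing tool.

```python
import ntpath
import posixpath
from urllib.parse import quote

def pkgroot_vars(db_file, windows=False):
    """Return (pkgroot, pkgrooturl) for the directory containing db_file."""
    if windows:
        root = ntpath.dirname(db_file)
        # Windows paths need an extra slash and forward slashes to form a URL.
        url = "file:///" + quote(root.replace("\\", "/"), safe="/:")
    else:
        root = posixpath.dirname(db_file)
        url = "file://" + quote(root)
    return root, url

def expand(value, db_file, windows=False):
    """Expand $pkgroot and $pkgrooturl in one field value."""
    root, url = pkgroot_vars(db_file, windows)
    # Replace the longer name first so $pkgroot does not clobber $pkgrooturl.
    return value.replace("$pkgrooturl", url).replace("$pkgroot", root)
```

A path-valued field would use `$pkgroot` and a URL-valued field such as haddock-html would use `$pkgrooturl`, so neither kind of consumer has to strip or add a `file://` prefix itself.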
participants (2): Claus Reinke, Duncan Coutts