
> Simon Marlow writes:
>
>> The only way (currently) to have such a guarantee is to use User.To.Cryp.Simons as the root of your library tree.
>
> Does this really guarantee uniqueness?
>
> What if I don't own a domain name personally? What if I give up my domain and somebody else buys it?

True.

> What if somebody doesn't know of this convention and simply uses this exact name because he likes it?

Then libraries distributed without following the convention risk colliding with other libraries on the system, and with modules in a program.

> As far as I can tell, you cannot guarantee unique names at all if your name space system has an absolute root -- like Haskell does. Whatever you come up with may potentially clash with someone else's name -- and probably will. :-)
>
> The only way to guarantee unique names -- without requiring people to abide by some "arbitrary" convention -- is for module names to be relative. If I import a module "Foo.Bar", and I am "Peter.Simons.ToolXY", then this module will be "Peter.Simons.Foo.Bar" for me. In such a system I can uniquely access any module, no matter what it's called.

There's a technical problem with this: when we compile a module we bake its module name into the symbols exported by the object code. So this approach would work fine for distributing source code, but not for binaries.

One other approach is to do it the Microsoft way and give all libraries GUIDs, with the rule that you have to change the GUID when you change the library API. This would guarantee no library clashes, because a module name would include its GUID.

Hmm, this might be a good idea. Suppose in GHC we give each package a GUID. You import a module by giving the package GUID and the module name. Certain packages are designated as "standard" so that you don't have to specify the GUID (e.g. the Prelude and standard libraries). This way you could have multiple versions of a package installed, and GHC could provide backwards-compatibility packages so that programs written for previous versions of the libraries continue to work.

Thoughts? Would this be too ugly?

Cheers,
Simon

Simon Marlow writes:
> One other approach is to do it the Microsoft way and give all libraries GUIDs, with the rule that you have to change the GUID when you change the library API. This would guarantee no library clashes, because a module name would include its GUID.
>
> Hmm, this might be a good idea. Suppose in GHC we give each package a GUID. You import a module by giving the package GUID and the module name.

If you use a hash of the API or of the module source code, you get GUIDs that are automatically unique, and automatically change when they are supposed to. Furthermore, you don't need to distribute them separately - anyone can compute the GUID if they have the source.

Hashing the entire source is a good idea, because functionality can change even if the API doesn't (particularly, bug fixes that your code might depend upon).

We wrote a paper about this, which is to appear in ICFP this year. See

  http://www.cl.cam.ac.uk/~kw217/research/paper-abstracts.html#Leifer*03:Globa...

  _Global Abstraction-Safe Marshalling with Hash Types_, James J. Leifer, Gilles Peskine, Peter Sewell, Keith Wansbrough.

We're currently thinking about how to incorporate version numbers into this scheme, likely along the lines of _Modules, Abstract Types, and Distributed Versioning_, Peter Sewell, POPL 2001 (see http://www.cl.cam.ac.uk/users/pes20/).

--KW 8-)
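
The source-hash idea above can be illustrated with a minimal sketch. It is written in Python only for brevity, and it is not the scheme from the cited paper: the helper name `module_guid` and the two toy module sources are made up for illustration, and MD5 is used simply because it yields a 128-bit value.

```python
import hashlib


def module_guid(source: str) -> str:
    """Hypothetical helper: derive a 128-bit identifier from module source text."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()


# Two versions of a toy module with the same exported type but different
# behaviour (for example, a bug fix). An API-only hash would not tell them apart.
v1 = "module Foo.Bar where\n\nf :: Int -> Int\nf x = x + 1\n"
v2 = "module Foo.Bar where\n\nf :: Int -> Int\nf x = x + 2\n"

guid_v1 = module_guid(v1)
guid_v2 = module_guid(v2)

print(guid_v1)
print(guid_v2)
print("identifiers differ:", guid_v1 != guid_v2)
```

Anyone holding the same source text computes the same identifier, so nothing has to be distributed separately; any edit to the source, even one that leaves the types alone, produces a different identifier.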

Simon Marlow writes:
>> If I import a module "Foo.Bar", and I am "Peter.Simons.ToolXY", then this module will be "Peter.Simons.Foo.Bar" for me. In such a system I can uniquely access any module, no matter how it's called.
>
> There's a technical problem with this: when we compile a module we bake its module name into the symbols exported by the object code. So this approach would work fine for distributing source code, but not for binaries.

Yes, that's true. And unfortunately, there is little one can do to remedy this situation without having support in the linker or without having a pre-linking stage, which can add a prefix to the symbol names after they have been compiled.

I know very little about ELF and other formats, though, so it's entirely possible that modern linkers _do_ support this already. A quick glance at the man page of GNU ld revealed the ELF fields DT_AUXILIARY and DT_FILTER, which can be set with -f and -F respectively and may provide a solution for this problem.

In any case, it would be extremely nice to have this capability because it would really _solve_ the name-clash problem once and for all. I'm sure it can't be _that_ hard to do.

> One other approach is to do it the Microsoft way and give all libraries GUIDs, with the rule that you have to change the GUID when you change the library API. This would guarantee no library clashes, because a module name would include its GUID.

I know next to nothing about this GUID concept, so please correct me if I'm wrong, but isn't this GUID just "yet another unique identifier convention", which may or may not clash, depending on how lucky you are? Would using a GUID really be different than using a hierarchy like Haskell does today?

Peter

> I know next to nothing about this GUID concept, so please correct me if I'm wrong, but isn't this GUID just "yet another unique identifier convention", which may or may not clash, depending on how lucky you are? Would using a GUID really be different than using a hierarchy like Haskell does today?

A GUID is basically a random 128-bit number; it's "just another unique identifier", but it is far less likely to clash than your compiler is likely to be affected by a stray cosmic ray. In the paper I cited, I made the following calculation:

  Both MD5 (RFC1321, 128-bit) and SHA-1 (RFC3174, 160-bit) are sufficiently cheap, and may be considered random functions for this application~\cite{robshaw96}. Let us consider the likelihood of collisions. For $n$ abstract types and $N$ possible hash values, the probability of a collision is approximately $n^2/2N$. Pessimistically assuming $10^{10}$~programmers in the world, writing $300$~lines of code per day with one abstract type per $100$~loc, the probability of a collision in a century of abstract types (using MD5) would then be $(10^{15})^2/2^{129} \approx 10^{-9}$. This is much less than the probability of a cosmic-ray-induced processor error in this period.

(substitute "module" for "abstract type", for the Haskell context; the argument is about hashing module source, but applies to randomly-generated GUIDs equally well, as long as a good random source is used, such as Linux /dev/random, or http://www.fourmilab.ch/hotbits/generate.html).

The upshot is that you really don't need to worry about collisions.

--KW 8-)
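
The arithmetic in the quoted estimate can be re-run with a few lines of Python. This is only a sketch: the inputs are the figures from the text above, the variable names are made up, and a century is taken as 36,500 days (leap days ignored).

```python
# Inputs taken from the estimate quoted above.
programmers = 10**10
loc_per_day = 300
loc_per_type = 100
days = 365 * 100  # one century, ignoring leap days

# Number of abstract types (or modules) written in that period.
n = programmers * loc_per_day // loc_per_type * days

# Number of possible 128-bit hash values (MD5).
N = 2**128

# Birthday approximation n^2 / 2N, with the unrounded n ...
p_unrounded = n**2 / (2 * N)
# ... and with n rounded to 10^15, as in the text.
p_rounded = (10**15) ** 2 / 2**129

print(f"n           = {n:.3e}")
print(f"p (n exact) = {p_unrounded:.3e}")
print(f"p (n=1e15)  = {p_rounded:.3e}")
```

Both figures come out on the order of 10^-9, in line with the estimate in the text.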

Thanks for the explanation of GUIDs!

So if I understood this correctly, having a GUID in every module would allow you to

 - import a module by name, if the name is unique, and to
 - import a module by (GUID,name), if the name is not unique,

right? If so, this sounds _very_ good to me. :-)

Peter
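
The two import modes described above could be sketched as a lookup like the following. Everything here is hypothetical: the registry contents, the GUID strings, the file paths, and the function name `resolve` are made up for illustration and do not reflect how GHC locates modules.

```python
from typing import Optional

# Hypothetical registry of installed modules: (guid, module name) -> location.
installed = {
    ("guid-a", "Foo.Bar"): "pkg-a/Foo/Bar.hi",
    ("guid-b", "Foo.Bar"): "pkg-b/Foo/Bar.hi",
    ("guid-a", "Data.Baz"): "pkg-a/Data/Baz.hi",
}


def resolve(name: str, guid: Optional[str] = None) -> str:
    """Resolve by (guid, name) if a GUID is given, else by name if it is unique."""
    if guid is not None:
        # A missing key raises KeyError, which is a subclass of LookupError.
        return installed[(guid, name)]
    matches = [loc for (g, n), loc in installed.items() if n == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no module named {name}")
    raise LookupError(f"{name} is ambiguous; specify a GUID")


print(resolve("Data.Baz"))           # unique name, no GUID needed
print(resolve("Foo.Bar", "guid-b"))  # ambiguous name, disambiguated by GUID
```

Calling `resolve("Foo.Bar")` without a GUID raises `LookupError` in this sketch, since two installed packages provide a module of that name.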

"Simon Marlow"
> One other approach is to do it the Microsoft way and give all libraries GUIDs, with the rule that you have to change the GUID when you change the library API.

Does this really buy us anything? Isn't type checking sufficient in practice? Do we really want linking to fail if somebody has added another export to a module?

Idea: Perhaps we could checksum the export types *and* (QuickCheck) invariants? A change in the properties would (often? sometimes?) mean that a bug has been fixed and is now being checked for, which could be a reason for caution.

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants

participants (4)
 - Keith Wansbrough
 - ketil@ii.uib.no
 - Peter Simons
 - Simon Marlow