How unique is a Unique?

20 Jun 2009

      Hi all,

I've been doing some work with the GHC API, and I'm kind of confused as to how
Uniques are supposed to work. I've found that each Name has a Unique that is
its primary identification (at least for internal names). That means that two
identifiers are different iff their Uniques are different, even when their
OccNames are the same. This also means that two identifiers are equal when
their Uniques are equal, even if their OccNames are different (though I think
this situations is not supposed to exist and is really undefined?).

Now, from the name "Unique", one would assume that each different identifier
in a compiler run gets a different Unique, even when they are in different
functions or modules. However, I've found some evidence to suggest otherwise.

In particular, I've been looking at applying substitutions using CoreSubst.
As a part of a substitution, the set of identifiers that are in scope are
included with the substitutions to perform. I don't completely understand the
requirements on this InScopeSet, but I do see that any identifiers that are
bound in the to-be-substituted-in expression that are also in scope, will have
their Unique changed. This makes sense when you want Uniques to be really
unique, since \a_1 -> \a_1 -> \a_1 will then become \a_1 -> \a_2 -> \a_2. This
does not sound related to substitution, but apparantly substition always
performs this de-shadowing (as I think it's called?) implicitely.

Giving this some further thought, it makes some sense, since when I'm
performing the substituion \x_3 -> a_4 (which has a free variable a_4), on the
expression \a_4 -> x_3, this must of course result in \a_5 -> a_4, not
\a_4 -> a_4. So, it seems the InScopeSet should at least contain the free
variables of the substited values.

Looking a bit closer, this might be exactly what the definition at CoreSubst
says: "Precisely, the in-scope set must be a superset of the free vars of the
substitution range that might possibly clash with locally-bound variables in
the thing being substituted in." I'm not completely sure what the "substituion
range" is, but I can imagine this being the expression part of a binder ->
expression substitution?

Anyway, back to my original point. The above suggests that Uniques are not so
unique after all. If every binder everywhere would get its own Unique, the
Unique replacement CoreSubst does would never be neccesary. So, how is the
uniqueness of Unique really defined?

I can imagine that the generation of new Uniques is avoided as much as
possible for performance reasons, and thus uniqueness is guaranteed only "on
demand", by running deshadowing / substitutions when things could be clashing?
Any thoughts on what kind of non-uniqueness is allowed and what is not? Are
there common pitfalls when working with uniques? When to apply deshadowing or
other uniqueness generating techniques?

On a related note, is my understanding correct that deshadowing will only
provide uniqueness on each "path" through the expression tree, and not over
the entire tree? i.e., the expression 
let x_6 = \y_7 -> y_7; z_8 = \y_7 -> \y_7 in x_6 is a deshadowed expressions,
even though both of the y_7 lambda binders refer to a different variable? If
so, is there any function that gives complete uniqueness over an entire tree?

Gr.

Matthijs

How unique is a Unique?

Matthijs Kooijman