
Hi all, I've been doing some work with the GHC API, and I'm kind of confused as to how Uniques are supposed to work. I've found that each Name has a Unique that is its primary identification (at least for internal names). That means that two identifiers are different iff their Uniques are different, even when their OccNames are the same. This also means that two identifiers are equal when their Uniques are equal, even if their OccNames are different (though I think this situations is not supposed to exist and is really undefined?). Now, from the name "Unique", one would assume that each different identifier in a compiler run gets a different Unique, even when they are in different functions or modules. However, I've found some evidence to suggest otherwise. In particular, I've been looking at applying substitutions using CoreSubst. As a part of a substitution, the set of identifiers that are in scope are included with the substitutions to perform. I don't completely understand the requirements on this InScopeSet, but I do see that any identifiers that are bound in the to-be-substituted-in expression that are also in scope, will have their Unique changed. This makes sense when you want Uniques to be really unique, since \a_1 -> \a_1 -> \a_1 will then become \a_1 -> \a_2 -> \a_2. This does not sound related to substitution, but apparantly substition always performs this de-shadowing (as I think it's called?) implicitely. Giving this some further thought, it makes some sense, since when I'm performing the substituion \x_3 -> a_4 (which has a free variable a_4), on the expression \a_4 -> x_3, this must of course result in \a_5 -> a_4, not \a_4 -> a_4. So, it seems the InScopeSet should at least contain the free variables of the substited values. Looking a bit closer, this might be exactly what the definition at CoreSubst says: "Precisely, the in-scope set must be a superset of the free vars of the substitution range that might possibly clash with locally-bound variables in the thing being substituted in." I'm not completely sure what the "substituion range" is, but I can imagine this being the expression part of a binder -> expression substitution? Anyway, back to my original point. The above suggests that Uniques are not so unique after all. If every binder everywhere would get its own Unique, the Unique replacement CoreSubst does would never be neccesary. So, how is the uniqueness of Unique really defined? I can imagine that the generation of new Uniques is avoided as much as possible for performance reasons, and thus uniqueness is guaranteed only "on demand", by running deshadowing / substitutions when things could be clashing? Any thoughts on what kind of non-uniqueness is allowed and what is not? Are there common pitfalls when working with uniques? When to apply deshadowing or other uniqueness generating techniques? On a related note, is my understanding correct that deshadowing will only provide uniqueness on each "path" through the expression tree, and not over the entire tree? i.e., the expression let x_6 = \y_7 -> y_7; z_8 = \y_7 -> \y_7 in x_6 is a deshadowed expressions, even though both of the y_7 lambda binders refer to a different variable? If so, is there any function that gives complete uniqueness over an entire tree? Gr. Matthijs

Matthis | Now, from the name "Unique", one would assume that each different identifier | in a compiler run gets a different Unique, even when they are in different | functions or modules. However, I've found some evidence to suggest otherwise. Indeed, that's not so. The thing to read is Section 4 of: http://research.microsoft.com/en-us/um/people/simonpj/papers/inlining/index.... That tells you all about the InScopeSet etc. I hope that helps. If you are still stuck after reading that, ask again. Would you like to record the answers you find somewhere in the GHC Commentary? http://hackage.haskell.org/trac/ghc/wiki/Commentary I imagine you have been looking there for clues anyway? It's a wiki, so you can improve it! thanks Simon

The thing to read is Section 4 of: http://research.microsoft.com/en-us/um/people/simonpj/papers/inlining/index.... That tells you all about the InScopeSet etc. From a first glance, it looks like you mean section 5 :-) I haven't read the
Hi Simon, paper fully yet, it's in the printer now :-)
I hope that helps. If you are still stuck after reading that, ask again. It seems like it will answer quite some questions (and its also looks like an interesting read in itself, probably even useful to cite in my thesis :-)
However, one thing will probably not be answered there, since it is more of an implementation issue: Is there any function that will guarantee all Uniques in a CoreExpr will be completely unique? This is stronger than de-shadowing, since I also want Uniques in completely different branches of the expression to be different. I need this, since I want to keep a global table with a mapping from a Unique to some values (mostly related to naming). I might be able to pull this off with some hard thinking and non-unique Uniques, but it would help a lot if I didn't need to. If there isn't, I'll probably write something up (shouldn't be too hard I guess). Would there be any interest to include this in the GHC API? Or is the GHC API limited to what GHC itself uses?
Would you like to record the answers you find somewhere in the GHC Commentary? http://hackage.haskell.org/trac/ghc/wiki/Commentary I'll see if I can write something coherent :-p. I think there might be quite a lot more I could put up there, though as always I'd need to find the time :-)
Gr. Matthijs

| However, one thing will probably not be answered there, since it is more of an | implementation issue: Is there any function that will guarantee all Uniques in | a CoreExpr will be completely unique? This is stronger than de-shadowing, | since I also want Uniques in completely different branches of the expression | to be different. Not at the moment. | I need this, since I want to keep a global table with a mapping from a Unique | to some values (mostly related to naming). I might be able to pull this off | with some hard thinking and non-unique Uniques, but it would help a lot if I | didn't need to. | | If there isn't, I'll probably write something up (shouldn't be too hard I | guess). Would there be any interest to include this in the GHC API? Or is the | GHC API limited to what GHC itself uses? I think the best way would be to make a new package that depends on the GHC package. That's what Cabal is so good for! Simon
participants (2)
-
Matthijs Kooijman
-
Simon Peyton-Jones