
Dear all,

I *finally* had some time again to look at how to properly redo Uniques in GHC. A few things stood out to me:

- The export list of Unique has some comments stating that function X is only exported for module Y, yet it is used elsewhere. This may be because these comments do not show up in Haddock etc., leading some people to think the functions are up for general use. In my refactoring, I'm sticking the restriction in the function name, so it's no longer mkUniqueGrimily, but rather mkUniqueOnlyForUniqSupply (making the name even longer should discourage its use more). If at all possible, these should be removed altogether asap.

- The implementation is based on FastInts, which, on most machines nowadays, are 64-bit things. The serialisation in BinIface is explicitly based on Word32s. Aside from the obvious potential (albeit with a low probability) for errors, this led me to wonder about 32/64-bitness. Is there a reason for 64-bit versions of GHC to write Word32s, or is this a historic thing? Must the interface files be bit-compatible between different versions (32/64-bit) of the compiler? Lastly, is the choice of whether "this" is a 32 or 64-bit version completely determined by WORD_SIZE_IN_BITS (MachDeps.h)?

- There are a few libraries that are explicitly dependent on GHC, yet have their own versions of Unique. So far, I've seen this for Hoopl and Template Haskell. They have unboxed ints to do the job. I have not researched whether they manipulate them in any way, or just store things. If the former, I would like to call for reconsidering this, because it seems like a poor separation of concerns to me. If the latter, I think refactoring them to use Unique instead of Int# should be straightforward.

The point of refactoring Unique is to no longer have low-level optimisations of manually unpacking (and repacking using the MkUnique constructor), which should serve as a lovely test of how far the optimisations have come.
Furthermore, it seemed that the use of characters to encode the domain was somewhat awkward, but I refer anyone interested to earlier posts on this list. Thoughts? Comments? Suggestions? Objections? Regards, Philip
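[For concreteness, the character-tagged packing scheme under discussion can be sketched like this. The widths and names below are illustrative assumptions, not GHC's actual code: an 8-bit domain tag plus a numeric index packed into one Int.]

```haskell
import Data.Bits ((.&.), (.|.), bit, shiftL, shiftR)
import Data.Char (chr, ord)

-- Illustrative sketch only: a Unique packs an 8-bit domain tag
-- (historically a Char) and a numeric index into a single Int.
-- The index width is an assumption chosen to match the 30-bit
-- budget mentioned in the thread, not GHC's real layout.
newtype Unique = MkUnique Int deriving (Eq, Show)

idxBits :: Int
idxBits = 22  -- 8 tag bits + 22 index bits = 30 bits

mkUnique :: Char -> Int -> Unique
mkUnique tag i =
  MkUnique (ord tag `shiftL` idxBits .|. (i .&. (bit idxBits - 1)))

unpkUnique :: Unique -> (Char, Int)
unpkUnique (MkUnique u) =
  (chr (u `shiftR` idxBits), u .&. (bit idxBits - 1))
```

Note how the tag silently eats into the index space: any index wider than idxBits is masked off, which is exactly the kind of undetected overlap worried about below.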

On Mon, Oct 6, 2014 at 11:58 AM,
- The export-list of Unique has some comments stating that function X is only exported for module Y, yet is used elsewhere. This may be because these comments do not show up in haddock etc. leading some people to think they're up for general use. In my refactoring, I'm sticking the restriction in the function name, so it's no longer mkUniqueGrimily, but rather mkUniqueOnlyForUniqSupply (making the name even longer should discourage their use more). If at all possible, these should be removed altogether asap.
Since you're touching this code base it would be a terrific time to add some Haddocks! (We recently decided, on the ghc-devs@ list, that all new top-level entities, i.e. functions, data types, and classes, should have Haddocks.)

Very much part of my plan, Johan! I was a fervent "+1" on that recommendation.
Ph.
________________________________
From: Johan Tibell

Hi, Am Montag, den 06.10.2014, 09:58 +0000 schrieb p.k.f.holzenspies@utwente.nl:
- The implementation is based on FastInts, which, on most machines nowadays, is a 64-bit thing. The serialisation in BinIface is explicitly based on Word32s. Aside from the obvious potential (albeit with a low probability) for errors, this lead me to wonder about 32/64-bitness. Is there a reason for 64-bit versions of GHC to write Word32s, or is this a historic thing? Must the interface-files be bit-compatible between different versions (32/64-bits) of the compiler? Lastly, is the choice of whether "this" is a 32 or 64-bit version completely determined by WORD_SIZE_IN_BITS (MachDeps.h)?
A while ago we had problems with haddock in Debian when the serialization became bit-dependent.¹ I suggest to keep the specification of any on-disk format independent of architecture specifics. Greetings, Joachim ¹ http://bugs.debian.org/586723#15 -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org

Dear Joachim,
Although I can't quite get what you're saying from the posts at that link, I'm not immediately sure that what you're saying should extend to hi-files. These files are very much specific to the compiler version you're using; new GHCs add stuff to them all the time, and their binary format does not (seem to) make provision for skipping unknown things (i.e. it doesn't say how large the next semantic block in the hi-file is).
If we're going to keep the formats the same for any architecture, we're going to have to limit 64-bit machines to 32-bit (actually 30-bits, another thing I don't quite understand in BinIface) Uniques. There seem to be possibilities to alleviate the issues with parallel generation of fresh Uniques in a parallel version of GHC. The idea is that, since 64-bits is more than we'll ever assign anyway, to use a few for thread-ids, so we would guarantee non-conflicting Uniques generated by different threads.
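[The thread-id idea can be sketched as follows. This is purely hypothetical — the bit width and function names are mine, not from any patch: reserve the top few bits of a 64-bit unique for a thread id, so each generator thread draws from its own disjoint range.]

```haskell
import Data.Bits ((.|.), shiftL)
import Data.Word (Word64)

-- Hypothetical sketch: carve a thread id into the top bits of a
-- 64-bit unique so parallel supplies can never collide.
threadBits :: Int
threadBits = 8  -- assumed width; supports up to 256 generator threads

mkThreadedUnique :: Word64 -> Word64 -> Word64
mkThreadedUnique tid n = tid `shiftL` (64 - threadBits) .|. n
```

Uniques from different threads then differ in their top bits by construction, at the cost of shrinking the per-thread counter range — which is why this only makes sense with a 64-bit payload.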
Anyway, maybe someone a tad more knowledgeable about Uniques could tell me on what scale Uniques in the hi-files should be unique? Must they only be non-conflicting within a Module? Within a Package? If I first compile a file with GHC and then, in a separate invocation of GHC, compile another, surely their hi-files will have some of the same Uniques for their own, different things? Where are these conflicts resolved when linking multiple independently compiled files? Are they ever?
Regards,
Philip
________________________________
From: Joachim Breitner

Hi, Am Montag, den 06.10.2014, 19:55 +0000 schrieb p.k.f.holzenspies@utwente.nl:
Although I can't quite get what you're saying from the posts on that link, I'm not immediately sure what you're saying should extend to hi-files. These files are very much specific to the compiler version you're using, as in, new GHCs add stuff to them all the time and their binary format does not (seem to) provision for being able to skip unknown things (i.e. it doesn't say how large the next semantic block is in the hi-file).
Some of this may not be true, but to my knowledge (part of) that interface reading code is (or was?) used by haddock when generating its .haddock file.
If we're going to keep the formats the same for any architecture, we're going to have to limit 64-bit machines to 32-bit (actually 30-bits, another thing I don't quite understand in BinIface) Uniques.
Why? You can just serialize Uniques always as 64 bit numbers, even on 32-bit systems. This way, the data format is the same across architectures, with little cost.
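[A minimal sketch of what that could look like — the helper names are illustrative, not GHC's Binary API: always write exactly 8 bytes, regardless of the host word size.]

```haskell
import Data.Bits ((.|.), shiftL, shiftR)
import Data.Word (Word8, Word64)

-- Sketch: serialise a unique as exactly 8 bytes (big-endian),
-- independent of whether the host Int is 32 or 64 bits wide.
putUnique64 :: Int -> [Word8]
putUnique64 u = [ fromIntegral (w `shiftR` s) | s <- [56,48..0] ]
  where w = fromIntegral u :: Word64

getUnique64 :: [Word8] -> Int
getUnique64 = fromIntegral . foldl step (0 :: Word64)
  where step acc b = acc `shiftL` 8 .|. fromIntegral b
```

On a 32-bit host the top four bytes are simply zero on the way out and ignored on the way in, so both architectures read and write the same on-disk format.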
There seem to be possibilities to alleviate the issues with parallel generation of fresh Uniques in a parallel version of GHC. The idea is that, since 64-bits is more than we'll ever assign anyway, to use a few for thread-ids, so we would guarantee non-conflicting Uniques generated by different threads.
But that would only work on 64 bit systems, right? Greetings, Joachim -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org

Dear Joachim,
Some of this may not be true, but to my knowledge (part of) that interface reading code is (or was?) used by haddock when generating its .haddock file.
Ah, well, I didn't know this. How does Haddock use this code? If Haddock uses the GHC API to do this, problem solved, because we're back at the specific compiler version that generated it. Otherwise... we may be in trouble.
Why? You can just serialize Uniques always as 64 bit numbers, even on 32-bit systems. This way, the data format is the same across architectures, with little cost.
Ah, the cost is this: if we start relying on the 64-bitness for uniqueness (which may well happen; there are categories - currently characters - used only for four compile-time-known Uniques, wasting 30 - 8 - 2 = 20 bits), this will break the 32-bit compilers. Arguably, their breakage should be grounds to reject the change leading to the wasted Uniques. Seems a risk, though. Similar to how currently Uniques are 64 bits, but serialised as 30. Alternatively, 32-bit GHC could use 64-bit Uniques, but that is going to give you quite the performance hit (speculating here).
But that would only work on 64 bit systems, right?
Yes, this approach to a parallel GHC would only work on 64-bit machines. The idea is, I guess, that we're not going to see a massive demand for a parallel GHC running on multi-core 32-bit systems. In other words, 32-bit systems wouldn't get a parallel GHC. Regards, Philip

On 10/07/2014 02:32 AM, p.k.f.holzenspies@utwente.nl wrote:
But that would only work on 64 bit systems, right?
Yes, this approach to a parallel GHC would only work on 64-bit machines. The idea is, I guess, that we're not going to see a massive demand for parallel GHC running on multi-core 32-bit systems. In other words; 32-bit systems wouldn't get a parallel GHC.
On ARM, 32-bit quad-cores are common. But maybe no one will care to run GHC on (32-bit) ARM.

On Tue, Oct 7, 2014 at 1:32 AM,
Yes, this approach to a parallel GHC would only work on 64-bit machines. The idea is, I guess, that we're not going to see a massive demand for parallel GHC running on multi-core 32-bit systems. In other words; 32-bit systems wouldn't get a parallel GHC.
Let me make sure I'm understanding this correctly: in this particular proposed solution, the side effect would be that we no longer have a capable 32-bit runtime which supports multicore parallelism? Sorry, but I'm afraid this approach is pretty much unacceptable IMO, for precisely the reason outlined in your last sentence. 32-bit systems are surprisingly common. I have several multicore 32-bit ARMv7 machines on my desk right now, for example. And there are a lot more of those floating around than you might think. If that's the 'cure', I think I (and other users) would consider it far worse than the disease.
Regards, Philip
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

Wait, wait, wait! I wasn't talking about a parallel *runtime*. Nothing changes there. All I'm talking about is a very old issue that never got solved or resolved. Somewhere in the commentary, or on the mailing list, I seem to recall that the generation of Uniques was the bottleneck for the parallelisation of GHC *itself*. It's about having the compiler use multiple threads, and says nothing about programs coming out of it.
I'm all with you on embedded processors and that kind of stuff, but I don't see a pressing need to compile *on* them. Isn't all ARM-stuff assuming cross-compilation?
Ph.
________________________________________
From: mad.one@gmail.com

On Tue, Oct 7, 2014 at 11:26 AM,
Wait, wait, wait! I wasn't talking about a parallel *runtime*. Nothing changes there. All I'm talking about is something that is a very old issue that never got added / solved / resolved. Somewhere on the commentary, or the mailing list, I seem to recall that the generation of Uniques was the bottleneck for the parallelisation of GHC *Itself*. It's about having a compiler using multiple threads and says nothing about programs coming out of it.
OK, cool. Just making sure. :)
I'm all with you on embedded processors and that kind of stuff, but I don't see a pressing need to compile *on* them. Isn't all ARM-stuff assuming cross-compilation?
No, not all ARM builds assume cross compilation. In fact, if you want a fully working GHC, cross compilation is impossible: you cannot cross compile GHCi, meaning you can't use Template Haskell (as well as some of the linker features, I believe). However, I don't think this change impacts building GHC at all, since we get parallelism through 'make', not through GHC itself (and on low-end systems, parallelism in the build system is crucial and really necessary). So I assume your change would mean 'ghc -j' would not work for 32-bit. I still consider this a big limitation, one which is only due to an implementation detail. But we need to confirm this will actually fix any bottlenecks before getting to that point.
Ph.
________________________________________ From: mad.one@gmail.com

On Tue, Oct 7, 2014 at 11:40 AM, Austin Seipp
I still consider this a big limitation, one which is only due to an implementation detail.
To make this point more clear: I'm generally hesitant about basing the availability of architecture-independent features ('ghc supporting -j', which was almost entirely fixed in the frontend compiler pipeline) on platform details, or on the underlying hardware or execution model GHC runs on. Unless the changes are very significant (like, -j gets significantly faster), I think we should be very hesitant about gating frontend features behind architectural/implementation details. -- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

________________________________________
From: mad.one@gmail.com

In some respects, having fully deterministic builds is a very important goal: a lot of tooling for, e.g., caching builds of libraries works much, much better if you have that property :) On Tue, Oct 7, 2014 at 12:45 PM,
________________________________________ From: mad.one@gmail.com
on behalf of Austin Seipp < austin@well-typed.com> So I assume your change would mean 'ghc -j' would not work for 32bit. I still consider this a big limitation, one which is only due to an implementation detail. But we need to confirm this will actually fix any bottlenecks first though before getting to that point.
Yes, that's what I'm saying.
Let me just add that what I'm proposing by no means prohibits or hinders making 32-bit GHC-versions be parallel later on, it just doesn't solve the problem. It depends to what extent the "fully deterministic behaviour" bug is considered a priority (there was something about parts of the hi-files being non-deterministic across different executions of GHC; don't recall the details).
Anyhow, the work I'm doing now exposes a few things about Uniques that confuse me a little and that could have been bugs (that maybe never acted up). Extended e-mail to follow later on.
Ph.

Dear Carter, Simon, et al,
(CC'd SPJ on this explicitly, because I *think* he'll be most knowledgeable on some of the constraints that need to be guaranteed for Uniques)
I agree, but for that, a few parameters need to become clear. To this end, I've created a Phabricator-thing that we can discuss things off of:
https://phabricator.haskell.org/D323
Here are my open issues:
- There were ad hoc domains of Uniques being created everywhere in the compiler (i.e. characters chosen to classify the generated Uniques). I have gathered them all up and given them names as constructors in Unique.UniqueDomain. Some of these names are arbitrary, because I don't know precisely what they're for; I generally went for the module name as a starting point. I did, however, make a point of having different invocations of mkSplitUniqSupply et al. all use different constructors (e.g. HscMainA through HscMainC). This is to prevent the high potential for conflicts (see comments in uniqueDomainChar). If people more knowledgeable about the use of Uniques in these modules (e.g. HscMain, ByteCodeGen, etc.) can say that the uniques coming from these different invocations can never conflict, the number of UniqueDomains can perhaps be reduced.
- Some UniqueDomains only have a handful of instances and seem a bit wasteful.
- Uniques were represented by a custom-boxed Int#, but serialised as Word32. Most modern machines see Int# as a 64-bit thing. Aren't we worried about the potential for undetected overlap/conflict there?
- What is the scope in which a Unique must be Unique? I.e. what if independently compiled modules have overlapping Uniques (for different Ids) in their hi-files? Also, do TyCons and DataCons really need to have guaranteed different Uniques? Shouldn't the parser/renamer figure out what goes where and raise errors on domain violations?
- There seem to be related-but-different Unique implementations in Template Haskell and Hoopl. Why is this?
- How critical is it to let mkUnique (and mkSplitUniqSupply) be pure functions? If they can be IO, we could greatly simplify the management of (un)generated Uniques in each UniqueDomain and quite possibly make the move to a threaded GHC easier (for what that's worth). Also, this may help solve the non-determinism issues.
- Missing haddocks, failing lints (lines too long) and a lot of cosmetics will be addressed once the above points have become a tad clearer. I'm more than happy to document a lot of the answers to the above in Unique and/or the commentary.
Regards,
Philip
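[To make the IO-based suggestion above concrete, here is a hypothetical sketch — the names are invented for illustration and this is not GHC's actual UniqSupply: one mutable counter per domain, with drawing a unique being an atomic increment in IO.]

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Hypothetical: one mutable counter per UniqueDomain; drawing a
-- unique is an atomic increment in IO.
newtype UniqSource = UniqSource (IORef Int)

newUniqSource :: IO UniqSource
newUniqSource = UniqSource <$> newIORef 0

takeUnique :: UniqSource -> IO Int
takeUnique (UniqSource ref) = atomicModifyIORef' ref (\n -> (n + 1, n))
```

Because atomicModifyIORef' is thread-safe, several compiler threads could share one source per domain without coordination, and the numbers handed out would be deterministic per domain up to scheduling.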
________________________________
From: Carter Schonwald

One of the things I'm finding difficult about this Phab stuff is that I get presented with lots of code without enough supporting text saying
* What problem is this patch trying to solve?
* What is the user-visible design (for language features)?
* What are the main ideas in the implementation?
The place we usually put such design documents is on the GHC Trac Wiki. Email is ok for discussion, but the wiki is FAR better for stating clearly the current state of play. Philip, might you make such a page for this unique stuff?
To answer some of you specific questions (please include the answers in the wiki page in some form):
* Uniques are never put in .hi files (as far as I know). They do not survive a single invocation of GHC.
* However with ghc --make, or ghci, uniques do survive for the entire invocation of GHC. For example in ghc --make, uniques assigned when compiling module A should not clash with those for module B
* Yes, TyCons and DataCons must have separate uniques. We often form sets of Names, which contain both TyCons and DataCons. Let's not mess with this.
* Having unique-supply-splitting as a pure function is so deeply embedded in GHC that I could not hazard a guess as to how difficult it would be to IO-ify it. Moreover, I would regret doing so because it would force sequentiality where none is needed.
* Template Haskell is a completely independent Haskell library. It does not import GHC. If uniques were in their own package, then TH and GHC could share them. Ditto Hoopl.
* You say that Uniques are serialised as Word32. I'm not sure why they are serialised at all!
* Enforcing determinacy everywhere is a heavy burden. Instead I suppose that you could run a pass at the end to give everything a more determinate name. TidyPgm does this for the name strings, so it would probably be easy to do so for the uniques too.
Simon
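[For context, the pure-looking supply splitting Simon refers to can be sketched like this — simplified from GHC's real mkSplitUniqSupply, which also packs a tag char and masks bits: a lazy infinite tree of uniques, with the impure counter reads hidden behind unsafeInterleaveIO.]

```haskell
import Data.IORef (newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafeInterleaveIO)

-- Simplified sketch of GHC's scheme: a UniqSupply is a lazy infinite
-- tree; counter reads are delayed by unsafeInterleaveIO, so splitting
-- and consuming the supply look pure to clients.
data UniqSupply = MkSplitUniqSupply Int UniqSupply UniqSupply

mkSplitUniqSupply :: IO UniqSupply
mkSplitUniqSupply = do
  ref <- newIORef (0 :: Int)
  let mk = unsafeInterleaveIO $ do
        n  <- atomicModifyIORef' ref (\i -> (i + 1, i))
        s1 <- mk
        s2 <- mk
        return (MkSplitUniqSupply n s1 s2)
  mk

splitUniqSupply :: UniqSupply -> (UniqSupply, UniqSupply)
splitUniqSupply (MkSplitUniqSupply _ s1 s2) = (s1, s2)

uniqFromSupply :: UniqSupply -> Int
uniqFromSupply (MkSplitUniqSupply n _ _) = n
```

Note that which number a given node receives depends on the order in which thunks are forced — which is exactly the source of the non-determinism discussed earlier in the thread.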
________________________________
From: ghc-devs [ghc-devs-bounces@haskell.org] on behalf of p.k.f.holzenspies@utwente.nl [p.k.f.holzenspies@utwente.nl]
Sent: 07 October 2014 22:03
To: carter.schonwald@gmail.com
Cc: ghc-devs@haskell.org
Subject: RE: Again: Uniques in GHC

Dear Simon, et al,
I've created the wiki-page about the Unique-patch [1].
Should it be linked to from the KeyDataTypes [2]?
Regards,
Philip
[1] https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Unique
[2] https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/KeyDataTypes
________________________________
From: Simon Peyton Jones

Thank you. A most helpful beginning. I have added some comments and queries, as well as clarifying some points.
Simon
________________________________
From: p.k.f.holzenspies@utwente.nl [p.k.f.holzenspies@utwente.nl]
Sent: 09 October 2014 12:39
To: Simon Peyton Jones; carter.schonwald@gmail.com
Cc: ghc-devs@haskell.org
Subject: RE: Again: Uniques in GHC
________________________________
From: Simon Peyton Jones

Some of those clarifying points helped a *great* deal. Thanks.
I've addressed comments / questions and linked from KeyTypes.
Ph.
________________________________
From: Simon Peyton Jones
participants (7)
- Austin Seipp
- Carter Schonwald
- Isaac Dupree
- Joachim Breitner
- Johan Tibell
- p.k.f.holzenspies@utwente.nl
- Simon Peyton Jones