
Great and detailed response, Austin. Thank you.

William, I'm happy to help in any way I can. I run SmartOS x86 and x86_64 builds of GHC HEAD on my own equipment using the GHC Builder Ian Lynagh developed:

https://ghc.haskell.org/trac/ghc/wiki/Builder
https://github.com/haskell/ghc-builder

I'm also currently working on small tweaks to the ghc-builder and on getting the GHC testsuite to pass on Illumos (and, indirectly, Solaris). I follow Gábor's lead on the GHC Builder priorities, and Carter Schonwald acts as a Pull Request gatekeeper for changes.

Best, Alain

On 06/18/2014 11:53 PM, Austin Seipp wrote:
Hi William,
Thanks for the email. Here're some things to consider.
For one, cross compilation is a hot topic, but it is going to take a rather large amount of work to fix, and it won't be easy. The primary problem is that we need to make Template Haskell cross-compile, and in general this is nontrivial: Template Haskell must load and run object code on the *host* platform, but the compiler must generate code for the *target* platform. There are ways around some of these problems; for one, we could compile every module twice, once for the host and once for the target. Upon encountering a TH splice, the host GHC would load host object code, but the final executable would link against the target object code.
There are many, many subtle points to consider if we go down this route - what happens, for example, if I cross compile from a 64-bit machine to a 32-bit one, but Template Haskell wants some knowledge like what "sizeOf (undefined :: CLong)" is? The host code sees a 64-bit quantity while the target will actually deal with a 32-bit one. This could later explode horribly. And it isn't limited to different word sizes either - it applies to the ABI in general. 64-bit Linux -> 64-bit Windows would be just as problematic in this exact case, as one uses the LP64 data model while the other uses LLP64.
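To make the hazard concrete, here is a small illustration (mine, not anything from GHC itself) of why baking host-side sizes into target code is dangerous: ABI-dependent types like CLong change size across data models, while fixed-width types do not.

```haskell
-- Sketch: sizeOf for an ABI-dependent type vs. fixed-width types.
-- On an LP64 host (64-bit Linux) CLong is 8 bytes; on LLP64
-- (64-bit Windows) or any 32-bit target it is 4.  A TH splice that
-- bakes the *host* answer into *target* code gets this wrong.
import Data.Int (Int32, Int64)
import Foreign.C.Types (CLong)
import Foreign.Storable (sizeOf)

clongSize, int32Size, int64Size :: Int
clongSize = sizeOf (undefined :: CLong)  -- platform-dependent: 4 or 8
int32Size = sizeOf (undefined :: Int32)  -- always 4, on every ABI
int64Size = sizeOf (undefined :: Int64)  -- always 8, on every ABI

main :: IO ()
main = do
  putStrLn $ "sizeOf CLong on this (host) platform: " ++ show clongSize
  putStrLn $ "sizeOf Int32 / Int64 (ABI-independent): "
          ++ show (int32Size, int64Size)
```

Fixed-width types sidestep the problem; it is precisely the ABI-dependent ones that make naive "run TH on the host" schemes explode.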
So #1 by itself is a very, very non-trivial amount of work, and in my opinion it isn't necessary for better builds. There are other possible routes to cross compilation, but I'd speculate they are all just as non-trivial as this one.
Finally, the remainder of the scheme - shipping builds to remote machines and having them tested there - sounds rather more complicated, and I'm wondering what the advantages are. In particular, it seems to merely expose more failure points in the CI system, because now all CI depends on cross compilation working properly, on being able to ship reports back and forth, and more. Depending on cross compilation in particular sounds like a huge burden: it makes it hard to distinguish a failure caused by a cross-compilation bug from one caused by a committer's changeset, which widens the scope of what we need to consider. A CI system should be absolutely as predictable as possible, and this adds a *lot* of variables to the mix. Cross compilation is really not just one big task - there will be many *small* bugs lying in wait after that, death by a thousand cuts.
Really, we need to distinguish between two needs:
1) Continuous integration.
2) Nightly builds.
These two systems have very different needs in practice:
1) A CI system needs to be *fast*, and it needs to have dedicated resources to respond to changes quickly. This means we need to *minimize* the amount of time for developer turn around to see results. That includes minimizing the needed configurations. Shipping builds to remote machines just for CI would greatly complicate this and likely make it far longer on its own, not to mention it increases with every system we add.
2) A nightly build system is under nowhere near the same time constraints, although it also needs to be dedicated. If an ARM/Linux machine takes 6 hours to build (perhaps it's shared or something, or just really wimpy), that's totally acceptable. These can then report nightly about the results and we can reasonably blame people/changesets based on that.
Finally, both of these are complicated by the fact that GHC is a large project with a highly variable number of configurations to keep under control: static, dynamic, static+dynamic, profiling, LLVM builds, builds where GHC itself is profiled, as well as the matrix of their combinations: LLVM+GHC-profiled, and so on. Each of these configurations exposes bugs in its own right. Unfortunately, doing #1 with all these configurations would be ludicrous: it would explode the build times for any given system, and it drastically multiplies the hardware resources we'd need for CI if we wanted it to respond quickly to any given changeset, because you not only have to *build* them, you must also run them. And now you have to run a lot of them. A nightly build system is more reasonable for these problems, because there, taking hours and hours is expected. All of this would still be true even with cross compilation, because the matrix multiplies the amount of work every CI run must do no matter what.
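To put a rough number on that matrix (a toy sketch of mine - the way names are made up, and GHC's real build "ways" have constraints between them): if each configuration axis is an independent on/off switch, the number of configurations doubles with every axis you add.

```haskell
-- Toy illustration of the configuration blow-up.  The names below are
-- hypothetical stand-ins, not GHC's actual build ways; the point is
-- purely the combinatorics: n independent switches yield 2^n
-- configurations, each of which must be built *and* tested.
import Data.List (subsequences)

ways :: [String]
ways = ["dynamic", "profiling", "llvm", "ghc-profiled", "debug"]

configCount :: Int
configCount = length (subsequences ways)  -- 2^5 = 32

main :: IO ()
main = putStrLn $ show configCount ++ " configurations from "
               ++ show (length ways) ++ " independent switches"
```

Even this undercounts the cost for CI, since each configuration also multiplies testsuite runtime, not just build time.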
We actually already have both of these, too: Joachim Breitner, for example, has set up a Travis-CI[1] system for us, while Gábor Páli has set up nightly builds[2]. Travis-CI does the job of fast CI, but it falls short for a few reasons:
1) We have literally zero visibility into its reports. Essentially, we only know when it explodes because Joachim yells at us (normally at me :) This is because GitHub is not our center of the universe, despite how much people yearn for it to be so.
2) The time limit is unacceptable. Travis-CI, for example, cannot actually do dynamic builds of GHC because they take too long. Considering GHC now ships dynamically on major platforms, that's quite a huge thing for a CI system to miss (and no, a separate build-matrix configuration doesn't work here - GHC builds statically and dynamically at the same time and ships both; there's no way to have "only static" and "only dynamic" entries.)
3) It has limited platform support - only recently did it have OS X, and Windows is not yet in sight. Ditto for FreeBSD. These are crucial for CI as well, as they encompass all our Tier-1 platforms. This could be fixed with cross compilation, but again, that's a big, big project.
And finally, on the GitHub note: as I said in the prior thread about Phabricator, I don't think it actually offers us anything useful at this point in time - almost nothing beyond "other projects use GitHub", which is not an advantage; it's an appeal to popularity, IMO. Webhooks still cannot do things like ban tabs or trailing whitespace, or enforce submodule integrity. We have to have our own setup for all of that. I'm never going to hit the 'Merge Button' for PRs - validation is 100% mandatory on behalf of the merger, and again, Travis-CI cannot provide coherent coverage even if we could use it for that. And because of that, there's no difference between GitHub and any other code site - I have to pull the branch manually and test it myself, which I could do with any random git repository in the world.
The code review tools are worse than Phabricator's. Finally, if we are going to accept patches from people, we need a coherent, singular way to do it - mixing GitHub PRs, Phabricator, and patches uploaded to Trac is just a nightmare, and not just for me, even though I do most of the patch work - it imposes the burden on *every* person who wants to review code to now do so in many separate places. And we need to make code review *easier*, not harder! If anything, we should be consolidating on a single place (obviously, I'd vote for Phabricator), not adding more places to make changes that we all have to keep up with, when we don't even use the service itself! That's why I proposed Phabricator: it is coherent, a singular place to go, very good at what it does, and it does not attempt to 'take over' GHC itself. GitHub is a fairly all-or-nothing proposition if you want any of the benefits it delivers, if you ask me (and I say this as someone who likes GitHub for smaller projects). I just don't think its tools are suitable for us.
So, back to the topic. I think the nightly builds are actually in an OK state at the moment, since we do get reports from them, and builders do check in regularly. The nightly builders also cover a more diverse set of platforms than our CI will. But the CI and turnaround could be *greatly* improved, I think, because ghc-complete is essentially ignored or unknown by many people.
So I'll also make a suggestion: get something that pulls GHC's repo every 10 minutes or so, does a build, and then emails ghc-devs *only* if failures pop up. In fact, we could just re-use the existing nightly build infrastructure for this: make it check very regularly and run standard amd64/Linux and Windows builds upon changes. I could provide hardware for this. This would increase the visibility of reports, require *no* new code, and it already works.
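For the avoidance of doubt about what I mean, here is a minimal sketch of such a poller. Everything concrete in it is a placeholder - the `git` invocation, the `sh validate` build command, and the notification hook are stand-ins for whatever the existing nightly infrastructure actually provides:

```haskell
-- Sketch only: a poll-build-notify loop.  The shell commands and the
-- notification action are hypothetical placeholders, not GHC
-- infrastructure.
import Control.Concurrent (threadDelay)
import Control.Monad (forever, when)
import System.Exit (ExitCode (..))
import System.Process (system)

-- Pure decision: notify only on a failing exit code.
shouldNotify :: ExitCode -> Bool
shouldNotify ExitSuccess = False
shouldNotify _           = True

-- Placeholder: really this would mail ghc-devs with the build log.
sendNotification :: ExitCode -> IO ()
sendNotification code = putStrLn ("build failed: " ++ show code)

-- The loop itself (not run by main here, since it never returns):
pollLoop :: IO ()
pollLoop = forever $ do
  _    <- system "git pull --ff-only"
  code <- system "sh validate"          -- placeholder build command
  when (shouldNotify code) (sendNotification code)
  threadDelay (10 * 60 * 1000000)       -- wait 10 minutes

main :: IO ()
main = mapM_ (print . shouldNotify) [ExitSuccess, ExitFailure 2]
```

The whole thing is a thin wrapper around what the nightly builders already do; only the polling frequency and the failure-only emailing are new.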
Overall, I will absolutely help you in every possible way, because this really is a problem for newcomers, and existing developers, when we catch dumb failures later than we should. But I think the proposed solution here is extraordinarily complex in comparison to what we actually need right now.
... I will say that if you *did* fix cross compilation however to work with TH you would be a hero to many people - myself included - continuous integration aside! :)
[1] https://github.com/nomeata/ghc-complete [2] http://haskell.inf.elte.hu/builders/
On Wed, Jun 18, 2014 at 3:10 PM, William Knop
wrote: Hello all,
I’ve seen quite a few comments on the list and elsewhere lamenting the time it takes to compile and validate GHC. It’s troublesome not only because it’s inconvenient but, more seriously, because people are holding off on sending in patches, which stifles development. I would like to propose a solution:
1. Implement proper cross-compilation, such that build and host may be different - e.g. a Linux x86_64 machine can build a GHC that runs on Windows x86. What sort of work would this entail?
2. Batch cross-compiled builds for all OSs/archs on a continuous integration service (e.g. Travis CI) or cloud service, then package up the binaries with the test suite.
3. Send the package to our buildbots, and run the test suite.
4. (optional) If using a CI service, have the buildbots send results back to the CI. This could be useful if we'd use GitHub for pulls in the future *.
Cheers, Will
* I realize vanilla GitHub currently has certain annoying limitations, though some of them are pretty easy to solve via the github-services and/or webhooks. I don’t think this conflicts with the desire to use Phabricator, either, so I’ll send details and motivations to that thread.
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs