
Great and detailed response, Austin. Thank you.

William, I'm happy to help in any way I can. I run SmartOS x86 and x86_64 builds of GHC HEAD on my own equipment using the GHC Builder Ian Lynagh developed:

https://ghc.haskell.org/trac/ghc/wiki/Builder
https://github.com/haskell/ghc-builder

I'm also currently working on small tweaks to the ghc-builder and on getting the GHC testsuite to pass on Illumos (and, indirectly, Solaris). I follow Gábor's lead on the GHC Builder priorities, and Carter Schonwald acts as a Pull Request gatekeeper for changes.

Best, Alain

On 06/18/2014 11:53 PM, Austin Seipp wrote:
Hi William,
Thanks for the email. Here're some things to consider.
For one, cross compilation is a hot topic, but it is going to take a rather large amount of work to fix, and it won't be easy. The primary problem is that we need to make Template Haskell cross-compile, and in general this is nontrivial: Template Haskell must load and run object code on the *host* platform, but the compiler must generate code for the *target* platform. There are ways around some of these problems; for one, we could compile every module twice, once for the host and once for the target. Upon encountering a TH splice, the host GHC would load host object code, but the final executable would link against the target object code.
There are many, many subtle points to consider if we go down this route - what happens, for example, if I cross compile from a 64-bit machine to a 32-bit one, but Template Haskell wants some knowledge like what "sizeOf (undefined :: CLong)" is? The host code sees a 64-bit quantity while the target will actually deal with a 32-bit one. This could later explode horribly. And it isn't limited to different word sizes either - it applies to the ABI in general. 64-bit Linux -> 64-bit Windows would be just as problematic in this exact case, as one uses the LP64 data model while the other uses LLP64.
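To make the hazard concrete, here is a small illustration (mine, not anything from GHC itself) of why baking host-side sizes into target code is dangerous: ABI-dependent types like CLong change size across data models, while fixed-width types do not.

```haskell
-- Sketch: sizeOf for an ABI-dependent type vs. fixed-width types.
-- On an LP64 host (64-bit Linux) CLong is 8 bytes; on LLP64
-- (64-bit Windows) or any 32-bit target it is 4.  A TH splice that
-- bakes the *host* answer into *target* code gets this wrong.
import Data.Int (Int32, Int64)
import Foreign.C.Types (CLong)
import Foreign.Storable (sizeOf)

clongSize, int32Size, int64Size :: Int
clongSize = sizeOf (undefined :: CLong)  -- platform-dependent: 4 or 8
int32Size = sizeOf (undefined :: Int32)  -- always 4, on every ABI
int64Size = sizeOf (undefined :: Int64)  -- always 8, on every ABI

main :: IO ()
main = do
  putStrLn $ "sizeOf CLong on this (host) platform: " ++ show clongSize
  putStrLn $ "sizeOf Int32 / Int64 (ABI-independent): "
          ++ show (int32Size, int64Size)
```

Fixed-width types sidestep the problem; it is precisely the ABI-dependent ones that make naive "run TH on the host" schemes explode.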
So #1 by itself is a very, very non-trivial amount of work, and in my opinion it isn't necessary for better builds. There are other possible routes to cross compilation, but I'd speculate they are all just as non-trivial as this one.
Finally, the remainder of the scheme - shipping builds to remote machines and having them tested there - sounds rather more complicated, and I'm wondering what the advantages are. In particular, it seems to merely expose more failure points in the CI system, because now all CI depends on cross compilation working properly, on being able to ship reports back and forth, and more. Depending on cross compilation in particular sounds like a huge burden: it makes it hard to distinguish a failure caused by a cross-compilation bug from one caused by a committer's changeset, which widens the scope of what we need to consider. A CI system should be absolutely as predictable as possible, and this adds a *lot* of variables to the mix. Cross compilation is really not just one big task - there will be many *small* bugs lying in wait after that, death by a thousand cuts.
Really, we need to distinguish between two needs:
1) Continuous integration.
2) Nightly builds.
These two systems have very different needs in practice:
1) A CI system needs to be *fast*, and it needs to have dedicated resources to respond to changes quickly. This means we need to *minimize* the amount of time for developer turn around to see results. That includes minimizing the needed configurations. Shipping builds to remote machines just for CI would greatly complicate this and likely make it far longer on its own, not to mention it increases with every system we add.
2) A nightly build system is under nowhere near the same time constraints, although it also needs to be dedicated. If an ARM/Linux machine takes 6 hours to build (perhaps it's shared or something, or just really wimpy), that's totally acceptable. These can then report nightly about the results and we can reasonably blame people/changesets based on that.
Finally, both of these are complicated by the fact that GHC is a large project with a highly variable number of configurations to keep under control: static, dynamic, static+dynamic, profiling, LLVM builds, builds where GHC itself is profiled, as well as the matrix of their combinations: LLVM+GHC-profiled, and so on. Each of these configurations exposes bugs in its own right. Unfortunately, doing #1 with all these configurations would be ludicrous: it would explode the build times for any given system, and it drastically multiplies the hardware resources we'd need for CI if we wanted it to respond quickly to any given changeset, because you not only have to *build* them, you must also run them. And now you have to run a lot of them. A nightly build system is more reasonable for these problems, because there, taking hours and hours is expected. All of this would still be true even with cross compilation, because the matrix multiplies the amount of work every CI run must do no matter what.
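To put a rough number on that matrix (a toy sketch of mine - the way names are made up, and GHC's real build "ways" have constraints between them): if each configuration axis is an independent on/off switch, the number of configurations doubles with every axis you add.

```haskell
-- Toy illustration of the configuration blow-up.  The names below are
-- hypothetical stand-ins, not GHC's actual build ways; the point is
-- purely the combinatorics: n independent switches yield 2^n
-- configurations, each of which must be built *and* tested.
import Data.List (subsequences)

ways :: [String]
ways = ["dynamic", "profiling", "llvm", "ghc-profiled", "debug"]

configCount :: Int
configCount = length (subsequences ways)  -- 2^5 = 32

main :: IO ()
main = putStrLn $ show configCount ++ " configurations from "
               ++ show (length ways) ++ " independent switches"
```

Even this undercounts the cost for CI, since each configuration also multiplies testsuite runtime, not just build time.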
We actually already have both of these, too: Joachim Breitner, for example, has set up a Travis-CI[1] system for us, while Gábor Páli has set up nightly builds[2]. Travis-CI does the job of fast CI, but it falls short for a few reasons:
1) We have literally zero visibility into its reports. Essentially, we only know when it explodes because Joachim yells at us (normally at me :) This is because GitHub is not our center of the universe, despite how much people yearn for it to be so.
2) The time limit is unacceptable. Travis-CI, for example, cannot actually do dynamic builds of GHC because they take too long. Considering GHC now ships dynamically on major platforms, that's quite a huge thing for a CI system to miss (and no, a separate build-matrix configuration doesn't work here - GHC builds statically and dynamically at the same time and ships both; there's no way to have "only static" and "only dynamic" entries.)
3) It has limited platform support - only recently did it have OS X, and Windows is not yet in sight. Ditto for FreeBSD. These are crucial for CI as well, as they encompass all our Tier-1 platforms. This could be fixed with cross compilation, but again, that's a big, big project.
And finally, on the GitHub note: as I said in the prior thread about Phabricator, I don't think it actually offers us anything useful at this point in time - almost nothing beyond "other projects use GitHub", which is not an advantage; it's an appeal to popularity, IMO. Webhooks still cannot do things like ban tabs or trailing whitespace, or enforce submodule integrity. We have to have our own setup for all of that. I'm never going to hit the 'Merge Button' for PRs - validation is 100% mandatory on behalf of the merger, and again, Travis-CI cannot provide coherent coverage even if we could use it for that. And because of that, there's no difference between GitHub and any other code site - I have to pull the branch manually and test it myself, which I could do with any random git repository in the world.
The code review tools are worse than Phabricator's. Finally, if we are going to accept patches from people, we need a coherent, singular way to do it - mixing GitHub PRs, Phabricator, and patches uploaded to Trac is just a nightmare, and not just for me, even though I do most of the patch work - it imposes the burden on *every* person who wants to review code to now do so in many separate places. And we need to make code review *easier*, not harder! If anything, we should be consolidating on a single place (obviously, I'd vote for Phabricator), not adding more places to make changes that we all have to keep up with, when we don't even use the service itself! That's why I proposed Phabricator: it is coherent, a singular place to go, very good at what it does, and it does not attempt to 'take over' GHC itself. GitHub is a fairly all-or-nothing proposition if you want any of the benefits it delivers, if you ask me (and I say this as someone who likes GitHub for smaller projects). I just don't think its tools are suitable for us.
So, back to the topic. I think the nightly builds are actually in an OK state at the moment, since we do get reports from them, and builders do check in regularly. The nightly builders also cover a more diverse set of platforms than our CI will. But the CI and turnaround could be *greatly* improved, I think, because ghc-complete is essentially ignored or unknown by many people.
So I'll also make a suggestion: get something that pulls GHC's repo every 10 minutes or so, does a build, and then emails ghc-devs *only* if failures pop up. In fact, we could just re-use the existing nightly build infrastructure for this: make it check very regularly and run standard amd64/Linux and Windows builds upon changes. I could provide hardware for this. This would increase the visibility of reports, require *no* new code, and it already works.
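For the avoidance of doubt about what I mean, here is a minimal sketch of such a poller. Everything concrete in it is a placeholder - the `git` invocation, the `sh validate` build command, and the notification hook are stand-ins for whatever the existing nightly infrastructure actually provides:

```haskell
-- Sketch only: a poll-build-notify loop.  The shell commands and the
-- notification action are hypothetical placeholders, not GHC
-- infrastructure.
import Control.Concurrent (threadDelay)
import Control.Monad (forever, when)
import System.Exit (ExitCode (..))
import System.Process (system)

-- Pure decision: notify only on a failing exit code.
shouldNotify :: ExitCode -> Bool
shouldNotify ExitSuccess = False
shouldNotify _           = True

-- Placeholder: really this would mail ghc-devs with the build log.
sendNotification :: ExitCode -> IO ()
sendNotification code = putStrLn ("build failed: " ++ show code)

-- The loop itself (not run by main here, since it never returns):
pollLoop :: IO ()
pollLoop = forever $ do
  _    <- system "git pull --ff-only"
  code <- system "sh validate"          -- placeholder build command
  when (shouldNotify code) (sendNotification code)
  threadDelay (10 * 60 * 1000000)       -- wait 10 minutes

main :: IO ()
main = mapM_ (print . shouldNotify) [ExitSuccess, ExitFailure 2]
```

The whole thing is a thin wrapper around what the nightly builders already do; only the polling frequency and the failure-only emailing are new.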
Overall, I will absolutely help you in every possible way, because this really is a problem for newcomers, and existing developers, when we catch dumb failures later than we should. But I think the proposed solution here is extraordinarily complex in comparison to what we actually need right now.
... I will say that if you *did* fix cross compilation however to work with TH you would be a hero to many people - myself included - continuous integration aside! :)
[1] https://github.com/nomeata/ghc-complete [2] http://haskell.inf.elte.hu/builders/
On Wed, Jun 18, 2014 at 3:10 PM, William Knop
wrote: Hello all,
I’ve seen quite a few comments on the list and elsewhere lamenting the time it takes to compile and validate GHC. It’s troublesome not only because it’s inconvenient but, more seriously, because people are holding off on sending in patches, which stifles development. I would like to propose a solution:
1. Implement proper cross-compilation, such that build and host may be different - e.g. a Linux x86_64 machine can build a GHC that runs on Windows x86. What sort of work would this entail?
2. Batch cross-compiled builds for all OSs/archs on a continuous integration service (e.g. Travis CI) or cloud service, then package up the binaries with the test suite.
3. Send the package to our buildbots, and run the test suite.
4. (optional) If using a CI service, have the buildbots send results back to the CI. This could be useful if we'd use GitHub for pulls in the future *.
Cheers, Will
* I realize vanilla GitHub currently has certain annoying limitations, though some of them are pretty easy to solve via the github-services and/or webhooks. I don’t think this conflicts with the desire to use Phabricator, either, so I’ll send details and motivations to that thread.
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs