I've been spending some time on the Windows build, and a bunch of things came up.
1. Building on msys2 with the provided ghc-tarballs works. The wikipage for MSYS2 should be in good shape. It would be nice to get a bit more testing and then consider replacing the default Windows build page. The new instructions are significantly simpler and basically a few steps of copy & paste after installing msys2.
2. Since the msys2 setup instructions are so simple and linear, perhaps it would be even better to put them in a shellscript and check that in? Then the wikipage would turn into a one-liner.
3. Why is ghc-tarballs a git repository? That does not seem very wise. Whenever the tarballs are updated the repository will grow fatter and fatter, forcing people to download every version of every tarball that was ever checked in. It's not really a problem now (only one version of mingw has ever been checked in so far, for example), but it will become annoying. Can we move away from the broken model? Could we have a stable folder under haskell.org/ to put the files in, to make sure that they never go away, and just wget/curl them from there?
4. Looks like the rubenvb archive could be updated; there's a new version (gcc 4.8.0) available. That should be a step up from 4.6.3 and should address some complaints about the aged toolchain that I saw on the issue tracker (and this one too). I tried building with the new tarball, and it seems to work, although I did not look too closely. (Some tests failed, but I have not checked whether they also failed with the previous version. That takes so long...)
5. Why are we using the rubenvb build of mingw? I don't think I've seen any comments about that in the repository. Is there anything special about it? A short README with a note next to the tarball would be great.
6. There is an asymmetry between the embedded 64-bit mingw distribution and the 32-bit one. The former is a single rubenvb tarball, while the latter is a bunch of files. There's a monolithic rubenvb build available for 32-bit too. Should we use that? It would simplify the build scripts and make the builds more consistent.
7. ghc-tarballs includes perl. Ugh. I don't think there's much perl code remaining in the repository, shall we just clean up that mess and drop the dependency? Or at least just assume that the host system provides it (which msys2 does).
8. A broader question: what general approach to ghc on Windows shall we take? The prebuilt packages currently provided by ghc-tarballs are also covered by msys2's package manager. Why not offload that pain to msys2 then? The advantage here is less maintenance (and automatic upgrades of the toolchain), the disadvantage is that the distribution becomes less stable and msys2 updates could break ghc builds more easily. I think it would make sense to be consistent with the Linux builds; we don't bundle compilers with those. In that sense msys2 would be like another distribution. Of course, we need to also consider if msys2 can be trusted to stick around and stay up to date in the long run. It looks like a relatively new project, so there's some risk.
9. If I understand correctly, one other thing to consider before dropping ghc-tarballs is that Windows ghc still needs GCC utilities (like cpp) to function properly, and so we need to have a prepackaged bundle of binary GCC utilities (and maybe hardcoded paths? not sure) to make that work. On the other hand, a custom-built ghc should work just fine in the msys2 environment which does provide cpp et al., and the additional GCC bundles would perhaps best be owned by, for example, the Haskell Platform project rather than be part of core ghc?
10. Following the idea in (8), I tried to build ghc using the mingw gcc provided by msys2 instead of the one in ghc-tarballs. It was a bit weird. I had to hack configure.ac to disable use of ghc-tarballs and try to use system tools. How about a configure option to enable/disable use of ghc-tarballs? I also ran into some weird issues, for example, the system ld and nm would not get detected by the configure script correctly. They were found when I explicitly set LD=ld and NM=nm. Weird. Will look into that later. Other than that, there were no major problems, except...
11. A build with the host gcc failed. I think the cause is that it is too new (4.9.1, significantly newer than 4.6.3 in ghc-tarballs). The build of the currently checked in GMP (libraries/integer-gmp) fails because a utility used in the build process segfaults. I tried upgrading gmp from 5.0.3 to 6.0.0, and 6.0.0 builds fine by itself but the ghc-specific patch used for 5.0.3 no longer applies (is it still necessary?). Oh brother. One of the advantages of tracking msys2's gcc would be that we would notice such breakage earlier. Shall I open an issue?
12. Another advantage of switching to msys2's gcc is that I think it would allow ghc to drop some hacks.
13. Side note: early on, because of my own confusion, I wasted a bunch of time trying to get ghc to build with /usr/bin/gcc in msys2, which is the cygwin gcc that provides additional POSIX compatibility layers. That could work in theory, but I ran into numerous issues with msys name mangling (/usr/bin/gcc gets really confused when run as c:/msys64/usr/bin/gcc, which breaks many ghc subproject builds). Might be a good idea to put a guard in the configure script to warn if a cygwin gcc is detected (or add explicit support for it). Actually, looks like there's already a related issue open, although I'm not quite sure what the scope is there (#8842, thanks Thomas).
14. The test runner assumes native Windows Python, but it's only a few small changes away from working fine on the python2 provided by msys2, which would cut another external build dependency. Could someone review and merge my patches (#9604, #9626)? Thanks.
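
To make the guard suggested in (13) a bit more concrete, here is a rough sketch of the check, written in Python purely for illustration (a real guard would live in configure.ac as shell). It assumes the compiler's target triplet, as printed by `gcc -dumpmachine`, is enough to tell the toolchains apart, and that the msys2/cygwin compilers report a triplet ending in `-msys` or `-cygwin` while the mingw ones contain `mingw`. Those triplet patterns are my assumption, not something I have verified against every toolchain, and the function names are made up for this example.

```python
import subprocess


def classify_gcc_target(triplet):
    """Classify a GCC target triplet as 'mingw', 'posix-layer' or 'other'.

    The patterns are assumptions about what the toolchains report,
    e.g. 'x86_64-w64-mingw32' for mingw-w64 and 'x86_64-pc-msys' or
    'x86_64-pc-cygwin' for the POSIX-emulating compilers.
    """
    t = triplet.strip().lower()
    if "mingw" in t:
        return "mingw"
    if t.endswith("-cygwin") or t.endswith("-msys"):
        return "posix-layer"
    return "other"


def gcc_target(gcc="gcc"):
    """Ask the given gcc for its target triplet via `gcc -dumpmachine`."""
    out = subprocess.check_output([gcc, "-dumpmachine"])
    return out.decode().strip()


def guard_message(triplet):
    """Return a warning string for a cygwin/msys gcc, or None otherwise."""
    if classify_gcc_target(triplet) == "posix-layer":
        return ("warning: gcc targets %s (cygwin/msys runtime); "
                "a mingw gcc is probably what you want" % triplet.strip())
    return None
```

Calling `guard_message(gcc_target())` in the build environment would then either return nothing or a warning that configure could print before going any further.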