Thanks for the detailed explanations. A few thoughts here:

Having multiple "configurations" of the source tree (where some parts may or may not be present) does not sound like a good idea; it just seems like additional complexity for no particular reason. AFAIU, it means that if I check out additional libraries into my repository (or build those libraries somehow?), tests for other packages might start failing, which is weird.

It sounds like the current decision to leave "random" and a few other packages unbuilt is rather ad hoc. Is that the case?

Ideally there would be one ghc testsuite that always includes all tests (when a faster test run is desired, a more generic test-filtering mechanism should be used; some test suites react to a "fast" flag, right?). If there are tests that we do not want to run as part of the global test suite, then they should live together with the library implementation and be maintained there, separately from ghc.
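To make the filtering idea concrete, here is a toy sketch. The real GHC testsuite driver is a Python program, but the names below (select_tests, the "speed" tag) are made up for illustration, not the driver's actual API: every test stays known to the suite, and a single flag decides which subset runs, with the rest reported as skipped rather than silently dropped.

```python
# Toy model of test filtering: all tests are always registered with the
# suite; a run-time "fast" flag selects a subset to execute, and the
# remainder are explicitly reported as skipped.

def select_tests(tests, fast_only):
    """Split tests into (to_run, skipped) based on the fast flag."""
    if not fast_only:
        return list(tests), []
    to_run = [t for t in tests if t["speed"] == "fast"]
    skipped = [t for t in tests if t["speed"] != "fast"]
    return to_run, skipped

suite = [
    {"name": "stm052", "speed": "fast"},
    {"name": "dph-quickhull", "speed": "slow"},
]

run, skipped = select_tests(suite, fast_only=True)
print("running:", [t["name"] for t in run])      # running: ['stm052']
print("skipped:", [t["name"] for t in skipped])  # skipped: ['dph-quickhull']
```

The point is that "fast" changes which tests *execute*, never which tests *exist*, so nothing can quietly disappear depending on what happens to be checked out.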

What's the compilation cost of the additional libraries relative to the complete build? (If you don't know off the bat, how do I get them built to measure the overhead?) Is it really significant? If it is, can we split off related tests? If it isn't, let's just enable them by default.

On Thu, Oct 30, 2014 at 10:19 PM, Austin Seipp <austin@well-typed.com> wrote:
On Thu, Oct 30, 2014 at 6:48 AM, Gintautas Miliauskas
<gintautas.miliauskas@gmail.com> wrote:
> Going through some validate.sh results, I found compilation errors due to
> missing libraries, like this one:
>
> =====> stm052(normal) 4088 of 4108 [0, 21, 0]
> cd ../../libraries/stm/tests &&
> 'C:/msys64/home/Gintas/ghc/bindisttest/install   dir/bin/ghc.exe'
> -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db
> -rtsopts -fno-warn-tabs -fno-ghci-history -o stm052 stm052.hs  -package stm
>>stm052.comp.stderr 2>&1
> Compile failed (status 256) errors were:
>
> stm052.hs:10:8:
>     Could not find module ‘System.Random’
>     Use -v to see a list of the files searched for.
>
> I was surprised to see that these are not listed in the test summary at the
> end of the test run, but only counted towards the "X had missing libraries"
> row. That setup makes it really easy to miss them, and I can't think of a
> good reason to sweep such tests under the rug; a broken test is a failing
> test.

Actually, these tests aren't broken in the way you think :) It's a bit
long-winded to explain...

Basically, GHC can, if you let it, build extra dependencies in its
build process, one of which is the 'random' library. 'random' was
never a true requirement to build GHC (a 'bootlib', as we call
them). So why is this test here?

Because 'random' was actually a dependency of the Data Parallel
Haskell package, and until not too long ago (earlier this year),
`./validate` built and compiled DPH - with all its dependencies:
random, vector, primitive - by default. This adds a pretty
noticeable amount of time to the build (you are compiling 5-8 more
libraries, after all), and at the time, DPH was also not ready for
the Applicative-Monad patch. So we turned it off, along with the
dependencies.

Additionally, GHC does have some 'extra' libraries which you can
optionally build during the build process, but which are turned off
by default. Originally this was because the weirdo './sync-all'
script didn't need to fetch everything, and 'stm' was one of the
libraries that wasn't cloned by default.

Now that we've submoduleified everything though, these tests and the
extra libraries could be built by default. Which we could certainly
do.

> How about at least listing such failed tests in the list of failed tests at
> the end?

I'd probably be OK with this.

> At least in this case the error does not seem to be due to some missing
> external dependencies (which probably would not be a great idea anyway...).
> The test does pass if I remove the "-no-user-package-db" argument. What was
> the intention here? Does packaging work somehow differently on Linux? (I'm
> currently testing on Windows.)

I'm just guessing, but I imagine you really don't want to remove
'-no-user-package-db' at all, on any platform; otherwise Weird Things
Might Happen, I'd assume.

The TL;DR here is that when you build a copy of GHC and all the
libraries, it actually *does* register the built packages for the
compiler... this always happens, *even if you do not install it*. The
primary 'global' package DB just sits in tree instead, under
./inplace.

When you run ./validate, what happens is that after the build, we
actually create a binary distribution and then test *that* compiler
instead, as you can see (obviously for a good reason - broken bindists
would be bad). The binary distribution obviously has its own set of
binary packages it came with; those are the packages you built into it
after all. The reason we tell GHC to ignore the user package db here
is precisely because we *do not* want to pick it up! We only want to
test the binary distribution with the packages *it* has.

Now you might say, well, Austin, the version numbers are different!
How would it pick that up? Not always... What if I built a copy of GHC
HEAD today, then built something with it using Cabal? Then that will
install into my user package database. Now I go back to my GHC tree
and hack away _on the same day_ and run './validate'... the version
number hasn't changed *at all* because it's date based, meaning the
binary distribution could certainly pick up the previously installed
libraries, which I installed via the older compiler. But I don't want
that! I only want to run those tests with the compiler I'm validating
*now*.
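A toy sketch of that shadowing scenario (this is NOT ghc-pkg's real resolution algorithm, and the paths and version string are invented; it just models "first database in the stack wins" to show why lookup order matters when versions collide):

```python
# Toy model of package lookup across a stack of package databases.
# With date-based version numbers, a package installed into the user db
# earlier the same day carries the *same* name and version as the one
# shipped in the bindist, so whichever db is consulted first wins.

def lookup(package, version, db_stack):
    """Return (db_name, path) from the first db providing package-version."""
    for db_name, contents in db_stack:
        if (package, version) in contents:
            return db_name, contents[(package, version)]
    return None

bindist_db = {("random", "1.1.20141030"): "/bindist/lib/random"}
user_db    = {("random", "1.1.20141030"): "/home/me/.ghc/lib/random"}  # stale!

# By default the user db is consulted first -> the stale user-installed
# build shadows the bindist's own copy.
with_user = lookup("random", "1.1.20141030",
                   [("user", user_db), ("bindist", bindist_db)])
# With -no-user-package-db, only the bindist's own packages are visible.
without_user = lookup("random", "1.1.20141030",
                      [("bindist", bindist_db)])

print(with_user)     # ('user', '/home/me/.ghc/lib/random')
print(without_user)  # ('bindist', '/bindist/lib/random')
```

Dropping the user db from the stack is exactly what '-no-user-package-db' does, which is why validate passes it.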

I imagine that's precisely why you see this test pass if you remove
the argument: it stops failing because it picks up a package database
in your existing environment. But that's really, really not what you
want (I'd be surprised if it worked and didn't result in some
horrible error or crash).

> On a related note, how about separating test failures from failing
> performance tests ("stat too good" / "stat not good enough")? The latter are
> important, but they seem to be much more prone to fail without good reason.
> Perhaps do some color coding of the test runner output? That would also
> help.

I also think this is a good idea.

> --
> Gintautas Miliauskas
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>

--
Regards,

Austin Seipp, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/



--
Gintautas Miliauskas